Making Sense of Big Data File Formats: Avro and Parquet

Raoul-Gabriel Urma | Sunday 12:30 | Room C

Modern applications generate and manipulate large volumes of data, and that data keeps growing at a staggering rate. Unfortunately, large datasets are expensive to store and slow to process. In fact, memory speed has improved far more slowly than CPU speed. Thankfully, several file formats designed for big data systems can help. In this talk, you will learn about two popular ones: Avro and Parquet. Through live-coded examples in Python, you will learn the good, the bad, and the ugly, and how to use Avro and Parquet in practice.