Parquet

Open columnar file format optimised for analytical workloads — the standard for data-lake storage.

Definition

Apache Parquet stores tabular data column-by-column with per-column compression and statistics, making analytical scans dramatically faster than row formats like CSV. It is the default storage format on S3 and GCS data lakes, queried by DuckDB, Spark, Trino, and Snowflake. Schema evolution and predicate pushdown are first-class.

When to use

See also