Parquet
Open columnar file format optimised for analytical workloads — the standard for data-lake storage.
Definition
Apache Parquet stores tabular data column-by-column with per-column compression and statistics, making analytical scans dramatically faster than row formats like CSV. It is the default storage format on S3 and GCS data lakes, queried by DuckDB, Spark, Trino, and Snowflake. Schema evolution and predicate pushdown are first-class.
When to use
See also
- DuckDB — Embedded analytical SQL database — like SQLite for columnar OLAP queries over Parquet and CSV.