For years, the Shapefile and PostGIS covered most vector needs. But as datasets grow into the hundreds of millions of features, analytics workflows need something built for scale. That something is increasingly GeoParquet.
What makes it different
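GeoParquet is a specification for storing vector geometries in Apache Parquet's columnar format. Geometries live in an ordinary column (encoded as WKB by default), and file-level metadata records which column holds the geometry and its coordinate reference system. That brings: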
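- Strong compression and fast scans, because a query reads only the columns it references.
- Predicate pushdown: engines use per-row-group statistics to skip row groups that cannot match a filter.
- Broad support in modern tooling: DuckDB (through its spatial extension), the Apache Arrow ecosystem, and Spark (through Apache Sedona).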
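To make the pushdown point concrete, here is a minimal pure-Python sketch of how a reader can skip row groups using min/max statistics. It is a toy model, not the Parquet reader itself: the `RowGroup` class, the `scan` function, and the sample values are all hypothetical, and a real engine reads these statistics from the file footer rather than computing them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group with one numeric column (hypothetical)."""
    values: List[float]

    @property
    def stats(self) -> Tuple[float, float]:
        # Parquet stores min/max per column chunk in the file footer;
        # this toy version simply computes them on demand.
        return min(self.values), max(self.values)


def scan(groups: List[RowGroup], lo: float, hi: float) -> Tuple[List[float], int]:
    """Return the values in [lo, hi] and the number of row groups actually read."""
    matches: List[float] = []
    groups_read = 0
    for group in groups:
        gmin, gmax = group.stats
        if gmax < lo or gmin > hi:
            continue  # statistics prove no row can match, so skip the group
        groups_read += 1
        matches.extend(v for v in group.values if lo <= v <= hi)
    return matches, groups_read


# Hypothetical x-coordinates, written in sorted order so groups cover disjoint ranges.
groups = [
    RowGroup([0.5, 1.2, 2.9]),
    RowGroup([3.1, 4.4, 5.8]),
    RowGroup([6.0, 7.7, 9.3]),
]

matches, groups_read = scan(groups, lo=4.0, hi=6.5)
print(matches, groups_read)
```

Note that pruning only pays off when the data is sorted or spatially clustered before writing. If every row group spans the whole extent, every group's range overlaps the filter and nothing can be skipped.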
Where it shines
GeoParquet is a strong fit for analytical workloads (large joins, aggregations, and batch processing) and for portable data exchange between systems. Paired with DuckDB's spatial extension, it lets you run spatial SQL directly against large GeoParquet files on a laptop, without first loading them into a database server.
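As a rough illustration of that workflow, the sketch below counts the features that intersect a bounding box by querying a GeoParquet file from DuckDB's Python API. Several things here are assumptions rather than facts from this article: the helper names `query_params` and `count_in_bbox` are hypothetical, the geometry column is assumed to be called `geometry`, and the query assumes a recent DuckDB release in which the spatial extension reads GeoParquet geometry columns as `GEOMETRY` values (older releases may need an explicit `ST_GeomFromWKB` call).

```python
from typing import List, Sequence, Union

# Assumes the geometry column is named "geometry"; adjust for your file.
QUERY = """
SELECT count(*) AS n
FROM read_parquet(?)
WHERE ST_Intersects(geometry, ST_MakeEnvelope(?, ?, ?, ?))
"""


def query_params(path: str, bbox: Sequence[float]) -> List[Union[str, float]]:
    """Validate a (min_x, min_y, max_x, max_y) box and build the parameter list."""
    if len(bbox) != 4:
        raise ValueError("bbox must be (min_x, min_y, max_x, max_y)")
    min_x, min_y, max_x, max_y = (float(v) for v in bbox)
    if min_x > max_x or min_y > max_y:
        raise ValueError("bbox minimums must not exceed maximums")
    return [path, min_x, min_y, max_x, max_y]


def count_in_bbox(path: str, bbox: Sequence[float]) -> int:
    """Count features in the GeoParquet file at `path` that intersect `bbox`."""
    # Imported lazily so the helpers above work without DuckDB installed.
    import duckdb

    con = duckdb.connect()  # in-memory database; the Parquet file is read in place
    con.execute("INSTALL spatial")  # downloads the extension on first use
    con.execute("LOAD spatial")
    row = con.execute(QUERY, query_params(path, bbox)).fetchone()
    return row[0]
```

With a file on disk, a call would look like `count_in_bbox("buildings.parquet", (-0.5, 51.3, 0.3, 51.7))`, where both the file name and the coordinates are placeholders. Nothing is imported into a database first: DuckDB scans the Parquet file where it sits.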
Where it doesn't (yet)
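GeoParquet is not a replacement for a transactional spatial database. Parquet files are written once and are not designed for in-place changes, so updating a single feature generally means rewriting the file. If you need concurrent edits and row-level updates, PostGIS remains the right tool. The two complement each other: PostGIS for operations, GeoParquet for analytics and distribution.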