This is awesome, thanks for creating this. I've had to write some absolutely wonky scripts to dump a PostgreSQL database into Parquet, or read a Parquet file into PostgreSQL. Normally some terrible combination of psycopg and pyarrow, which worked, but it was ad-hoc and slightly different every time.

A lot of other commenters are talking about `pg_duckdb`, which maybe also could've solved my problem, but this looks quite simple and clean.

I hope for some kind of near-term future where there's *some* standardish analytics-friendly data archival format. I think Parquet is the closest thing we have now.
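For concreteness, the "wonky script" pattern looks roughly like the sketch below. This is only a minimal illustration assuming psycopg 3 and pyarrow; the connection string, table, and column names are invented:

    import psycopg
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Pull the rows to export (hypothetical table and columns).
    with psycopg.connect("postgresql://localhost/mydb") as conn, conn.cursor() as cur:
        cur.execute("SELECT id, created_at, amount FROM events")
        cols = [d.name for d in cur.description]
        rows = cur.fetchall()

    # Rebuild the result column-by-column as an Arrow table, then write Parquet.
    table = pa.table({c: [r[i] for r in rows] for i, c in enumerate(cols)})
    pq.write_table(table, "events.parquet")

Everything gets pulled into memory and the type mapping is left to the libraries, which is a big part of why each of these scripts ends up slightly different from the last.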
Parquet itself is actually not that interesting. It should be able to read (and even write) Iceberg tables.

Also, how does it compare to pg_duckdb (which adds DuckDB execution to Postgres, including reading Parquet and Iceberg), or duck_fdw (which wraps a DuckDB database that can live in memory and simply pass through Iceberg/Parquet tables)?
Cool, would this be better than using a ClickHouse/DuckDB extension that reads Postgres and saves to Parquet?

What would be the recommended way to regularly export old data to S3 as Parquet files? A cron job that launches a separate process to connect to the database and extract the data, or doing it from the regular database instance? Doesn't that slow the instance down too much?
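For reference, the cron-job variant the question describes could look roughly like this: a standalone script, scheduled outside the database, that pulls rows older than a cutoff and writes them to Parquet with the same psycopg + pyarrow combination mentioned upthread. The table, columns, cutoff, and output path are invented, and the actual S3 upload (e.g. via boto3) is omitted:

    from datetime import datetime, timedelta, timezone

    import psycopg
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Archive everything older than 90 days (arbitrary cutoff).
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    # The only load this puts on the database is a single SELECT.
    with psycopg.connect("postgresql://localhost/mydb") as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT id, created_at, amount FROM events WHERE created_at < %s",
            (cutoff,),
        )
        cols = [d.name for d in cur.description]
        rows = cur.fetchall()

    # Convert to Arrow and write a dated Parquet file, ready to be pushed to S3.
    table = pa.table({c: [r[i] for r in rows] for i, c in enumerate(cols)})
    pq.write_table(table, f"events_before_{cutoff:%Y%m%d}.parquet")

How much this slows the instance down depends mostly on how much data the cutoff selects, since the database only sees that one query.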
https://github.com/pgspider/parquet_s3_fdw is the foreign data wrapper alternative.
Why not just federate Postgres and Parquet files? That way the query planner can push down as much of the query as possible and reduce how much data has to move around.