
Postgres Just Cracked the Top Fastest Databases for Analytics

378 points | by moonikakiss | 3 months ago

23 comments

bhouston · 3 months ago

Just to be clear, standard SQL databases are not great for large-scale analytics. I know from first-hand experience and a lot of pain.

We tried using Postgres for large-scale analytics at my previous company, https://threekit.com, and it was an absolute pain. We started collecting detailed analytics and ended up with a rapidly growing table of around 2B records of user events during their sessions. Once it grew past 500 million records it became impossible to query the table in anything close to real time; it was basically untouchable because it was so slow.

I know I could have used some type of daily aggregation combined with a weekly aggregation, etc., to roll up the data incrementally. A dev tried this, and yes, it hid the slow queries, but it made reporting inflexible. And writing and maintaining those cron jobs is a lot of work.

But then I tried BigQuery on my side project, https://web3dsurvey.com: I just recorded raw data and wrote queries that do real-time aggregation (with a caching layer) in a large variety of ways. It was near instant, easy, and very cheap.

So I deployed the same strategy at Threekit.com, and it was likewise easy, fast, and cheap, with even more flexible queries than Web3DSurvey at basically no developer or infrastructure cost. No more aggregating cron jobs, no more trying to decide ahead of time how users want to slice and dice the data. Real-time, flexible queries on the fly via BigQuery is the way.

Also, the BigQuery bill for https://web3dsurvey.com is about $0.25/month, and it handles millions of records in its three-month window of stored data, whereas just running the cheapest Postgres server on Google Cloud is >$25/month, and that is a slow one.

I would never go back to traditional SQL for analytics: it was hard, slow, expensive, and inflexible. The worst of all worlds.
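The daily roll-up approach described above can be sketched with a toy example (using SQLite as a stand-in for Postgres; the table and column names are hypothetical):

```python
import sqlite3

# In-memory stand-in for a large Postgres events table (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("2025-01-01", 1, "click"), ("2025-01-01", 2, "click"),
     ("2025-01-02", 1, "view"), ("2025-01-02", 1, "click")],
)

# The cron-job-style roll-up: pre-aggregate raw events into a daily summary
# so reports never have to scan the big raw table.
conn.execute("""
    CREATE TABLE daily_counts AS
    SELECT day, action, COUNT(*) AS n
    FROM events
    GROUP BY day, action
""")

rows = conn.execute(
    "SELECT day, action, n FROM daily_counts ORDER BY day, action"
).fetchall()
print(rows)
```

The rigidity shows up immediately: slicing by any dimension the roll-up did not anticipate (say, per-user counts) requires another pre-aggregated table and another cron job, which is exactly the inflexibility the comment describes.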
saisrirampur · 3 months ago

Sai from PeerDB/ClickHouse here. Nice to see the progress on this project! I wanted to leave a few notes:

For analytics on transactional data, it looks like you'd still need to use logical replication (https://github.com/Mooncake-Labs/pg_mooncake/issues/90). Logical replication is somewhat similar to an ETL/CDC experience, though it's more Postgres-native. Managing logical replication at production grade isn't trivial — it's quite common for customers to use PeerDB for homogeneous replication due to performance, manageability, and observability issues with logical replication.

One potential value of extensions is Postgres compatibility for your queries. However, I'm curious about the SQL coverage on columnar tables — for example, advanced constructs like partitioned tables, prepared statements, correlated subqueries, recursive CTEs, triggers, and more. While it seems there's ongoing work to improve compatibility, achieving full Postgres compatibility is challenging. In some cases, you might need to fall back to Postgres (instead of DuckDB), sacrificing performance for certain queries.

The biggest challenge we faced at Citus was the constant trade-off between Postgres compatibility and performance — both are complex and constantly moving targets. This was the key reason Citus couldn't compete with purpose-built databases like Snowflake, ClickHouse, and SingleStore. While DuckDB didn't exist when we built Citus, it's still fundamentally different from Postgres. Even though the DuckDB dialect is somewhat similar to Postgres (as is Snowflake's), retrofitting two distinct databases — each evolving independently toward the world's best performance — isn't trivial.

In short, relying on ETL (logical replication) without providing full Postgres compatibility raises the question: is it better to lean on a purpose-built database with a laser focus on performance, or to adopt a more generalized approach?

Anyway, I come from a different school of thought — use the right tool for the right job. That said, I love seeing all the progress and evolution in the Postgres community — Postgres will always be my first love!
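For readers unfamiliar with it, the logical-replication setup the comment refers to looks roughly like this in stock Postgres (the publication, subscription, table, and connection details below are placeholders):

```sql
-- On the source (transactional) database: requires wal_level = logical
-- in postgresql.conf, then a publication covering the tables to replicate.
CREATE PUBLICATION analytics_pub FOR TABLE user_events;

-- On the subscriber: a subscription that streams changes from the source
-- through a replication slot.
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=source-db dbname=app user=replicator'
    PUBLICATION analytics_pub;
```

The two statements are the easy part; the production-grade concerns the comment raises are slot retention, initial table sync, schema changes, and monitoring lag.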
dleeftink · 3 months ago

> To enhance query execution speed, we embedded DuckDB as the execution engine for columnstore queries

So is it Postgres or DuckDB that cracked the analytics top?
nikita · 3 months ago

This is an exciting project. A few highlights:

- The query processor is DuckDB: as long as it translates the PG type system to the DuckDB type system well, it will be very fast.
- Data is stored on S3 in Parquet with Delta or Iceberg metadata. This is really cool: you don't need to push analytical data through the WAL, only metadata goes into the WAL. That means fast loading, at least in theory, and compatibility with the whole Delta/Iceberg ecosystem.
- Once they build real-time ingest, you can just push time series into this system and you don't need a second system like ClickHouse.
dsiegel2275 · 3 months ago

A question I've had for a while and can't seem to find an answer to: for teams using various columnar store extensions to turn Postgres into a viable OLAP solution, are they doing so in the *same instance* of Postgres that they use for OLTP, or are they standing up a separate Postgres instance?

I'm trying to understand whether there's any potential performance impact on the OLTP workload from including OLAP in the same process.
owenthejumper · 3 months ago

What's the business model? It's an MIT-licensed extension, yet it has a company and a VC behind it. Sounds like a rug pull waiting to happen.
spapas82 · 3 months ago

As others have mentioned, the results are from an extension, not Postgres.

Unfortunately, using an extension that's not "part" of Postgres (the way pg_stat_statements is) is not trivial for most people, since there are both technical and legal issues.
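For contrast, enabling a bundled contrib extension like pg_stat_statements is just a config line and one statement (column names as of Postgres 13+):

```sql
-- postgresql.conf: load the module at server start
-- shared_preload_libraries = 'pg_stat_statements'

-- Then, in the database:
CREATE EXTENSION pg_stat_statements;

-- Example: the five most expensive queries by total execution time.
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```

A third-party extension additionally needs to be built or packaged for your platform and vetted, which is the friction the comment points at, and managed cloud providers only allow a fixed allowlist of extensions.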
xkgt · 3 months ago

I read the title a couple of times and I'm still not sure it isn't misleading. The benchmarks are not just for Postgres but for Postgres with the Mooncake extension. There are also other results for Postgres with different extensions. While it does rank among the fastest databases, it is not the fastest, and not even in the top 10.
jot · 3 months ago

How is this different from Crunchy Warehouse, which is also built on Postgres and DuckDB? https://www.crunchydata.com/products/warehouse
bigtones · 3 months ago

Looks like they're currently No. 12 on the ClickBench benchmark they cite, so not quite in the top 10 three weeks later.
dcreater · 3 months ago
All you need is postgres: part 73
antonmks · 3 months ago

It's not really Postgres; the queries are run on DuckDB. Yeah, DuckDB is really fast for analytical queries.
polskibus · 3 months ago

How does Mooncake work with standard OLTP workloads? Can I use Postgres for OLTP, add Mooncake, and expect DuckDB-level performance for OLAP queries? I know SAP HANA has several layers of storage, with automatic movement of data between them, to enable such a performant OLTP/OLAP hybrid, and I think this is the holy grail for a cheap/open-source DB. Users need OLTP first, but don't want to add Kafka + ClickHouse or similar pipelines when OLAP is needed.
mirekrusin · 3 months ago

Why isn't TimescaleDB mentioned anywhere? Genuine question.
pier25 · 3 months ago

They're already working on 0.2: https://www.mooncake.dev/blog/pgmooncake-v02discussion
rubenvanwyk · 3 months ago

Looking at the effort being put in, it seems there's a massive market for proper Postgres analytics / columnstore, which says a lot about the pain involved in moving data between systems.
osigurdson · 3 months ago

It would be great if Postgres had native columnar storage.
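Why columnar storage matters for analytics can be sketched in a few lines: storing each column contiguously lets an aggregate touch only the bytes it needs (a toy illustration of the idea, not how Postgres actually lays out pages):

```python
# Row-oriented layout: each record is stored together, so summing one
# field still walks every record.
rows = [
    {"user_id": 1, "amount": 10, "country": "US"},
    {"user_id": 2, "amount": 25, "country": "DE"},
    {"user_id": 3, "amount": 7,  "country": "US"},
]
row_total = sum(r["amount"] for r in rows)

# Column-oriented layout: the same data as one contiguous array per
# column; an aggregate scans only the array it needs, which also
# compresses and vectorizes far better at scale.
columns = {
    "user_id": [1, 2, 3],
    "amount":  [10, 25, 7],
    "country": ["US", "DE", "US"],
}
col_total = sum(columns["amount"])

print(row_total, col_total)  # both 42
```

At millions of rows the columnar scan reads a small fraction of the data and is amenable to SIMD execution, which is the gap engines like DuckDB and ClickHouse exploit.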
scirob · 3 months ago

Any differentiation vs. Hydra? They also put DuckDB inside PG: https://www.hydra.so/
tarun_anand · 3 months ago

Will this work alongside Citus? If so, what benefits/disadvantages would it have? I presume at some point workloads do need a cluster.
xiphias2 · 3 months ago

It's cool proof that it's worth the effort for the Postgres authors to add vectorization to the default columnstore.
timsuchanek · 3 months ago

Congrats on the launch! Does it support ordinary DB migrations through the normal Postgres interface?
rafael3 · 3 months ago

Noob question: why is there no Hadoop cluster on ClickBench?
stoicfungi · 3 months ago

What are the differences between pg_mooncake and pg_duck?