Show HN: PgDog – Shard Postgres without extensions

307 포인트작성자: levkk8일 전

Hey HN! Lev here, author of PgDog (<a href="https://github.com/pgdogdev/pgdog">https://github.com/pgdogdev/pgdog</a>). I’m scaling our favorite database, PostgreSQL. PgDog is a new open source proxy, written in Rust, with first-class support for sharding — without changes to your app or needing database extensions.Here’s a walkthrough of how it works: <a href="https://www.youtube.com/watch?v=y6sebczWZ-c" rel="nofollow">https://www.youtube.com/watch?v=y6sebczWZ-c</a>Running Postgres at scale is hard. Eventually, one primary isn’t enough at which point you need to split it up. Since there is currently no good tooling out there to do this, teams end up breaking their apps apart instead.If you’re familiar with PgCat, my previous project, PgDog is its spiritual successor but with a fresh codebase and new goals. If not, PgCat is a pooler for Postgres also written in Rust.So, what’s changed and why a new project? Cross-shard queries are supported out of the box. The new architecture is more flexible, completely asynchronous and supports manipulating the Postgres protocol at any stage of query execution. (Oh, and you guessed it — I adopted a dog. Still a cat person though!)Not everything is working yet, but simple aggregates like max(), min(), count(*) and sum() are in. More complex functions like percentiles and average will require a bit more work. Sorting (i.e. ORDER BY) works, as long as the values are part of the result set, e.g.:<pre><code> SELECT id, email FROM users WHERE admin = true ORDER BY 1 DESC; </code></pre> PgDog buffers and sorts the rows in memory, before sending them to the client. Most of the time, the working set is small, so this is fine. For larger results, we need to build swap to disk, just like Postgres does, but for OLTP workloads, which PgDog is targeting, we want to keep things fast. Sorting currently works for bigint, integer, and text/varchar. It’s pretty straightforward to add all the other data types, I just need to find the time and make sure to handle binary encoding correctly.All standard Postgres features work as normal for unsharded and direct-to-shard queries. As long as you include the sharding key (a column like customer_id, for example) in your query, you won’t notice a difference.How does this compare to Citus? In case you’re not familiar, Citus is an open source extension for sharding Postgres. It runs inside a single Postgres node (a coordinator) and distributes queries between worker databases.PgDog’s architecture is fundamentally different. It runs outside the DB: it’s a proxy, so you can deploy it anywhere, including managed Postgres like RDS, Cloud SQL and others where Citus isn’t available. It’s multi-threaded and asynchronous, so it can handle thousands, if not millions, of concurrent connections. Its focus is OLTP, not OLAP. Meanwhile, Citus is more mature and has good support for cross-shard queries and aggregates. It will take PgDog a while to catch up.My Rust has improved since my last attempt at this and I learned how to use the bytes crate correctly. PgDog does almost zero memory allocations per request. That results in a 3-5% performance increase over PgCat and a much more consistent p95. If you’re obsessed with performance like me, you know that small percentage is nothing to sneeze at. Like before, multi-threaded Tokio-powered PgDog leaves the single-threaded PgBouncer in the dust (<a href="https://pgdog.dev/blog/pgbouncer-vs-pgdog">https://pgdog.dev/blog/pgbouncer-vs-pgdog</a>).Since we’re using pg_query (which itself bundles the Postgres parser), PgDog can understand all Postgres queries. This is important because we can not only correctly extract the WHERE clause and INSERT parameters for automatic routing, but also rewrite queries. This will be pretty useful when we’ll add support for more complex aggregates, like avg(), and cross-shard joins!Read/write traffic split is supported out of the box, so you can put PgDog in front of the whole cluster and ditch the code annotations. It’s also a load balancer, so you can deploy it in front of multiple replicas to get 4 9’s of uptime.One of the coolest features so far, in my opinion, is distributed COPY. This works by hacking the Postgres network protocol and sending individual rows to different shards (<a href="https://pgdog.dev/blog/hacking-postgres-wire-protocol">https://pgdog.dev/blog/hacking-postgres-wire-protocol</a>). You can just use it without thinking about cluster topology, e.g.:<pre><code> COPY temperature_records (sensor_uuid, created_at, value) FROM STDIN CSV; </code></pre> The sharding function is straight out of Postgres partitions and supports uuid v4 and bigint. Technically, it works with any data type, but I just haven’t added all the wrappers yet. Let me know if you need one.What else? Since we have the Postgres parser handy, we can inspect, block and rewrite queries. One feature I was playing with is ensuring that the app is passing in the customer_id in all queries, to avoid data leaks between tenants. Brain dump of that in my blog here: <a href="https://pgdog.dev/blog/multi-tenant-pg-can-be-easy">https://pgdog.dev/blog/multi-tenant-pg-can-be-easy</a>.What’s on the roadmap: (re)sharding Postgres using logical replication, so we can scale DBs without taking downtime. There is a neat trick on how to quickly do this on copy-on-write filesystems (like EBS used by RDS, Google Cloud volumes, ZFS, etc.). I’ll publish a blog post on this soon. More at-scale features like blocking bad queries and just general “I wish my Postgres proxy could do this” stuff. Speaking of which, if you can think of any more features you’d want, get in touch. Your wishlist can become my roadmap.PgDog is being built in the open. If you have thoughts or suggestions about this topic, I would love to hear them. Happy to listen to your battle stories with Postgres as well.Happy hacking!Lev

22 comments

jashmatthews7일 전

Hey Lev!I've been looking into PgDog for sharding a 40TB Postgres database atm vs building something ourselves. This could be a good opportunity to collaborate because what we need is something more like Vitess for PostgreSQL. The scatter gather stuff is great but what we really need is config management via something like etcd, shard splitting, best-effort transactions for doing schema changes across all shards etc.Almost totally unrelated but have you had good success using pg_query.rs to re-write queries? Maybe I misunderstood how pg_query.rs works but re-writing an AST seems like a nightmare with how the AST types don't really support mutability or deep cloning. I ended up using the sqlparser crate which supports mutability via Visitors. I have a side project I'm chipping away at to build online schema change for PG using shadow tables and logical replication ala gh-ost.Jake

评论 #44103772 未加载

评论 #44107938 未加载

denysvitali7일 전

I know this is just a small feature and probably a less meaningful one compared to the rest of the project - but for me being able to use pgdog as a way to redirect reads to read replicas and writes to the primary (w/o doing that in code) is a huge plus. Many applications out there do not support R/W splits, and having something that does that for you (at the proxy level) has always brought speed improvements for me in the past.Such a cool project, good job Lev!

评论 #44107211 未加载

评论 #44107570 未加载

williamdclt8일 전

Really impressive stuff! Very interesting, well done!I don’t know that I’d want my sharding to be so transparently handled / abstracted away. First, because usually sharding is on the tenancy boundary and I’d want friction on breaking this boundary. Second, because the implications of joining across shards are not the same as in-shard (performance, memory, cpu) and I’d want to make that explicit tooThat takes nothing out of this project, it’s really impressive stuff and there’s tons of use cases for it!

评论 #44102213 未加载

mijoharas8일 전

We've been keeping an eye on PgDog for a while, and it seems like very impressive stuff.Congrats on the launch Lev, and keep it up!

评论 #44100110 未加载

xnickb8일 전

Very interesting.For me the key point in such projects is always handling of distributed queries. It's exciting that pgDog tries to stay transparent/compatible while operating on the network layer.Of course the limitations that are mentioned in the docs are expected and will require trade-offs. I'm very curious to see how you will handle this. If there is any ongoing discussion on the topic, I'd be happy to follow and maybe even share ideas.Good luck!

评论 #44100858 未加载

Existenceblinks7일 전

Looks neat, the first thing I search for in the docs is:<pre><code> Unique indexes Not currently supported. Requires query rewriting and separate execution engine to validate uniqueness across all shards. </code></pre> But still looks promising.

评论 #44103121 未加载

PeterZaitsev7일 전

This looks awesome!What can be challenge with such solutions is getting the last 1% right when it comes to sharding tricky queries properly (or at least detecting queries which are not handled properly) and also isolation and consistency

aeyes8일 전

One of the most interesting Postgres projects I have seen in many years.The benchmarks presented only seem to address standard pooling, I'd like to see what it looks like once query parsing and cross-shard join come into play.

评论 #44102130 未加载

anurag8일 전

Much-needed innovation in scaling Postgres. Congratulations on the launch!

rco87868일 전

This looks pretty amazing. Congrats on the launch.

评论 #44099613 未加载

jimmyl028일 전

this is awesome but I'm wondering does pgdog plan to handle high availability scenarios (multiple frontend proxies)? I know this can lead to much more difficult problems with consensus and handling split brain scenarios.if not, what is the approach to enable restarts without downtime? (let's say one node crashes)?

评论 #44102046 未加载

sroussey8일 전

Nice! Reminds me of MySQLProxy back in 2007-2014 and later ProxySQL.What’s the long term (business) plan to keep it updated?

评论 #44099606 未加载

ewalk1538일 전

Is your plan to stick with a hashing algorithm for tenant sharding, or allow for more fine grain control to shift large tenants between and shards?Hot shard management is a job in of itself and adds lot of operational complexity.

评论 #44102328 未加载

JyB8일 전

This looks amazing.

theusus8일 전

I wish there was something similar for SQL Server.

评论 #44100642 未加载

评论 #44101890 未加载

achanda3588일 전

This is very cool, congrats on the launch. Do you think making this CLI/API compatible with Vitess or Citus is worth it?

评论 #44102775 未加载

deadbabe8일 전

Can someone give tangible real world metrics on when you should consider sharding a Postgres database? Thanks in advance.

评论 #44105759 未加载

dbworku6일 전

Awesome work Lev! You had me a distributed copy being able to ingest Gigs per second.

评论 #44112225 未加载

simplecto7일 전

very cool to see people go deep in the weeds to make it easier for lazy devs like me.

hn_throwaway_997일 전

I think this is a very cool project, but when I saw this statement, "Running Postgres at scale is hard. Eventually, one primary isn’t enough at which point you need to split it up.", I was reminded about this recent article about OpenAI, now one of the most visited apps/sites in the world, running on a single pg primary:<a href="https://news.ycombinator.com/item?id=44071418">https://news.ycombinator.com/item?id=44071418</a>Quite from the article:> At OpenAI, we utilize an unsharded architecture with one writer and multiple readers, demonstrating that PostgreSQL can scale gracefully under massive read loads.Of course, if you have a lot of write volume this would be an unsuitable architecture, but just a reminder that pg can scale a lot more than many people think with just a single writer.

评论 #44108862 未加载

评论 #44108321 未加载

iamdanieljohns8일 전

How does this compare to Supabase/Supavisor?

评论 #44100825 未加载

iFire8일 전

is there a network agpl exception?

评论 #44100832 未加载