I run a 100 billion+ row Postgres database [0] that is around 16 TB, and it's pretty painless!

There are a few tricks that make it run well (PostgreSQL compiled with a non-standard block size, ZFS, careful VACUUM planning), but nothing too out of the ordinary.

At the moment I insert about 150,000 rows a second, run 40,000 transactions a second, and read 4 million rows a second.

Isn't "Postgres does not scale" a strawman?

[0] https://www.merklemap.com/
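To give a concrete idea of what "careful VACUUM planning" can mean in practice, here is a minimal sketch of per-table autovacuum tuning; the table name and thresholds are made up for illustration, not the actual settings behind [0].

    -- Illustrative only: on very large, insert-heavy tables, switch autovacuum
    -- from percentage-based triggers to absolute thresholds so it doesn't wait
    -- for a fixed fraction of 100B+ rows to change before running.
    ALTER TABLE certificate_entries SET (
        autovacuum_vacuum_scale_factor        = 0,         -- ignore the percentage trigger
        autovacuum_vacuum_threshold           = 1000000,   -- vacuum after ~1M dead tuples
        autovacuum_vacuum_insert_scale_factor = 0,
        autovacuum_vacuum_insert_threshold    = 10000000,  -- also vacuum after ~10M inserts (PG 13+)
        autovacuum_vacuum_cost_delay          = 0          -- let vacuum run at full speed
    );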
Skype open-sourced their architecture way back, using PL/Proxy to route calls to the right shard. It works, is quite elegant, and handled 50% of all international phone calls in the noughties. My old company used it to provide real-time analytics on about 300M mobile devices.

https://wiki.postgresql.org/images/2/28/Moskva_DB_Tools.v3.pdf

https://s3.amazonaws.com/apsalar_docs/presentations/Apsalar_PyPGDay_2013+with+notes.pdf
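For a flavour of what PL/Proxy routing looks like, here is a minimal sketch; the cluster, function, and argument names are hypothetical, not Skype's actual schema.

    -- On the proxy database: forward the call to one partition of the
    -- 'userdb' cluster, chosen by hashing the username. With no SELECT in
    -- the body, PL/Proxy calls a function of the same name and signature
    -- on the target partition, which runs the real query locally.
    CREATE FUNCTION get_user_email(i_username text)
    RETURNS SETOF text AS $$
        CLUSTER 'userdb';
        RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;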
Probably as useful is the overview of what PgDog is and the docs. From their docs [1]: "PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code."

[1] https://docs.pgdog.dev/
Another option is going full-scale with CockroachDB. We had a Django application backed by PostgreSQL, which we migrated to CockroachDB using their official backend.

The data migration was a pain, but it was still less painful than manually sharding the data or dealing with 3rd-party extensions. Since then, we’ve had a few hiccups with autogenerated migration scripts, but overall the experience has been quite seamless. We weren’t using any advanced PostgreSQL features, so CockroachDB has worked well.
Well, of course it does! :)

Another (battle-tested*) solution is to deploy the open-source Postgres distribution created by Citus (a Microsoft subsidiary) on nodes running Ubuntu, Debian, or Red Hat, and you are pretty much done: https://www.citusdata.com/product/community

Slap good old trusty PgBouncer in front of it if you want/need connection pooling (and you probably do): https://www.citusdata.com/blog/2017/05/10/scaling-connections-in-postgres/

*) Citus was purchased by Microsoft more or less solely to provide easy scale-out on Azure through Cosmos DB for PostgreSQL
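For a flavour of the Citus side, a minimal sketch of distributing a table across worker nodes; the table and distribution column are just examples.

    -- On the coordinator, after CREATE EXTENSION citus; and registering workers:
    CREATE TABLE events (
        device_id  bigint      NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        payload    jsonb
    );

    -- Hash-distribute rows across the workers by device_id; queries that
    -- filter on device_id get routed to a single shard, the rest fan out.
    SELECT create_distributed_table('events', 'device_id');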
I’m working with several Postgres databases that share identical schemas, and I want to make their data accessible from a single interface.

Currently, I’m using Postgres FDWs to import the tables from those databases. I then create views that UNION ALL the relevant tables, adding a column to indicate the source database for each row.

This works, but I’m wondering if there’s a better way, ideally something that can query multiple databases in parallel and merge the results with a source database column included.

Would tools like pgdog, pgcat, pganimal be a good fit for this? I’m open to suggestions for more efficient approaches.

Thanks!
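To make that concrete, a compressed sketch of the current setup; server, schema, and table names are placeholders.

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- One foreign server + user mapping per source database.
    CREATE SERVER db_a FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'db-a.internal', dbname 'app');
    CREATE SERVER db_b FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'db-b.internal', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER db_a OPTIONS (user 'readonly', password 'secret');
    CREATE USER MAPPING FOR CURRENT_USER SERVER db_b OPTIONS (user 'readonly', password 'secret');

    -- Import the identically-shaped tables into per-source local schemas.
    CREATE SCHEMA fdw_a;
    CREATE SCHEMA fdw_b;
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders) FROM SERVER db_a INTO fdw_a;
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders) FROM SERVER db_b INTO fdw_b;

    -- One view over both databases, tagging each row with its origin.
    CREATE VIEW all_orders AS
        SELECT 'db_a' AS source_db, * FROM fdw_a.orders
        UNION ALL
        SELECT 'db_b' AS source_db, * FROM fdw_b.orders;

On the parallelism point, one thing worth checking (not a full answer): since PostgreSQL 14, postgres_fdw has an async_capable option on the server or foreign table, which lets an Append over several foreign scans run the remote queries asynchronously rather than one after another.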
I'm kind of interested in why we can't make a better database with all of our modern technology.

Postgres is a fantastic workhorse, but it was also released in the late 80s. Who, who among you will create the database of the future... And not lock it behind bizarro licenses which force me to use telemetry.
Something I don't see in the pgdog documentation is how cross-shard joins work. Okay, if I do a simple `select * from users order by id`, you'll order the combined results in memory for me. But what if I have GROUP BY and aggregations and such? Will it resolve that correctly?
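To be concrete, a hypothetical example of the kind of query I mean; getting this right across shards means merging per-shard partial aggregates (summing the counts, computing the average from per-shard sums and counts rather than averaging averages), then re-sorting and applying the LIMIT.

    -- Hypothetical schema: does the proxy merge these aggregates correctly
    -- when rows for the same user_id can live on different shards?
    SELECT user_id, count(*) AS orders, avg(total) AS avg_total
    FROM orders
    GROUP BY user_id
    ORDER BY orders DESC
    LIMIT 10;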
People frequently talk about scale as a single dimension (usually volume as it relates to users), but that can be oversimplifying for many kinds of applications. For instance, once you are thinking about non-trivial partitioning schemes (say, when there is high coupling between entities of the same kind, as you see in graphs), that is when you should consider alternatives like the Bigtable-inspired DBs, since those are (relatively) more batteries-included for you.

> It’s funny to write this. The Internet contains at least 1 (or maybe 2) meaty blog posts about how this is done

It would’ve been great to link those here. I’m guessing one refers to StackOverflow, which has/had one of the more famous examples of scaled Postgres.
I run a PostgreSQL DB with a few billion rows at about 2TB right now. We don't need sharding yet, but when we do, I'm considering Citus. Does anyone with experience implementing Citus care to comment?
Interesting technology. Similar to Citus, but not built as an extension. The Citus coordinator, which is a Postgres database with the Citus extension, is replaced by a proxy layer written in Rust. That might provide more flexibility and velocity in implementing distributed planning and execution than being tied to the extension ecosystem. It would indeed be a journey to catch up with Postgres on compatibility, but it's a good start.
Tangentially related: is there a good guide or set of setup scripts for running self-hosted Postgres with backups and a secondary standby? I just want something I can deploy to a VPS/dedicated box for all my side projects.

If not, is Supabase the most painless way to get started?
Apart from being backed by Postgres instead of MySQL, is this different from Vitess (and its commercial vendor PlanetScale)?

https://vitess.io/
Funny how everyone eventually hits this point and thinks they're inventing fire - but then again, pushing your trusty old tools way past comfort is where cool engineering actually happens.
I was reading through this and was going "huh, this sounds familiar" until I read who wrote it :)

Neat piece of tech! Excited to try it out.