I run a 100 billion+ row Postgres database [0] that is around 16 TB, and it's pretty painless!

There are a few tricks that make it run well (PostgreSQL compiled with a non-standard block size, ZFS, careful VACUUM planning), but nothing too out of the ordinary.

At the moment I insert about 150,000 rows a second, run 40,000 transactions a second, and read 4 million rows a second.

Isn't "Postgres does not scale" a strawman?

[0] https://www.merklemap.com/
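To give a concrete idea of what "careful VACUUM planning" can mean in practice, here is a minimal sketch of per-table autovacuum tuning; the table name and thresholds are made up for illustration, not the actual settings behind [0].

    -- Illustrative only: on very large, insert-heavy tables, switch autovacuum
    -- from percentage-based triggers to absolute thresholds so it doesn't wait
    -- for a fixed fraction of 100B+ rows to change before running.
    ALTER TABLE certificate_entries SET (
        autovacuum_vacuum_scale_factor        = 0,         -- ignore the percentage trigger
        autovacuum_vacuum_threshold           = 1000000,   -- vacuum after ~1M dead tuples
        autovacuum_vacuum_insert_scale_factor = 0,
        autovacuum_vacuum_insert_threshold    = 10000000,  -- also vacuum after ~10M inserts (PG 13+)
        autovacuum_vacuum_cost_delay          = 0          -- let vacuum run at full speed
    );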
Skype open-sourced their architecture way back, using PL/Proxy to route calls to the right shard. It works, is quite elegant, and handled 50% of all international phone calls in the noughties. My old company used it to provide real-time analytics on about 300M mobile devices.

https://wiki.postgresql.org/images/2/28/Moskva_DB_Tools.v3.pdf

https://s3.amazonaws.com/apsalar_docs/presentations/Apsalar_PyPGDay_2013+with+notes.pdf
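For a flavour of what PL/Proxy routing looks like, here is a minimal sketch; the cluster, function, and argument names are hypothetical, not Skype's actual schema.

    -- On the proxy database: forward the call to one partition of the
    -- 'userdb' cluster, chosen by hashing the username. With no SELECT in
    -- the body, PL/Proxy calls a function of the same name and signature
    -- on the target partition, which runs the real query locally.
    CREATE FUNCTION get_user_email(i_username text)
    RETURNS SETOF text AS $$
        CLUSTER 'userdb';
        RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;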
Probably as useful is the overview of what PgDog is and the docs. From their docs [1]: "PgDog is a sharder, connection pooler and load balancer for PostgreSQL. Written in Rust, PgDog is fast, reliable and scales databases horizontally without requiring changes to application code."

[1] https://docs.pgdog.dev/
Another option is going full-scale with CockroachDB. We had a Django application backed by PostgreSQL, which we migrated to CockroachDB using their official backend.

The data migration was a pain, but it was still less painful than manually sharding the data or dealing with 3rd-party extensions. Since then, we’ve had a few hiccups with autogenerated migration scripts, but overall the experience has been quite seamless. We weren’t using any advanced PostgreSQL features, so CockroachDB has worked well.
Well, of course it does! :)

Another (battle-tested*) solution is to deploy the open-source Postgres distribution created by Citus (a Microsoft subsidiary) on nodes running Ubuntu, Debian, or Red Hat, and you are pretty much done: https://www.citusdata.com/product/community

Slap good old trusty PgBouncer in front of it if you want/need connection pooling (and you probably do): https://www.citusdata.com/blog/2017/05/10/scaling-connections-in-postgres/

*) Citus was purchased by Microsoft more or less solely to provide easy scale-out on Azure through Cosmos DB for PostgreSQL
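For a flavour of the Citus side, a minimal sketch of distributing a table across worker nodes; the table and distribution column are just examples.

    -- On the coordinator, after CREATE EXTENSION citus; and registering workers:
    CREATE TABLE events (
        device_id  bigint      NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        payload    jsonb
    );

    -- Hash-distribute rows across the workers by device_id; queries that
    -- filter on device_id get routed to a single shard, the rest fan out.
    SELECT create_distributed_table('events', 'device_id');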
I’m working with several Postgres databases that share identical schemas, and I want to make their data accessible from a single interface.

Currently, I’m using Postgres FDWs to import the tables from those databases. I then create views that UNION ALL the relevant tables, adding a column to indicate the source database for each row.

This works, but I’m wondering if there’s a better way, ideally something that can query multiple databases in parallel and merge the results with a source database column included.

Would tools like pgdog, pgcat, pganimal be a good fit for this? I’m open to suggestions for more efficient approaches.

Thanks!
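To make that concrete, a compressed sketch of the current setup; server, schema, and table names are placeholders.

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- One foreign server + user mapping per source database.
    CREATE SERVER db_a FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'db-a.internal', dbname 'app');
    CREATE SERVER db_b FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'db-b.internal', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER db_a OPTIONS (user 'readonly', password 'secret');
    CREATE USER MAPPING FOR CURRENT_USER SERVER db_b OPTIONS (user 'readonly', password 'secret');

    -- Import the identically-shaped tables into per-source local schemas.
    CREATE SCHEMA fdw_a;
    CREATE SCHEMA fdw_b;
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders) FROM SERVER db_a INTO fdw_a;
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders) FROM SERVER db_b INTO fdw_b;

    -- One view over both databases, tagging each row with its origin.
    CREATE VIEW all_orders AS
        SELECT 'db_a' AS source_db, * FROM fdw_a.orders
        UNION ALL
        SELECT 'db_b' AS source_db, * FROM fdw_b.orders;

On the parallelism point, one thing worth checking (not a full answer): since PostgreSQL 14, postgres_fdw has an async_capable option on the server or foreign table, which lets an Append over several foreign scans run the remote queries asynchronously rather than one after another.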
I'm kind of interested in why we can't make a better database with all of our modern technology.

Postgres is a fantastic workhorse, but it was also released in the late 80s. Who, who among you will create the database of the future... And not lock it behind bizarro licenses which force me to use telemetry.
Something I don't see in the pgdog documentation is how cross-shard joins work. Okay, if I do a simple `select * from users order by id`, you'll order the combined results in memory for me. But what if I have GROUP BY and aggregations and such? Will it resolve that correctly?
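To be concrete, a hypothetical example of the kind of query I mean; getting this right across shards means merging per-shard partial aggregates (summing the counts, computing the average from per-shard sums and counts rather than averaging averages), then re-sorting and applying the LIMIT.

    -- Hypothetical schema: does the proxy merge these aggregates correctly
    -- when rows for the same user_id can live on different shards?
    SELECT user_id, count(*) AS orders, avg(total) AS avg_total
    FROM orders
    GROUP BY user_id
    ORDER BY orders DESC
    LIMIT 10;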
People frequently talk about scale as a single dimension (usually volume as it relates to users), but that can be oversimplifying for many kinds of applications. For instance, once you are thinking about non-trivial partitioning schemes (say, when there is high coupling between entities of the same kind, as you see in graphs), that is when you should consider alternatives like the Bigtable-inspired DBs, since those are (relatively) more batteries-included for you.

> It’s funny to write this. The Internet contains at least 1 (or maybe 2) meaty blog posts about how this is done

It would’ve been great to link those here. I’m guessing one refers to StackOverflow, which has/had one of the more famous examples of scaled Postgres.
I run a PostgreSQL DB with a few billion rows at about 2TB right now. We don't need sharding yet, but when we do, I'm considering Citus. Does anyone with experience implementing Citus care to comment?
Interesting technology. Similar to Citus, but not built as an extension. The Citus coordinator, which is a Postgres database with the Citus extension, is replaced by a proxy layer written in Rust. That might provide more flexibility and velocity in implementing distributed planning and execution than being tied to the extension ecosystem. It would indeed be a journey to catch up with Postgres on compatibility, but it's a good start.
Tangentially related: is there a good guide or set of setup scripts for running self-hosted Postgres with backups and a secondary standby? I just want something I can deploy to a VPS/dedicated box for all my side projects.

If not, is Supabase the most painless way to get started?
Apart from being backed by Postgres instead of MySQL, is this different from Vitess (and its commercial vendor PlanetScale)?

https://vitess.io/
Funny how everyone eventually hits this point and thinks they're inventing fire - but then again, pushing your trusty old tools way past comfort is where cool engineering actually happens.
I was reading through this and was going "huh, this sounds familiar" until I read who wrote it :)

Neat piece of tech! Excited to try it out.