I have more experience running large MySQL databases. Both at SurveyMonkey and Zapier our primary databases were MySQL and were *massive*, as you'd expect from consumer products at that scale. I won't list the numbers here since this is about Postgres, but I wanted to provide the context.

The largest Postgres database I've personally administered is on RDS:

    * Postgres 13.4
    * db.m6g.2xlarge (8 vCPU, 32 GB RAM)
    * 1.3 TB storage
    * Largest tables: 317 GB, 144 GB, and 142 GB
I effectively treat this database as an analytics database and write massive queries with joins against those largest tables, and Postgres handles it just fine.

I have had the occasional bad query plan when using CTEs, which gets cached and takes the entire DB down for a bit, but I've usually been able to fix it with `vacuum analyze` and by rewriting the query to convince Postgres to choose a better plan (a sketch of both follows).

Overall I think Postgres can do a lot of work with very little.
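For readers who haven't hit this, here is a minimal sketch of both fixes, with made-up table names (`events`, `orders`): refreshing planner statistics with `vacuum analyze`, and using the MATERIALIZED / NOT MATERIALIZED keywords (available since Postgres 12, so applicable to 13.4) to steer how a CTE is planned.

    -- Refresh planner statistics (and clean up dead tuples) on the big tables,
    -- so the optimizer stops choosing plans based on stale row estimates.
    VACUUM ANALYZE events;
    VACUUM ANALYZE orders;

    -- Since Postgres 12 CTEs are inlined by default; the keywords below force
    -- either behavior. NOT MATERIALIZED lets predicates be pushed down into the
    -- CTE, while MATERIALIZED can help when the CTE is referenced many times.
    WITH recent AS NOT MATERIALIZED (
        SELECT * FROM events WHERE created_at > now() - interval '30 days'
    )
    SELECT o.id, count(*)
    FROM orders o
    JOIN recent r ON r.order_id = o.id
    GROUP BY o.id;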
A few years ago I ran and maintained a block explorer that indexed Bitcoin mainnet. It ended up as a ~2 TB database, which was too expensive to host on any managed database, so I ran it on the same server that exposed the API.

I remember facing a few challenges the first time I got this running, for example:

- I usually don't worry about optimizing the db schema because Postgres can handle most projects without much effort. That was no longer the case: there were indexes I had to drop because they made inserts considerably slower and took a few GB to store.

- Column types started to matter. I had some columns stored as hex strings and ended up switching them to BYTEA to save space.

- Reads were slow until I updated the Postgres settings. The defaults are very conservative, and while they work for many projects, they won't when you have TBs of data (a rough example of this kind of tuning follows after this list).

- While this isn't database specific, offset pagination no longer worked, and I switched all of those queries to scroll-based pagination (see the sketch below).

- Applying database migrations isn't trivial anymore because some operations can lock the database for hours. For example, changing a column type from TEXT to BYTEA isn't an option if you want to avoid many hours of downtime; instead you create a new column, migrate the rows in the background, and drop the old column once the migration is done (also sketched below).

Overall it was a fun journey that took many tries to get decent performance. There are some other details to consider, but it's been a few years since I did this.
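For the settings point above, a rough illustration of how conservative the defaults are; the numbers below are only a common starting point for a dedicated box with around 64 GB of RAM, not values from the original post.

    -- Stock postgresql.conf is sized for tiny machines (e.g. shared_buffers = 128MB,
    -- work_mem = 4MB). ALTER SYSTEM writes overrides to postgresql.auto.conf.
    ALTER SYSTEM SET shared_buffers = '16GB';        -- often ~25% of RAM; needs a restart
    ALTER SYSTEM SET effective_cache_size = '48GB';  -- planner hint: OS cache + shared_buffers
    ALTER SYSTEM SET work_mem = '64MB';              -- per sort/hash node, so keep it modest
    ALTER SYSTEM SET maintenance_work_mem = '2GB';   -- speeds up index builds and vacuum
    -- The last three take effect after: SELECT pg_reload_conf();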
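The pagination switch is usually implemented as keyset pagination; a minimal sketch with a hypothetical `transactions` table (none of these names are from the original post):

    -- Offset pagination: cost grows with the offset, because the skipped rows
    -- still have to be read and thrown away.
    SELECT block_height, tx_hash
    FROM transactions
    ORDER BY block_height, tx_hash
    LIMIT 50 OFFSET 1000000;

    -- Keyset (scroll-based) pagination: the client sends the last row it saw,
    -- and an index on (block_height, tx_hash) turns this into a cheap range scan.
    SELECT block_height, tx_hash
    FROM transactions
    WHERE (block_height, tx_hash) > ($1, $2)   -- last seen (block_height, tx_hash)
    ORDER BY block_height, tx_hash
    LIMIT 50;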
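And a sketch of the background-migration pattern from the last bullet, again with made-up names: add a nullable BYTEA column, backfill it in small batches so no long lock is held, then swap the columns once the backfill is done.

    -- 1. Add the new column; a nullable column with no default is a fast metadata change.
    ALTER TABLE transactions ADD COLUMN tx_hash_bin BYTEA;

    -- 2. Backfill in small batches (run repeatedly from a script until no rows remain),
    --    so each statement only holds row locks briefly.
    UPDATE transactions
    SET    tx_hash_bin = decode(tx_hash, 'hex')
    WHERE  id IN (
        SELECT id FROM transactions
        WHERE tx_hash_bin IS NULL
        LIMIT 10000
    );

    -- 3. Once the backfill is complete, swap the columns in a short transaction.
    BEGIN;
    ALTER TABLE transactions DROP COLUMN tx_hash;
    ALTER TABLE transactions RENAME COLUMN tx_hash_bin TO tx_hash;
    COMMIT;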
I have a 1 TB+ Postgres database at home that holds market data. I'm moving it over to another Postgres db with time-series extensions to compact it and improve performance, and I'm also thinking about just partitioning everything up into Parquet files. Query performance isn't the best, and I'm currently trying to improve it.
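The extension isn't named above, but as an illustration of the "time-series extension to compact" approach, here is a minimal sketch assuming TimescaleDB 2.x and a made-up `ticks` table:

    -- Assumes the TimescaleDB extension is installed on the target database.
    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE ticks (
        time   timestamptz NOT NULL,
        symbol text        NOT NULL,
        price  numeric,
        volume numeric
    );

    -- Turn the table into a hypertable, chunked by time.
    SELECT create_hypertable('ticks', 'time');

    -- Enable native compression, segmenting by symbol so per-symbol scans stay fast.
    ALTER TABLE ticks SET (
        timescaledb.compress = true,
        timescaledb.compress_segmentby = 'symbol'
    );

    -- Automatically compress chunks older than 7 days.
    SELECT add_compression_policy('ticks', INTERVAL '7 days');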
<a href="https://www.adyen.com/blog/updating-a-50-terabyte-postgresql-database" rel="nofollow">https://www.adyen.com/blog/updating-a-50-terabyte-postgresql...</a><p>And that was in _2018_. I wonder how big it is now.
I manage a few databases: Postgres dbs of a few hundred GB on-prem, up to a few TB on RDS, and an almost 1 PB cluster of 20 warehouses on Snowflake. Which of those count? (I do data architecture/engineering for a living.)