On a somewhat meta note:<p>>Over the last two and a half years, we’ve picked up a few tips and tools about scaling Postgres that we wanted to share—<i>things we wish we knew when we first launched Instagram.</i><p>A common failure mode for myself and, I suspect, others, is thinking that we have to know <i>every single thing</i> before we start. Good old geek perfectionism of wanting an ideal, elegant setup. A sort of Platonic Ideal, if you will.<p>These guys went on to build one of the <i>hottest web properties on earth,</i> and <i>they</i> didn't get it all right up front.<p>If you're postponing something because you think you need to master all the intricacies of EC2, Postgres, Rails, or $Technology_Name, pay close attention to this example. While they were launching, growing, and being acquired for a cool billion, were you agonizing over the perfect pg_hba.conf?<p>More a note to myself than anything else :)
One of the largest services on the net, and their summary of their database experience is "Overall, we’ve been very happy with Postgres’ performance and reliability."<p>Go Bears. That's awesome. And we should all take a hint...
If high-performance PostgreSQL is critical to your job, here are some resources:<p><a href="http://wiki.postgresql.org/wiki/Slow_Query_Questions" rel="nofollow">http://wiki.postgresql.org/wiki/Slow_Query_Questions</a><p>Query analysis tool <a href="http://explain.depesz.com" rel="nofollow">http://explain.depesz.com</a><p>The mailing list <a href="http://www.postgresql.org/list/pgsql-performance/" rel="nofollow">http://www.postgresql.org/list/pgsql-performance/</a><p>Greg Smith's book <a href="http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Smith/dp/184951030X" rel="nofollow">http://www.amazon.com/PostgreSQL-High-Performance-Gregory-Sm...</a><p>#postgresql on freenode.net
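For a concrete starting point, the usual workflow is to capture a plan with EXPLAIN ANALYZE and paste it into explain.depesz.com for a readable breakdown. A minimal example (the table, columns, and query here are made up):<p><pre><code>-- Hypothetical slow query: ANALYZE runs it for real timings,
-- BUFFERS adds cache hit/read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   likes
WHERE  user_id = 42
ORDER  BY created_at DESC
LIMIT  50;
</code></pre>The text output of that statement is what you paste into the depesz tool or include in a pgsql-performance post.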
I am really glad to see all the adoption and recognition that Postgres is receiving nowadays; there was a time when all you could hear about was MySQL (or maybe that's just my perception). It seems to me that adoption has picked up even more since the Oracle takeover of MySQL, but it could also be that Postgres's feature set has reached a pretty mature point, or maybe a combination of both.
Read scalability has been greatly improved in Postgres 9.2.
It scales pretty much linearly up to 64 concurrent clients and reaches 350,000 queries per second!<p>Write throughput has been improved as well.<p>Check out the presentation by Josh Berkus (one of the 7 members of the Postgres core team) on what's new in Postgres 9.2:<p><a href="http://developer.postgresql.org/~josh/releases/9.2/92_grand_prix.pdf" rel="nofollow">http://developer.postgresql.org/~josh/releases/9.2/92_grand_...</a>
Dear Instagram,<p>How do you deploy database updates? With Rails-style migrations?<p>One thing that bugs me about migrations is that if you use functions or views, the function/view definition has to be copied to a new file. It makes it difficult to see what's been changed. I'm looking forward to <a href="http://sqitch.org/" rel="nofollow">http://sqitch.org/</a> for this reason. (slides: <a href="http://www.slideshare.net/justatheory/sqitch-pgconsimple-sql-change-management-with-sqitch" rel="nofollow">http://www.slideshare.net/justatheory/sqitch-pgconsimple-sql...</a>)
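To illustrate the pain point: in a plain SQL migration, changing a view means restating its entire body, so the migration diff doesn't show what actually changed. A sketch (view and column names are hypothetical):<p><pre><code>-- Migration: expose the new bio column on the profile view.
-- Only one line is new, but the whole definition has to be
-- copied into this migration file, not just the delta.
CREATE OR REPLACE VIEW user_profiles AS
SELECT u.id,
       u.username,
       u.avatar_url,
       u.bio            -- the only actual change
FROM   users u
WHERE  u.is_active;
</code></pre>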
I enjoyed this article and also found a link to this one, which I found equally interesting:<p><a href="http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram" rel="nofollow">http://instagram-engineering.tumblr.com/post/10853187575/sha...</a><p>I worked with the Flickr-style ticket DB id setup at Etsy, and while it was lovely once it was all set up, it's way more complicated (requiring two dedicated servers and a lot of software and operations work). The solution outlined by Instagram of just having a clever schema layout and a stored procedure that safely allocates IDs locally on each logical shard is elegant, and I'm having a hard time blowing holes in it.
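For anyone who hasn't read the linked post, the core trick is roughly this: each logical shard lives in its own Postgres schema with its own sequence, and a PL/pgSQL function packs a millisecond timestamp, the shard number, and the next sequence value into a single 64-bit ID. A simplified sketch of that function (schema name, epoch constant, and shard number follow the article's example but are illustrative, not Instagram's production code):<p><pre><code>-- One sequence per logical shard (here, schema "shard_1").
CREATE SEQUENCE shard_1.table_id_seq;

-- 41 bits of milliseconds since a custom epoch, 13 bits of shard id,
-- 10 bits of per-shard sequence -> one bigint, no ticket servers.
CREATE OR REPLACE FUNCTION shard_1.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch  bigint := 1314220021721;  -- custom epoch, in milliseconds
    seq_id     bigint;
    now_millis bigint;
    shard_id   int    := 1;              -- this schema's shard number
BEGIN
    SELECT nextval('shard_1.table_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;  -- time in the high bits
    result := result | (shard_id << 10);       -- shard in the middle
    result := result | seq_id;                 -- sequence in the low bits
END;
$$ LANGUAGE plpgsql;
</code></pre>Tables on that shard can then default their primary key to shard_1.next_id(), so ID generation needs no extra servers and the IDs stay roughly time-ordered.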
> we’re now pushing over 10,000 likes per second at peak<p>These kinds of stats always sound so impressive, but let's imagine:<p><pre><code> - 8 byte timestamp
- 8 byte user ID
- 8 byte post ID
- 128 bytes DBMS overhead
- 128 bytes for user->like index
- 128 bytes for post->like index
</code></pre>
= 3.96MiB/second, ~1015 IOPS, or 342GB per day absolute worst case. A single economy machine with an even remotely decent SSD could handle a full day's data at these rates.
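Spelling out the arithmetic behind those totals (everything rounded, and assuming 4KiB pages for the IO estimate):<p><pre><code>8 + 8 + 8 + 128 + 128 + 128     = 408 bytes per like
408 bytes x 10,000 likes/sec    ~ 4 MB/sec
4 MB/sec in 4KiB pages          ~ 1,000 writes/sec
4 MB/sec x 86,400 sec/day       ~ 350 GB/day
</code></pre>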
We use a lot of PostGIS via GeoDjango, and I made a mental note to remember this article if my Postgres instances ever start ailing. Unfortunately, we haven't pushed these limits nearly as much as Instagram.
Anyone use one of these tools? First I've heard of them, and they seem like they'd make my inner control-freak happy.<p><pre><code> pg_reorg can re-organize tables without any locks, and can be a better
alternative of CLUSTER and VACUUM FULL. This project is not active now,
and fork project "pg_repack" takes over its role. See
https://github.com/reorg/pg_repack for details.</code></pre>
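From a skim of the pg_repack docs, the workflow appears to be: install the extension in the target database, then drive the rebuild from its command-line client. A sketch (I haven't run this in production, and the table/database names are made up):<p><pre><code>-- once, inside the target database:
CREATE EXTENSION pg_repack;

# then from the shell: rebuild one bloated table online;
# it needs an exclusive lock only briefly at the start and end.
$ pg_repack --table=likes mydb
</code></pre>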