At work we're in the midst of rolling out a sharded Postgres platform based on <a href="http://www.craigkerstiens.com/2012/11/30/sharding-your-database/" rel="nofollow">http://www.craigkerstiens.com/2012/11/30/sharding-your-datab...</a>, with the sharding implemented at the application level. The biggest piece of complexity in that post is designing the sharding in such a way that you can gracefully add more shards later.<p>Having read the pg_shard readme, it's not clear to me how it addresses that issue. I'd need a really clear idea of how to handle scaling my cluster before committing to a sharding solution.
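To make the "add more shards later" concern concrete, here is a minimal sketch of one common application-level approach: hash keys into a fixed number of logical shards, and keep a separate, editable map from logical shards to physical nodes. It is only an illustration under stated assumptions, not how pg_shard or the linked post necessarily does it; the node names, `NUM_LOGICAL_SHARDS`, and every function name are hypothetical.

```python
import zlib

# Hypothetical: the logical shard count is fixed up front and never changes.
NUM_LOGICAL_SHARDS = 4096


def logical_shard(key: str) -> int:
    """Map a key to a logical shard; independent of how many nodes exist."""
    return zlib.crc32(key.encode("utf-8")) % NUM_LOGICAL_SHARDS


def build_assignment(nodes):
    """Spread logical shards across the initial physical nodes round-robin."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_LOGICAL_SHARDS)}


def node_for(key, assignment):
    """Find the physical node that currently owns the key's logical shard."""
    return assignment[logical_shard(key)]


def add_node(assignment, new_node):
    """Reassign an even share of logical shards to new_node.

    Returns the logical shards that moved, i.e. the only data that would
    have to be copied. Keys in all other shards stay where they are.
    """
    nodes = sorted(set(assignment.values()))
    target = NUM_LOGICAL_SHARDS // (len(nodes) + 1)
    by_node = {n: [s for s, owner in assignment.items() if owner == n] for n in nodes}
    moved = []
    while len(moved) < target:
        # Always take from whichever node currently holds the most shards.
        donor = max(by_node, key=lambda n: len(by_node[n]))
        shard = by_node[donor].pop()
        assignment[shard] = new_node
        moved.append(shard)
    return moved


assignment = build_assignment(["pg-a", "pg-b"])
moved = add_node(assignment, "pg-c")
```

The point of the indirection is that adding a node changes only the assignment map, never the key-to-logical-shard hash, so the set of rows to migrate is exactly the shards in `moved`.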
It's great to see this released as open source (LGPL) -- looks like a really useful extension.<p>I've always been a fan of using PostgreSQL wherever possible, and this extends "where possible". As a YC S11 batchmate especially, I'm really proud of all the great contributions Citus Data has made, and what a useful tool CitusDB is (a bunch of other YC companies use it).
Huge fan of Citus Data. Their column store for postgres is really useful for building data warehouses:<p><a href="https://github.com/citusdata/cstore_fdw" rel="nofollow">https://github.com/citusdata/cstore_fdw</a>
I wonder how it compares to the slightly better-established Postgres-XL.<p>There is definitely a different approach being taken: Postgres-XL has a supervisor/load balancer, whereas with pg_shard it seems every node is capable of performing all actions.<p>Excited to see it evolve.
If you want more details about pg_shard, have a look at this blog post.<p><a href="http://www.databasesoup.com/2014/12/whats-this-pgshard-thing.html" rel="nofollow">http://www.databasesoup.com/2014/12/whats-this-pgshard-thing...</a><p>It explains a bit more about what it does and doesn't do.
Comparing this a bit with Postgres-XL, how mindful of data locality do I have to be when querying? Looks like this drops right in for existing apps, but I'd be concerned about long-term performance if I didn't tailor my app code.
Nice work. But I wonder how they handle table alterations; I couldn't see them mentioned in the docs. Is it possible at all? If it is, since pg_shard doesn't support transactions, what happens if an alteration fails?
Wondering if any CitusData folks can speak to whether this is being used for production workloads anywhere yet. If so, could you speak to the size/throughput of those deployments?
We have a sharding mechanism in our PHP framework, and in Node as well, which can split shards that become too "hot", as determined by you. The whole system stays online during splitting, and only a small part of the system goes offline for 1 second before the final switchover. No need to pre-shard in the beginning; it splits according to usage later, into an unlimited number of shards.
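For readers unfamiliar with splitting shards on demand, here is a generic sketch of the idea: each shard owns a contiguous range of a hash space, and a hot shard is split by cutting its range in half. This is not the framework described above, just one assumed way to model it; `RangeShardMap`, `key_hash`, and the shard names are all hypothetical, and the data copy and switchover step are left out.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32  # hypothetical 32-bit hash space


def key_hash(key: str) -> int:
    """Stable 32-bit hash of a key (first 4 bytes of its MD5 digest)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class RangeShardMap:
    """Each shard owns the hash range from its start up to the next start."""

    def __init__(self):
        self.starts = [0]          # sorted inclusive lower bounds
        self.names = ["shard-0"]   # names[i] owns the range beginning at starts[i]

    def lookup(self, key: str) -> str:
        """Return the shard whose range contains the key's hash."""
        idx = bisect.bisect_right(self.starts, key_hash(key)) - 1
        return self.names[idx]

    def split(self, name: str) -> str:
        """Split the named shard's range in half; the upper half gets a new shard."""
        idx = self.names.index(name)
        lo = self.starts[idx]
        hi = self.starts[idx + 1] if idx + 1 < len(self.starts) else HASH_SPACE
        if hi - lo < 2:
            raise ValueError("range too small to split")
        mid = lo + (hi - lo) // 2
        new_name = f"shard-{len(self.names)}"
        self.starts.insert(idx + 1, mid)
        self.names.insert(idx + 1, new_name)
        return new_name


shards = RangeShardMap()
new_shard = shards.split("shard-0")  # only keys in shard-0's upper half move
```

Because a split touches only one shard's range, every key outside that range keeps resolving to the same shard, which is what lets the rest of the system stay online while the split happens.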
Awesome work, I should give this a spin.<p>One of the issues I can see already is supporting existing applications, especially ones that have transaction-heavy workflows. I have a similar issue with Postgres-XC, which supports transactions but not savepoints.<p>But this looks like a completely different use case for Postgres, as a sort of pseudo-NoSQL type of DB.
Very cool to see this open sourced.<p>Any advice on the migration process? Transferring a high write-throughput Postgres instance to a multi-pg deployment with pg_shard feels pretty daunting.