Citus 12: Schema-based sharding for PostgreSQL

160 pointsby mulanderalmost 2 years ago

10 comments

mslotalmost 2 years ago

Nice to see this on HN :)The high-level is: You enable a setting and every CREATE SCHEMA creates a new shard. All the tables in the schema will be co-located so you can have efficient joins & foreign keys between the tables.On top of that, you can also have reference tables that are replicated to all nodes, again for fast joins & foreign keys with all schemas.Everything else is about making every PostgreSQL feature work as seamlessly as if there was no sharding. You can still do things like transactions across schemas, create and use custom types, access controls, work with other extensions, use procedures, etc.

评论 #36781392 未加载

metadatalmost 2 years ago

Does Oracle support anything like this? Or any other DBMS in widespread use, for that matter?The promise behind this approach to DB sharding has great potential. Simultaneously impressive, novel, and badass.I wish this had been available ten years ago at a few of my startups!

评论 #36780474 未加载

评论 #36786714 未加载

评论 #36780067 未加载

评论 #36782398 未加载

pickledishalmost 2 years ago

Hm, question for people a bit more familiar with Postgres -- what is meant by "schema" here?My definition is "the columns and column types of a table", but, that doesn't seem to make sense with what they're talking about here ("large" and "small" schemas probably aren't referring to wide and narrow tables for example, and I don't see how sharding by my definition of "schema" could even make sense anyways)

评论 #36778011 未加载

评论 #36777908 未加载

评论 #36777937 未加载

评论 #36777884 未加载

评论 #36777924 未加载

skunkworkeralmost 2 years ago

Having used schema based sharding in postgres before, I hope citus has a way around connection pooling and pgbouncer, as if you use pgbouncer transaction pooling your search_path could disappear at any time.

评论 #36779951 未加载

评论 #36779986 未加载

stevefan1999almost 2 years ago

What happens if one node lost their shards due to external event? (e.g. Disk corruption, physcially destroyed like OVH Cloud) I do understand we still have to actively backup but I rather not serve any data than serving wrong data at the time of severe outage. Also I want to see any forward error correction code (FECC) would be implemented in Citus so we can do this on the fly rather than relying on RAID, e.g. RAID10, RAIDZ1, RAIDZ2

评论 #36782583 未加载

asahalmost 2 years ago

love citus! this is useful.one gotcha: schemas are a weird old thing in SQL that are kinda the worst of all worlds, basically more like prefixing your SQL object names than a real level of indirection. Schema objects can't be manipulated as a batch, they don't provide the isolation / access control of databases, and can't be manipulated in DML and require stored procedures to manipulate.

评论 #36778239 未加载

评论 #36779298 未加载

评论 #36778626 未加载

binwiederhieralmost 2 years ago

It is worth noting that Citus was acquired by Microsoft a few years ago (not a secret, but may not be obvious to everyone), and they have since shifted heavily towards the Azure side of the world.Back when it happened we tried to buy their product and they were not sure if the Citus standalone product was even going to exist, and they refused to demo it even. Odd timing possibly, but it's a data point.On the positive side of things, the shard rebalancing was not open source back then iirc, which made the open source version pretty useless. Now it seems to be open source: <a href="https://www.citusdata.com/product/comparison" rel="nofollow noreferrer">https://www.citusdata.com/product/comparison</a> -- pretty cool.I'd still be careful to bank on it as a Citus only customer or open source user.

评论 #36797723 未加载

mariocesaralmost 2 years ago

I'm having trouble getting CitusDB to work with RDS, even though I'd really like to use it with AWS. Whenever I try to research how to make it happen, I get stuck in a lot of challenges and end up concluding that it might not be possible. While I could use Azure instead, I'm hesitant because I have a lot of resources and infrastructure in AWS and it wouldn't make sense to move the database layer. The idea of having to do DBA and Ops work on my own EC2 instances is overwhelming. I would appreciate any guidance on how to use RDS with Citus on their documentation. It's concerning that Citus might be withholding information about the possibility of using RDS, given that Microsoft is its partner.

评论 #36779760 未加载

评论 #36780428 未加载

评论 #36783580 未加载

评论 #36779108 未加载

__salmost 2 years ago

What's advantage over having tenant id as distribution column? Seems like you make schema name the distribution column. Maybe gross setups where same name function definition varies between schemas (been there done that, don't want to do it again)Seems like article only offers ease of use. Guess I've never used microservices enough to consider that use caseCouldn't the microservice case be handled by having distributed tables with no distribution column? ie today I'd create a distribution column & fill it with the same value on every rowCan one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function

评论 #36783292 未加载

评论 #36782533 未加载

potamicalmost 2 years ago

Doesn't sharding imply horizontal partitioning? Some of their examples show vertical partitioning use cases. Not sure how I feel about that. I can't imagine why you would do microservices and host their databases as schemas on the same instance.