This is awesome. I have experience with running a CitusDB cluster and it pretty much solved a lot of the scaling problems I was having at the time. For it to go open source now, is of huge benefit to the future projects I have.<p>> With the release of newly open sourced Citus v5.0, pg_shard's codebase has been merged into Citus...<p>This is fantastic, sounds like the setup process is much simpler.<p>I wonder if they have introduced the Active/Active Master solution they were working on? I know before, there is 1 Master and multiple Worker nodes. The solution before was to have a passive backup of the Master.<p>If say, they released the Active/Active Master later on this year. That's huge. I can pretty much think of my DB solution as done at this point.
I've been unable to find any clear description of the capabilities of Citus and competing solutions (postgres-x2 seems the other leader).<p>Which of these are supported:<p>1. Full PostgreSQL SQL language<p>2. All isolation levels including Serializable (in the sense that they actually provide the same guarantees as normal PostgreSQL)<p>3. Never losing any committed data on sub-majority failures (i.e. synchronous replication)<p>4. Ability to automatically distribute the data (i.e. sharding)<p>5. Ability to replicate the data instead or in addition to sharding<p>6. Transactionally-correct read scalability<p>7. Transactionally-correct write scalability where possible
(i.e. multi-master replication)<p>8. Automatic configuration only requiring to specify some sort of "cluster identifier" the node belongs to
If anyone from Citus is reading this: how does this affect your business model? I remember when I asked at Strata conf a couple of years ago why isn't your stuff Open Source, the answer then was "because revenue". So what changed since then?
This is awesome! Tebrikler (congrats) on the release of 5.0 and going OS, definitely great news.<p>Can you publish competitive positioning of Citus vs Actian Matrix (nee ParAccel) and Vertica? I'd love to compare them side by side - even if it's just from your point of view :-)
Unforking is a very smart decision. Postgres also has gained a lot of favour since MySQL was bought by Oracle. Altogether Citus has earned a lot of kudos for that move, at least with me, for all that may count!
Citus can parallelize SQL queries across a cluster and across multiple CPU cores. How does it compare with the upcoming 9.6 version of PostgreSQL which will support parallel-able sequential scans, parallel joins and parallel aggregate ?
This is fantastic news! Postgres does not have a terribly strong High Availability story so far and of course it also does not scale out vertically.
I have looked at CitusDB in the past, but was always put off by its closed-source nature. Opening it up seems like a great move for them and for all Postgres users. I can imagine that a very active open-source community will develop around it.
I'd very much like to see what algorithm these systems are using to enable transactions in a distributed environment. Are they just using straight two-phase commit, and letting the whole transaction fail if a single server goes down? Or are are they getting fancy and doing some kind of replication with consensus?
This is great!<p>One thing I'm having trouble with is finding information about transactional semantics. If I make several updates (to differently sharded keys) in a single transaction, will the transaction boundaries be preserved (committed "locally" first, then replicated atomically to shards)? Or will they fan out to different shards with separate begin/commit statements? Or without transactional boundaries at all?<p>In fact, I can't really find any information on how CitusDB achieves its transparent sharding for queries and writes. Does it add triggers to distributed tables to rewrite inserts, updates and deletes? Or are tables renamed and replaced with foreign tables? I wish the documentation was a bit more extensive.
My guess is that Citus is making enough money from consulting that they don't need to keep this code closed source when they can profit from free community-driven growth while they are expanding their sales pipeline through consulting.
Congratulations, Citus.<p>Since I heard last year at PgConfSV that you will be releasing CitusDB 5.0 as open source, I've been waiting for this moment to come.<p>It makes 9.5's awesome capabilities to be augmented with sharding and distributed queries. While this targets real-time analytics and OLAP scenarios, being an open source extension to 9.5 means that a whole lot of users will benefit from this, even under more OLTP-like scenarios.<p>Now that Citus is open source, ToroDB will add a new CitusDB backend soon, to scale-out the Citus way, rather than in a Mongo way :)<p>Keep up with the good work!
I don't have a ton of experience scaling out and using different flavors of PostgreSQL but I had run across Postgres-XL not long ago; does anyone know how this compares to that?
Does CitusDb fit in olap analytical workloads to do aggregations on hundreds millions of records using varying order and size of dimensions (eg druid) in max of 3 seconds response time using as few boxes as possible - Or there are other techniques have to be used along with Citusdb? Can you shed a light on your experience with CloudFlare in terms of cluster size and queries perf?
Great product - If would be nice to have a Admin interface like RethinkDB where you can clearly define your replication and Sharding settings.
Any documentation around how to do this from command line ?
I recently switched back to MariaDB because I didn't see a clear/easy path for Postgres scalability in case the project i am working on takes off. I am under the assumption there are at least two fairly simple approaches to scale MySQL; master-master replication using Galera and Aurora from AWS. What do you guys think? Am I right in thinking MySQL is easier to scale given I want to spend the least amount of time maintaining it.
Being burned before,I will never use an OS infrastructure project that has enterprise features you need to pay for. They always try to move you to paid and make the OSS version unpleasant to use over time as soon as the bean counters take over to milk you<p>"For customers with large production deployments, we also offer an enterprise edition that comes with additional functionality"
One must thank them for open sourcing this, and cannot blame them for using a different license, but using a different license makes me think calling this "unfork" is bending the truth a little bit.