ActorDB is a neat project. Simplified, it's a distributed SQLite where consistency is guaranteed by the Raft protocol.<p>However, it has a unique and initially confusing data model where the database is divided into "actors", which are self-contained shards. For example, a database modeled on Hacker News would probably have an actor per user and an actor per story, and probably an actor per thread.<p>Every actor acts like a self-contained database, with its own set of tables. When you want to query or update data, you first tell it which actor(s) to operate on; but unlike systems like Cassandra, the sharding is explicit, in that the shards have identifiers, and there's no automatic sharding function. Indeed, you can have actors that act like a basic database, e.g. a global, shared list of lookup values could be a single actor.<p>You have ACID transactions within actors, and you can also do transactions across actors, and queries can also span multiple actors. You can't do joins across actors, as far as I recall. Schema migrations also become interesting, since each actor has its own, entirely separate schema.
Digging through the docs, I can't find any actual information of the consistency and isolation guarantees other than "consistent" and "ACID". That's an immediate red flag for me. What are the actual isolation characteristics of the database?<p>Does it use snapshot isolation? Is it serializable? Is it linerizable? With all the great work Kyle Kingsbury (aka Aphyr) has done on the Jepsen tests, it's pretty clear that claiming to be "ACID" with no additional info isn't sufficient for a modern database.
ActorDB sounds like an interesting piece of software. It looks like it's good for simple applications that have large amounts of data. Some thoughts I have based on the docs:<p>ActorDB has no concurrency at the actor level. This makes it a poor fit for applications that have lots of concurrency around the same pieces of data. A long running read or write on a single actor will lock out any other reads or writes.<p>Likewise, distributed transactions lock all actors involved in the transaction until the transaction completes.<p>It seems like reads have to go through a round of Raft. This increases latency for reads. It also decreases throughput, although I'm not sure how big of a deal that is given the lack of concurrency.<p>It's unclear to me how ActorDB guarantees serializability for multi-actor transactions. You need some way to guarantee two multi-actor transactions will execute in the same order on every actor. Based on the docs, Raft is performed at the actor level and not across multiple actors. ActorDB does use two-phase commit to guarantee atomicity across multiple actors, but there's no description of how it handles serializability.<p>Based on my reading, ActorDB is good if you have lots of data and your queries have either low concurrency and you don't require high throughput. If you have high concurrency or require high throughput, my guess is ActorDB will be a poor fit.
They have an introductory blog post that explains the database fairly well (summary: it was designed for a shard per user): <a href="http://blog.biokoda.com/post/112206754025/why-we-built-actordb" rel="nofollow">http://blog.biokoda.com/post/112206754025/why-we-built-actor...</a><p>If you are looking at distributed SQLite solutions there is also rqlite/dqlite and bedrock.
I used to work with Pivotal's Greenplum, which is also a distributed db, and I quiet liked it. Basically a postgresql with syntax sugar on partition , and distribution across servers. I had the pleasure to never need any index in the database.<p>This sounds to be an interesting project as well.<p>The question to me always about "how this will makes the project that use it have little learning curve for the new recruits, easy to understand integration in the code level and low maintenance on the long run"
Another popular SQLite + Raft project is rqlite ( <a href="https://github.com/rqlite/rqlite" rel="nofollow">https://github.com/rqlite/rqlite</a> ), but the goals seem slightly different (rqlite only uses distribution for fault-tolerance, not for sharding).
An 'explicitly sharded' SQL database - effectively as many databases as you need with some special sauce to let you run transactions across multiple shards. Neat.
I think the data model is similar to virtual actors [0] which have persistent state. There is also CLR and JVM specific implementations of this model.<p>[0] <a href="https://www.microsoft.com/en-us/research/project/orleans-virtual-actors/" rel="nofollow">https://www.microsoft.com/en-us/research/project/orleans-vir...</a>
> Think of running a large mail service, dropbox, evernote, etc. They all require server side storage for user data, but the vast majority of queries is within a specific user.<p>I have this exact use case for a new project I’m working on so I’m fine with the 1-db-per-user approach. ActorDB definitely sounds very interesting, but are there any alternatives to compare it with?
How do schema upgrades work? It looks like a specific actor type is tied to a specific schema but I assume the schema upgrades must be eventually consistent unless they are using 2 phase commit across all actors of the type?
ActorDB might be great for distributed systems. How does it compare to CocktoachDB? The latter does transaction planning across shards?<p>None of these DBs seem to be byzantine fault tolerant, though. Any examples of BFT databases?
> with the scalability of a KV store, while keeping the query capabilities of a relational database.<p>So not doing good query performance for OLAP/DS. And "core" DB is in Erlang.
Another option for scaling SQL is the Vitess OSS. Might give it a look. Uses a similar pattern of using an actor up front and then scaling MySQL on the back-end.<p><a href="https://vitess.io/overview/" rel="nofollow">https://vitess.io/overview/</a><p>Would like a similar solution but with Postgres on the back end instead.