LiteTree: SQLite with Branches

343 points by kroggen over 6 years ago

22 comments

diego_moita over 6 years ago

Fossil[1] is an SCM system (like git) created by the author of SQLite (D. Richard Hipp). It uses SQLite as its database and implements versioning, branching[2] and even merging (which LiteTree doesn't do) on its own, by recording the changes to each item in a separate table.

This approach is more complex to implement but a lot more versatile and flexible. Most of the time you wouldn't want to version or branch the whole database, but only parts of it.

[1] https://www.fossil-scm.org

[2] https://www.fossil-scm.org/index.html/doc/trunk/www/branching.wiki
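
To make "recording the changes on each item on a separate table" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not Fossil's actual schema: the `item` and `item_history` tables, the trigger names and the `open_db` and `history` helpers are all hypothetical, chosen only to show per-item change tracking without versioning the whole database.

```python
import sqlite3

# Hypothetical schema: one live table plus an append-only change log for it.
SCHEMA = """
CREATE TABLE item (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE item_history (
    rev     INTEGER PRIMARY KEY AUTOINCREMENT,
    item_id INTEGER NOT NULL,
    op      TEXT NOT NULL,
    body    TEXT
);
CREATE TRIGGER item_ai AFTER INSERT ON item BEGIN
    INSERT INTO item_history (item_id, op, body) VALUES (NEW.id, 'insert', NEW.body);
END;
CREATE TRIGGER item_au AFTER UPDATE ON item BEGIN
    INSERT INTO item_history (item_id, op, body) VALUES (NEW.id, 'update', NEW.body);
END;
CREATE TRIGGER item_ad AFTER DELETE ON item BEGIN
    INSERT INTO item_history (item_id, op, body) VALUES (OLD.id, 'delete', NULL);
END;
"""


def open_db():
    """Create an in-memory database with the live table and its change log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def history(conn, item_id):
    """Return the recorded (op, body) pairs for one item, oldest first."""
    rows = conn.execute(
        "SELECT op, body FROM item_history WHERE item_id = ? ORDER BY rev",
        (item_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO item (id, body) VALUES (1, 'first draft')")
    db.execute("UPDATE item SET body = 'second draft' WHERE id = 1")
    print(history(db, 1))
```

Only tables that carry such triggers are versioned, which is the "only parts of it" flexibility described above. Branching and merging would need more bookkeeping than this sketch shows.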

nneonneo over 6 years ago

Thanks for posting this. My first thought was: has this been run through the official SQLite battery of tests? If so, have the tests been adapted to validate branches, rapid branch switches, branching under failure conditions (malloc failures, power outages, etc.) and concurrent access patterns?

One of the reasons SQLite is so widely used is that it is carefully tested and shown to be reliable even under potentially faulty conditions. As detailed on https://sqlite.org/testing.html, there are three test sets, one of which is public (the TCL set). I’d love to see test results to assure the safety of any data stored in LiteTree.

beardicus over 6 years ago

> LiteTree is more than TWICE AS FAST than normal SQLite on Linux and MacOSX!!!

In my experience, claims like these usually end up showing that the author didn't understand the `PRAGMA synchronous` setting at all, or chose to ignore it to juice their stats.

In this benchmark, are the data durability guarantees the same for both LiteTree and vanilla SQLite?
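
For readers who want to see why `PRAGMA synchronous` matters to a benchmark, here is a minimal sketch with Python's built-in `sqlite3` module. It times plain SQLite only and says nothing about how LiteTree's own benchmark was configured; the `timed_inserts` helper, the table `t` and the default row count are hypothetical.

```python
import os
import sqlite3
import tempfile
import time

# Documented values of PRAGMA synchronous and the integers SQLite reports for them.
LEVELS = {"OFF": 0, "NORMAL": 1, "FULL": 2}


def timed_inserts(synchronous, rows=200):
    """Run `rows` single-row transactions; return (reported level, seconds)."""
    if synchronous not in LEVELS:
        raise ValueError(f"unknown synchronous setting: {synchronous!r}")
    with tempfile.TemporaryDirectory() as tmp:
        # isolation_level=None means autocommit: each INSERT commits on its own,
        # which is where the sync setting matters most.
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"), isolation_level=None)
        try:
            conn.execute(f"PRAGMA synchronous = {synchronous}")
            level = conn.execute("PRAGMA synchronous").fetchone()[0]
            conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
            start = time.perf_counter()
            for i in range(rows):
                conn.execute("INSERT INTO t (v) VALUES (?)", (f"row {i}",))
            elapsed = time.perf_counter() - start
        finally:
            conn.close()
    return level, elapsed


if __name__ == "__main__":
    for setting in LEVELS:
        level, seconds = timed_inserts(setting)
        print(f"synchronous={setting} (level {level}): {seconds:.3f}s")
```

On a disk that honors sync requests, the `OFF` run is typically much faster than `FULL`, but it gives up durability on power loss. A fair comparison would need both systems to run with equivalent guarantees.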

cannadayr over 6 years ago

Neat. I'll have to compare this to my own implementation.

https://github.com/cannadayr/git-sqlite

Instead of storing the transactions as separate LMDB commits, I decided to store the database in a git repository and expose the diffs using SQLite's sqldiff utility. This left my workflow almost unchanged and limits the dependencies to git, sqlite, sqldiff, & bash.

DocSavage over 6 years ago

There has been earlier work on getting git-style branched versioning on top of databases. For relational databases, OrpheusDB (http://orpheus-db.github.io/) puts a layer over PostgreSQL. They also supply a gRPC layer for interacting with the server.

For key-value systems, there are simple techniques for adding branched versioning to key-value (particularly ordered key-value) stores. We are using them for our research dataservice that holds 25+ TB of connectomics data, including 3D image and segmentation data (http://dvid.io). Our paper is currently under review but should have been out several years ago :) We can use a variety of key-value storage backends and are experimenting with versioned relational DBs, so I'll definitely give LiteTree a look.
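
One simple way to add branched versioning to a key-value store is sketched below in Python. It is an assumption-laden illustration, not DVID's actual design: the `BranchedKV` class and its method names are hypothetical, a plain dict stands in for the storage backend, and a version is treated as frozen once it has children. Each write is stored under a `(key, version)` pair, and a read walks from the requested version up through its ancestors until it finds a value.

```python
from typing import Any, Dict, Optional, Set, Tuple

_TOMBSTONE = object()  # marks a key deleted in a given version


class BranchedKV:
    """Toy branched key-value store; a dict stands in for the real backend."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {"root": None}
        self._frozen: Set[str] = set()  # versions that already have children
        self._data: Dict[Tuple[str, str], Any] = {}

    def branch(self, new: str, parent: str) -> None:
        """Create version `new` as a child of `parent` and freeze `parent`."""
        if new in self._parent:
            raise ValueError(f"version {new!r} already exists")
        if parent not in self._parent:
            raise KeyError(parent)
        self._parent[new] = parent
        self._frozen.add(parent)

    def _check_writable(self, version: str) -> None:
        if version not in self._parent:
            raise KeyError(version)
        if version in self._frozen:
            raise RuntimeError(f"version {version!r} has children and is read-only")

    def put(self, version: str, key: str, value: Any) -> None:
        self._check_writable(version)
        self._data[(key, version)] = value

    def delete(self, version: str, key: str) -> None:
        self._check_writable(version)
        self._data[(key, version)] = _TOMBSTONE

    def get(self, version: str, key: str, default: Any = None) -> Any:
        """Return the value from the nearest ancestor that wrote `key`."""
        if version not in self._parent:
            raise KeyError(version)
        current: Optional[str] = version
        while current is not None:
            if (key, current) in self._data:
                value = self._data[(key, current)]
                return default if value is _TOMBSTONE else value
            current = self._parent[current]
        return default


if __name__ == "__main__":
    kv = BranchedKV()
    kv.put("root", "color", "red")
    kv.branch("experiment", "root")
    kv.put("experiment", "color", "blue")
    print(kv.get("root", "color"), kv.get("experiment", "color"))
```

Unchanged keys are never copied when a branch is created, so a branch costs only what it writes. In an ordered store, keeping all versions of a key adjacent would let the ancestor walk become a short range scan.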

rubyfan over 6 years ago

Can anyone elaborate on a use case for something like this? I’m guessing there’s some blockchain connection but it’s not immediately obvious.

natmaka over 6 years ago

Is the function similar to PostgreSQL's deprecated "Time Travel" (https://www.postgresql.org/docs/6.3/static/c0503.htm)?

AFAIK this can be a foundation for some form of Snapshot Isolation (https://www.sqliteconcepts.org/SI_index.html) (?)

transfire over 6 years ago

If merge gets supported, then it could serve as an alternative for program development -- using tables to store function definitions, constants, etc. instead of using flat files.

aurebox over 6 years ago

I am looking for exactly this kind of implementation for my work project: a DB that uses a version control model.

However, I need a production-ready solution.

There is also https://github.com/attic-labs/noms, but the project does not seem mature enough.

Do you know if there is any way to achieve this for production use? What would be the best way/stack to get this result with currently available tools?

2T1Qka0rEiPr over 6 years ago

I'm looking at this with little knowledge of how it makes the blockchain application easier. What seems odd to me is that merging branches isn't supported. So you can't perform a bunch of "transactions" and then merge them back into your master state? Maybe someone could explain a little more clearly what problem this solves, as I'm guessing it has nothing to do with my naive understanding.

tripue over 6 years ago

Interesting project. How do you achieve this performance?

andridk over 6 years ago

Very interesting stuff!

Is it possible to see the history of a column, table, schema, etc.? Is it possible to tag a certain point in time?

It would be liberating for many schema designs if we could just change stuff and be sure that the database knew what was changed and when, with the ability to roll changes back.

mingodad over 6 years ago

Looking at the README, it's not clear how indexes are managed. For example, what happens when we create a branch, add some data to an existing table, move back to a previous branch and try to add data with the same index keys?

nathancahill over 6 years ago

Interesting, I implemented something similar a long time ago, have to see if I can dig up the source code. The goal was to support forking data without duplicating unchanged data.

masa331 over 6 years ago

This looks great. Thank you for creating it and sharing it.

amirouche over 6 years ago

Why did you choose LMDB over leveldb, wiredtiger, bsddb or even gdbm?

It seems like you do not rely on range queries at all.

geordee over 6 years ago

Interesting. The branches could solve the "date-effective" table designs. In the past I have used Git as a database to store multiple versions of a document efficiently.

Or this could be used as some elementary partitioning logic where each branch is effectively a partition.

sriku over 6 years ago

The use case seems to overlap with Noms DB: https://github.com/attic-labs/noms

Noms doesn't have the appeal of SQL, but its data is versioned, forkable and strongly typed.

chaz6 over 6 years ago

This is interesting and I hope I can find a use case for it. However, the performance compared to vanilla SQLite makes me anxious that there is a trade-off elsewhere, such as crash integrity.

amirouche over 6 years ago

> LiteTree is implemented storing the SQLite db pages on LMDB.

Why are you doing it like that? Does it lead to some limitation of some sort? Like making merge very costly?

boksiora over 6 years ago

great stuff :) this is innovation :)

coleifer over 6 years ago

Cool project, thanks for sharing your work. There's an older project using lmdb (which doesn't support branching or anything, just for storage)... is litetree's usage of lmdb comparable to what sqlightning does? How does litetree work with the write-ahead log? How do multiple concurrent connections interact? Are multiple writers allowed? Can readers and writer(s) coexist?
