Show HN: Evolutionary (binary) JSON data store (full immutable revision history)

1 point by lichtenberger over 1 year ago
I've already posted the project a couple of years ago and it gained some interest, but a lot has been done since then: performance work in particular, a completely new JSON store, a REST API, various refactored internals, an improved JSONiq-based query engine that supports updates, a by now dated web UI, a new Kotlin-based CLI, and Python and TypeScript clients to ease the use of Sirix...

The first prototypes of a precursor date back to 2005.

So, what is it all about?

I'm working on an evolutionary data store in my spare time[1]. It is based on the idea of getting rid of the need for a second transaction log (the WAL) by using a persistent index, a tree of tries (preserving the previous revision through copy-on-write and path copying up to the root), as the log itself. Only a single read/write transaction is permitted at a time, concurrently and in parallel to N read-only transactions, which are bound to specific revisions when they start. The single writer is permitted per resource (comparable to a table/relation in a relational DB) within a database; reads do not involve any locks at all.

The idea is that the system atomically swaps the tree root to the new version (replicated).
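To make the copy-on-write and path-copying idea concrete, here is a minimal Python sketch. It is not Sirix code: the names `Node`, `insert`, and `Store` are hypothetical, a plain character trie stands in for the real page structure, and a list append stands in for the atomic root swap.

```python
from typing import Dict, List, Optional


class Node:
    """Trie node that is never mutated after construction."""

    def __init__(self, value: Optional[str] = None,
                 children: Optional[Dict[str, "Node"]] = None) -> None:
        self.value = value
        self.children = children or {}


def insert(node: Node, key: str, value: str) -> Node:
    """Return a new node with key -> value set, copying only the path to it."""
    if not key:
        return Node(value, node.children)  # children are shared, not copied
    children = dict(node.children)  # shallow copy: siblings stay shared
    child = node.children.get(key[0], Node())
    children[key[0]] = insert(child, key[1:], value)
    return Node(node.value, children)


class Store:
    """Hypothetical single-writer store: one root per committed revision."""

    def __init__(self) -> None:
        self.roots: List[Node] = [Node()]  # revision 0 is empty

    def commit(self, changes: Dict[str, str]) -> int:
        root = self.roots[-1]
        for key, value in changes.items():
            root = insert(root, key, value)
        self.roots.append(root)  # the "root swap": readers see it only now
        return len(self.roots) - 1

    def get(self, key: str, revision: int) -> Optional[str]:
        node: Optional[Node] = self.roots[revision]
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value


store = Store()
r1 = store.commit({"ab": "1", "ac": "2"})
r2 = store.commit({"ab": "3"})
print(store.get("ab", r1), store.get("ab", r2))  # 1 3
```

Readers bound to `r1` keep seeing the old value without any locking, and the untouched `"ac"` node is the same object in both revisions, which is the structural sharing the post refers to.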
If something fails, the log can simply be truncated to the former tree root.

Thus, the system has many similarities with Git (structural sharing of unchanged nodes/pages) and ZFS snapshots (the keyed trie has been inspired by ZFS, as has storing the checksums of child pages in the parent pages, within the references to the child pages)[2].

You can of course simply execute time travel queries on the whole revision history, and add commit comments and the author to answer questions such as who committed what at which point in time and why...

The system does not simply copy full data pages; instead it applies a sliding snapshot versioning algorithm to keep storage space to a minimum.

Thus, it's best suited for fast flash drives with fast random reads and sequential writes. Data is never overwritten, thus audit trails are given for free.

The system stores fine-granular JSON nodes, thus the structure and size of an object have almost no limits. A path summary is built, which is an unordered set of all paths to leaf nodes in the tree and enables various optimizations. Furthermore, a rolling hash is optionally built, whereby all ancestor node hashes are adapted during inserts.

Furthermore, it optionally keeps track of update operations and the context nodes involved during transaction commits.
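As an illustration of how ancestor hashes can be adapted on insert, here is a small Python sketch. It is an assumption-laden toy, not Sirix's actual hash function: `HashedNode`, `own_hash`, `insert_leaf`, and `recompute` are hypothetical names, and a simple order-insensitive additive hash (modulo 2^64) is used so that an insert only touches the path to the root.

```python
import hashlib
from typing import List, Optional

MASK = (1 << 64) - 1  # keep all hashes within 64 bits


def own_hash(name: str, value: Optional[str]) -> int:
    """Hash of a single node's own content, ignoring its descendants."""
    digest = hashlib.sha256(f"{name}={value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashedNode:
    """Node whose hash covers itself plus all of its descendants."""

    def __init__(self, name: str, value: Optional[str] = None,
                 parent: Optional["HashedNode"] = None) -> None:
        self.name = name
        self.value = value
        self.parent = parent
        self.children: List["HashedNode"] = []
        self.hash = own_hash(name, value)


def insert_leaf(parent: HashedNode, name: str,
                value: Optional[str] = None) -> HashedNode:
    """Attach a new leaf and adapt every ancestor hash in O(depth)."""
    leaf = HashedNode(name, value, parent)
    parent.children.append(leaf)
    ancestor: Optional[HashedNode] = parent
    while ancestor is not None:
        ancestor.hash = (ancestor.hash + leaf.hash) & MASK
        ancestor = ancestor.parent
    return leaf


def recompute(node: HashedNode) -> int:
    """Full recomputation from scratch, used only to check the incremental hash."""
    total = own_hash(node.name, node.value)
    for child in node.children:
        total = (total + recompute(child)) & MASK
    return total


root = HashedNode("root")
user = insert_leaf(root, "user")
insert_leaf(user, "name", "Ada")
insert_leaf(user, "age", "36")
print(root.hash == recompute(root))  # True
```

With such a hash, two subtrees (or the same subtree in two revisions) can be compared by looking at a single value instead of walking all descendants.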
Thus, you can easily get the changes between revisions, check the full history of nodes, and navigate in time to the first revision, the last revision, and the next and previous revision of a node...

You can also open a revision at a specific system time, revert to a revision, and commit a new version while preserving all revisions in between.

As said, one feature is that the objects can be arbitrarily nested, with almost no limits in number, and updates are cheap.

A dated Jupyter notebook with some examples can be found in [3] and the overall documentation in [4].

The query engine[5], Brackit, is retargetable (a couple of interfaces and rewrite rules have to be implemented for DB systems). In particular, it finds implicit joins and applies known algorithms from the relational DB systems world to optimize joins and aggregate functions, thanks to set-oriented processing of the operators.[6]

I've given an interview in [7], but I'm usually very nervous, so don't judge too harshly.

Give it a try and happy coding!

Kind regards

Johannes

[1] https://sirix.io | https://github.com/sirixdb/sirix
[2] https://sirix.io/docs/concepts.html
[3] https://colab.research.google.com/drive/1NNn1nwSbK6hAekzo1YbED52RI3NMqqbG#scrollTo=CBWQIvc0Ov3P
[4] https://sirix.io/docs/
[5] http://brackit.io
[6] https://colab.research.google.com/drive/19eC-UfJVm_gCjY--koOWN50sgiFa5hSC
[7] https://youtu.be/Ee-5ruydgqo?si=Ift73d49w84RJWb2

no comments