First off... I'm excited to see this project. There's a lot of potential here, and this looks like a good implementation of a nice concept. I have at least a bit of authority behind that statement, since a few years ago I had the opportunity to build something similar (although smaller in ambition). A few things to think about:

* Type accretion - This doesn't change the fact that database clients need to be able to accept historical data formats if they need to access historical data. The schema can't be changed for older data objects without changing the hashes for that data, so there's no way to do anything like the schema migrations you'd do in SQL. For simple schema changes like adding fields, this might not be so hard to deal with, but some changes will be structural in nature and will change the relative paths between objects. (This adds complexity to database client code, as well as testing effort. There's a sketch of the decoding burden at the end of this comment.)

* Security - Is there a way to secure objects stored within noms? Let's say I store $SECRET into noms and get back a hash. Does it then become the case that every user with access to the database and the hash can retrieve $SECRET? What if permissions need to be granted or revoked on a particular object after it's been stored? On a field within a particular object? What if an object shouldn't have been stored in the database at all and needs to be obliterated? (This last problem gets worse if the object to be obliterated contains the only path to data that needs to be retained.)

* Performance - The CAS model effectively takes the stored data, runs it through a blender, and returns you a grey goo of hashes. This is good for replication, but it means you can't get much meaningful information out of a hash by itself. That tends to push you toward the kind of pointer-chasing traversals you'd find in an old-school navigational database, with a huge dependency on the time it takes to fetch an object given its hash (second sketch below). Indices can help by reducing the complexity of the traversals you need to do, but only if they're current and you have the index you need.

* Data roll-off - How do you expire data so that it doesn't just monotonically increase in volume? Even if there's an API to mark an object as purgeable, the problem of identifying every other object that becomes purgeable as a result is effectively a garbage collection process (git gc, etc.; third sketch below). There's also the issue of the sheer number of objects that can be involved. The system I was involved with had something like 500K objects/day that had to be purged after 120 days in the system (a total of around 60MM live objects and around 6TB or so). Identifying 500K objects to purge and then specifying those to the data layer for action is not necessarily an easy thing...

* Querying - Server-side query logic (and an expression language) is basically essential for performance. Otherwise, you wind up with a network round trip for every edge of the graph you follow. Going back to my first point, whatever query language is used has to be flexible enough to handle a schema that may vary over time (through type accretion).

All five of these bullet points are worthy of a great deal more discussion, and I haven't even broached issues around conflict resolution, differencing, UI concerns, etc. I think there are good approaches to managing most of these issues, but there's a bunch of engineering involved, as well as some close attention to scope and goals...
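
To make the type accretion point concrete, here's a minimal sketch of what client code ends up carrying around forever. The record shapes, the version tag, and the JSON encoding are all hypothetical (noms has its own value encoding); the point is just that every historical shape has to stay decodable, because the stored objects can never be rewritten without changing their hashes:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // PersonV1 is the shape as originally stored.
    type PersonV1 struct {
        Name string `json:"name"`
    }

    // PersonV2 made a structural change: the name was split into parts.
    type PersonV2 struct {
        First string `json:"first"`
        Last  string `json:"last"`
    }

    // Person is what the application actually wants to work with.
    type Person struct {
        First, Last string
    }

    // decodePerson must accept every historical format, since old
    // objects are immutable once their hashes exist.
    func decodePerson(version int, raw []byte) (Person, error) {
        switch version {
        case 1:
            var v1 PersonV1
            if err := json.Unmarshal(raw, &v1); err != nil {
                return Person{}, err
            }
            // Lossy upgrade: V1 records only had a single name field.
            return Person{First: v1.Name}, nil
        case 2:
            var v2 PersonV2
            if err := json.Unmarshal(raw, &v2); err != nil {
                return Person{}, err
            }
            return Person{First: v2.First, Last: v2.Last}, nil
        default:
            return Person{}, fmt.Errorf("unknown schema version %d", version)
        }
    }

    func main() {
        p, _ := decodePerson(1, []byte(`{"name":"Ada"}`))
        fmt.Println(p)
    }

Every structural change adds another case like this, in every client, and all of them need tests against real historical data.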
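On the performance point, here's a toy version of the round-trip-per-edge problem. The Store interface and object layout are assumptions for illustration, not the actual noms API; the in-memory store stands in for a remote database where every Get is a network hop:

    package main

    import "fmt"

    type Hash string

    // Object is a stored value whose fields point at other hashes.
    type Object struct {
        Fields map[string]Hash
    }

    // Store stands in for the remote database; each Get is a round trip.
    type Store interface {
        Get(h Hash) (Object, error)
    }

    // Resolve walks a path through the graph, paying one fetch per
    // edge. Without server-side query support, a k-edge traversal is
    // k sequential round trips.
    func Resolve(s Store, root Hash, path []string) (Hash, error) {
        cur := root
        for _, field := range path {
            obj, err := s.Get(cur) // network round trip
            if err != nil {
                return "", err
            }
            next, ok := obj.Fields[field]
            if !ok {
                return "", fmt.Errorf("no field %q at %s", field, cur)
            }
            cur = next
        }
        return cur, nil
    }

    type memStore map[Hash]Object

    func (m memStore) Get(h Hash) (Object, error) {
        o, ok := m[h]
        if !ok {
            return Object{}, fmt.Errorf("unknown hash %s", h)
        }
        return o, nil
    }

    func main() {
        db := memStore{
            "root": {Fields: map[string]Hash{"orders": "o1"}},
            "o1":   {Fields: map[string]Hash{"customer": "c1"}},
            "c1":   {},
        }
        h, err := Resolve(db, "root", []string{"orders", "customer"})
        fmt.Println(h, err) // c1 <nil>, after two "round trips"
    }

Pushing Resolve (and richer query expressions) to the server side collapses those k round trips into one, which is why I think a server-side expression language ends up being essential.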
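And on roll-off: once expired roots are dropped, finding everything that became purgeable is plain mark-and-sweep. A toy version, with a hypothetical reference graph (the real cost at 500K expirations/day over 60MM objects is the graph walk itself, plus handing the resulting purge set to the storage layer):

    package main

    import "fmt"

    type Hash string

    // refs maps each object to the hashes it points at.
    type refs map[Hash][]Hash

    // mark walks the graph from the surviving roots.
    func mark(g refs, roots []Hash) map[Hash]bool {
        live := map[Hash]bool{}
        stack := append([]Hash(nil), roots...)
        for len(stack) > 0 {
            h := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if live[h] {
                continue
            }
            live[h] = true
            stack = append(stack, g[h]...)
        }
        return live
    }

    // sweep returns everything unreachable, i.e. the purge set.
    func sweep(g refs, live map[Hash]bool) []Hash {
        var dead []Hash
        for h := range g {
            if !live[h] {
                dead = append(dead, h)
            }
        }
        return dead
    }

    func main() {
        g := refs{
            "rootNew": {"shared"},
            "rootOld": {"shared", "oldOnly"}, // expired root
            "shared":  nil,                   // still reachable, must survive
            "oldOnly": nil,
        }
        live := mark(g, []Hash{"rootNew"}) // rootOld has aged out
        fmt.Println(sweep(g, live))        // rootOld and oldOnly are purgeable
    }

Note the "shared" object: it's reachable from the expired root too, which is exactly the retained-data-behind-a-purged-object problem I mentioned above. You can't purge by age alone; you have to prove unreachability.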