Immutable Data (2015)

71 points by akbarnama almost 2 years ago

5 comments

kpmah almost 2 years ago
Author here, 8 years on.

Although the advantages are real, I can't say I have had much opportunity to implement schemas like this. The extra complexity is usually what gets in the way, and it can add difficulty to migrations.

I think it would be useful in certain scenarios, for specific parts of an application, usually where the history is relevant to the user. Using it more generally could be helped by some yet-to-be-built tooling for common patterns and data migrations.
pil0u almost 2 years ago
From a user's perspective, I can see a privacy drawback as well.

Suppose that instead of a typical User table, you have a User_Revision table as suggested. Every time a user updates their account settings, you INSERT a new row there; if a user changes their email address, you get a row for each address they have ever used.

Not only does the company get a history of email addresses, but those addresses are also tied to each other. If this information is leaked, the user is exposed to more attack vectors.
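For concreteness, a minimal sketch of the table being described; the schema, names, and columns here are illustrative, not taken from the article:

    -- Hypothetical User_Revision schema: rows are only ever INSERTed,
    -- so every historical email address accumulates in this one table.
    CREATE TABLE user_revision (
        revision_id bigserial PRIMARY KEY,      -- assigned from a sequence
        user_id     bigint      NOT NULL,
        email       text        NOT NULL,
        created_at  timestamptz NOT NULL DEFAULT now()
    );

    -- "Changing" an email address is just another insert:
    INSERT INTO user_revision (user_id, email)
    VALUES (42, 'new@example.com');

    -- The current state is the latest revision per user:
    SELECT DISTINCT ON (user_id) user_id, email
    FROM user_revision
    ORDER BY user_id, revision_id DESC;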
roenxi almost 2 years ago
I'll argue that this is bad design. It works as long as the amount of data is small, but even then, taking a low-data scenario and building lots of views or triggers just seems a bit weird. When I do that, it works for a month and then falls over in maintenance, because you've got a table-based database where none of the important data is in a table?! This is not a design that will be flexible if needs change even slightly.

Immutable design is extremely powerful, but it needs to be a first-class citizen to get the full benefit. Clojure's data structures are a great study in this: they squeeze a shocking amount of efficiency out because they have guarantees that the underlying data is immutable (e.g., copy-and-slightly-update on a large object is effectively a free operation, and we have the old and new objects available for comparison, and that is lovely). Mimicking the same style of programming in, say, Java would gain none of the performance advantages or the logical conveniences. I expect it would be an uncomfortable programmer experience.

Here I think it would be more effective to design the tables as usual and keep a log of all the changes separately. There is a chance the log will get out of sync with the active tables, but frankly, if that is a problem, go use something designed with immutability in mind and don't twist PostgreSQL into pretzel shapes.
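A minimal sketch of that "normal tables plus a separate change log" setup, using a generic PostgreSQL trigger; the change_log table, log_change function, and users table are all hypothetical names chosen for illustration:

    -- Hypothetical append-only log kept alongside the ordinary tables.
    CREATE TABLE change_log (
        log_id     bigserial PRIMARY KEY,
        table_name text        NOT NULL,
        operation  text        NOT NULL,   -- 'INSERT', 'UPDATE', or 'DELETE'
        old_row    jsonb,                  -- NULL for inserts
        new_row    jsonb,                  -- NULL for deletes
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE FUNCTION log_change() RETURNS trigger AS $$
    BEGIN
        -- OLD/NEW are only assigned for the operations that have them,
        -- so branch on TG_OP rather than referencing both unconditionally.
        IF TG_OP = 'INSERT' THEN
            INSERT INTO change_log (table_name, operation, old_row, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, NULL, to_jsonb(NEW));
        ELSIF TG_OP = 'UPDATE' THEN
            INSERT INTO change_log (table_name, operation, old_row, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        ELSE  -- DELETE
            INSERT INTO change_log (table_name, operation, old_row, new_row)
            VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), NULL);
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    -- Attach it to any ordinary table, e.g. a hypothetical users table:
    CREATE TRIGGER users_change_log
        AFTER INSERT OR UPDATE OR DELETE ON users
        FOR EACH ROW EXECUTE FUNCTION log_change();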
pharmakom almost 2 years ago
It's a mistake to sort versions by a timestamp instead of by a revision_id, imo.
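One way to see the failure mode, sketched against the hypothetical user_revision table from the earlier example: two revisions written in the same transaction (or by hosts with skewed clocks) can carry identical timestamps, so a timestamp ordering is ambiguous, while a sequence-backed revision_id gives a total order.

    -- Ambiguous if two revisions share a created_at value:
    SELECT email FROM user_revision
    WHERE user_id = 42
    ORDER BY created_at DESC
    LIMIT 1;

    -- Unambiguous: revision_id comes from a sequence, so ties are impossible:
    SELECT email FROM user_revision
    WHERE user_id = 42
    ORDER BY revision_id DESC
    LIMIT 1;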
crdrost almost 2 years ago
(1) You will benefit immensely, in the applications built from these sorts of structures, by keeping the version log for an entity in a separate table from the current value. First, because you can just use old-school triggers ("any update over here triggers inserting a new row into the corresponding versions table"), so you don't even need to implement the versioning in application logic. Second, because the sorts of queries that involve looking at previous versions tend not to be the same as the sorts of queries used for general data manipulation.

(2) To give an instance of where (1) becomes important, suppose you change some X to X' and then want to change it back. Suppose that after the change, some entity was deleted: X foreign-keys to a now-deleted value, X' does not. Most applications that try to shove both current state and history into one ubertable disable a bunch of constraint checking and other suchness, and so permit this dubious feature of partially rolling back into an inconsistent state. But if you had just DELETEd the row when you said you had, then you would have gotten a foreign-key error, your user would have copy-pasted you their "unexpected error occurred" message, and you'd immediately have been able to diagnose which foreign-key constraint was blocking the undo, rather than facing mysterious failures several weeks later.

(3) Regardless of your stance on (1), once your application supports deletion, your relational integrity usually suffers, because the technically correct value for all of the columns in a deleted row is to make them all null. This is basically the problem that databases do not have sum types. A sum type in a database is not hard to create once you need it: make a row that has, say, three columns which foreign-key to other tables, plus a constraint that exactly one of these values is non-null (see the sketch after this comment). So the very lightweight construction, if you are upset about denormalizing your data, is for a Cat in your Cats table in your PetStore database to be a nullable pointer to a CatVersion. That's how to proceed if you REALLY want to normalize.

(4) All of the above assumes that for every edit to an entity you will save a new row in the versions table, copying all of the other data. The problem is that inevitably some tables get super wide, as they have to hold dozens of pieces of business data together, and it's never the ones that you initially expected. There is an easy fix for this as well: make versions themselves "mutable." Whaaaaa??? Yes. Snapshots plus deltas. It's not really mutable, because it's append-only.
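A minimal sketch of the exactly-one-non-null "sum type" encoding from point (3); the pet-store tables are hypothetical, and num_nonnulls is a standard PostgreSQL (9.6+) function that counts its non-null arguments:

    -- A pet is EITHER a cat OR a dog OR a parrot, encoded as three nullable
    -- foreign keys plus a CHECK that exactly one of them is set.
    CREATE TABLE cat    (cat_id    bigserial PRIMARY KEY, name text NOT NULL);
    CREATE TABLE dog    (dog_id    bigserial PRIMARY KEY, name text NOT NULL);
    CREATE TABLE parrot (parrot_id bigserial PRIMARY KEY, name text NOT NULL);

    CREATE TABLE pet (
        pet_id    bigserial PRIMARY KEY,
        cat_id    bigint REFERENCES cat (cat_id),
        dog_id    bigint REFERENCES dog (dog_id),
        parrot_id bigint REFERENCES parrot (parrot_id),
        CONSTRAINT exactly_one_variant
            CHECK (num_nonnulls(cat_id, dog_id, parrot_id) = 1)
    );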