
Rich Hickey: Deconstructing the Database

229 points by noidi, almost 13 years ago

11 comments

saurik, almost 13 years ago

Watching this talk has so far (I'm halfway through, and now giving up) been very disappointing, primarily because many of the features and implementation details ascribed to "traditional databases" are not true of common modern SQL databases, and almost none of them are true of PostgreSQL. As an initial trivial example, many database systems allow you to store arrays. In the case of PostgreSQL, you can have quite complex data types, from dictionaries and trees to JSON, or even whatever else you want to come up with, as it is a runtime-extensible system.

However, it really gets much deeper than these kinds of surface details. As a much more bothersome example, one quite fundamental to the point he seems to be making with this talk: at about 15:30 he seriously says "in general, that is an update-in-place model", and then has multiple slides about the problems of this storage model. Yet modern databases don't do this. Even MySQL doesn't do this (anymore). Instead, modern databases use MVCC, which involves storing all historical versions of the data for at least some time; in PostgreSQL, this could be a very long time (until a manual VACUUM occurs; if you want to store things forever, this can be arranged ;P).

http://en.wikipedia.org/wiki/Multiversion_concurrency_control

This MVCC model thereby directly solves one of the key problems he spends quite a bit of time at the beginning of his talk attempting to motivate: that multiple round-trips to the server cannot get cohesive state. In actuality, you can easily get consistent state from these multiple queries: within a single transaction (which, for the record, is very cheap under MVCC if you are just reading things), almost all modern databases (Oracle, PostgreSQL, MySQL...) will give you an immutable snapshot of what the database looked like when you started your transaction. The situation is actually only getting better and more efficient (I recommend looking at PostgreSQL 9.2's serializable snapshot isolation).

At ~20:00, he then describes the storage model he is proposing, and keys in on how important storing time is in a database; the point is also made that storing a timestamp isn't enough: the goal should be to store a transaction identifier... but again, this is how PostgreSQL already stores its data: every version (as again: it doesn't delete data the way Rich believes it does) stores the transaction range for which it is valid. The only difference between existing SQL solutions and Rich's ideal is that it happens per row instead of per individual field (which could easily be modeled, and is simply less efficient).

Now, the point he makes at ~24:00 actually has some merit: you can't easily look up this information using the presented interfaces of databases. However, if I wanted to hack that feature into PostgreSQL, it would be quite simple, as the fundamental data model is already what he wants: so much so that the indexes are still indexing the dead data, so I could not only provide a hacked-up feature to query the past but could actually do so efficiently. Talking about transactions is even already simple: you can get the identifier of a transaction using txid_current() (and look up other running transactions if you must using info tables; the aforementioned per-row transaction visibility range is even already accessible as magic xmin and xmax columns on every table).
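The xmin/xmax visibility rule described here can be sketched as a toy in Python. This is an invented miniature for illustration only, not PostgreSQL's actual implementation; the class and method names are made up. Each row version records the transaction id that created it (xmin) and, once superseded, the id that replaced it (xmax); a snapshot taken at transaction T sees exactly the versions with xmin <= T < xmax:

```python
# Toy MVCC store: history is kept, never updated in place.
import itertools
from dataclasses import dataclass

INFINITY = float("inf")

@dataclass
class Version:
    value: object
    xmin: int                 # txid that created this version
    xmax: float = INFINITY    # txid that superseded it (INFINITY = still live)

class MVCCStore:
    def __init__(self):
        self._txids = itertools.count(1)
        self._rows: dict[str, list[Version]] = {}

    def begin(self) -> int:
        """Hand out a monotonically increasing transaction id (the snapshot)."""
        return next(self._txids)

    def write(self, txid: int, key: str, value) -> None:
        versions = self._rows.setdefault(key, [])
        if versions:
            versions[-1].xmax = txid      # old version stops being visible...
        versions.append(Version(value, xmin=txid))  # ...but is not deleted

    def read(self, txid: int, key: str):
        """Return the version visible to the snapshot taken at txid."""
        for v in self._rows.get(key, []):
            if v.xmin <= txid < v.xmax:
                return v.value
        return None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "balance", 100)
t2 = store.begin()                # t2's snapshot sees balance = 100
t3 = store.begin()
store.write(t3, "balance", 150)
print(store.read(t2, "balance"))  # 100 -- t2's view is immutable
print(store.read(t3, "balance"))  # 150
```

A "query the past" feature of the kind the comment imagines falls out for free: reading with any old txid replays that snapshot, because the dead versions are still there.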
lhnz, almost 13 years ago
There are two people that I will stop what I'm doing and watch every new lecture they make: Rich Hickey and Bret Victor. Both are visionaries.
erikpukinskis, almost 13 years ago

Fascinating stuff. Some things that came up for me while watching this and the other videos on their site[1]:

It's not Open Source, for anyone who cares about that. It's interesting how strange it feels to me for infrastructure code to be anything other than Open Source.

I'm sort of shocked that the query language still passes strings, when Hickey made a big deal of how the old databases do it that way. To me a query is a data structure that we build programmatically, so why force the developer to collapse it into a string? Maybe because they want to support languages that aren't expressive enough to do that concisely?

[1] http://www.datomic.com/videos.html
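The "query as a data structure" idea argued for here can be illustrated with a tiny pattern matcher in Python. The format (a list of [entity, attribute, value] patterns, with "?"-prefixed variables) is invented for this sketch and is not Datomic's actual API; the point is just that a query built as a plain value can be extended programmatically, with no string splicing:

```python
# Minimal datalog-flavored matcher over (entity, attribute, value) facts.
def match(facts, patterns):
    """Return variable bindings satisfying every (e, a, v) pattern."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for binding in results:
            for fact in facts:
                trial = dict(binding)
                ok = True
                for pat, actual in zip(pattern, fact):
                    if isinstance(pat, str) and pat.startswith("?"):
                        # Variable: bind it, or check an existing binding.
                        if trial.setdefault(pat, actual) != actual:
                            ok = False
                    elif pat != actual:
                        ok = False       # Constant: must match exactly.
                if ok:
                    next_results.append(trial)
        results = next_results
    return results

facts = [
    (1, "name", "alice"), (1, "dept", "eng"),
    (2, "name", "bob"),   (2, "dept", "sales"),
]

# The query is just a list -- composing it is ordinary list manipulation:
query = [("?e", "dept", "eng")]
query.append(("?e", "name", "?n"))
print([b["?n"] for b in match(facts, query)])  # ['alice']
```

Because the query never collapses into a string, a program can inspect or rewrite it (add a clause, rename a variable) before execution, which is exactly what string-based query languages make awkward.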
sriram_malhar, almost 13 years ago

I'm always puzzled when the Datomic folks speak of reads not being covered under a transaction. This is dangerous.

Here's the scenario that, in a conventional update-oriented store, is termed a "lost update": "A" reads object v1, "B" reads the same version, "B" adds a fact to the object making it v2, then "A" comes along and writes v3 based on its own _stale_ knowledge of the object. In effect, it has clobbered what "B" wrote, because A's write came later and has become the latest version of the object. The fact that Datomic's transactor serializes writes is meaningless here, because it doesn't take read dependencies into account.

In other words, Datomic gives you the equivalent of read-committed or snapshot isolation, but not true serializability. I wouldn't use it for a banking transaction, for sure. To fix this, Datomic would need to add a test-and-set primitive to implement optimistic concurrency, so that a client can say, "process this write only if this condition is still true". Otherwise, two clients are only going to be talking past each other.
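The test-and-set primitive the comment asks for can be sketched as generic optimistic concurrency in Python (this is not Datomic's API; the store and exception names are invented). A write carries the version the writer read, and is rejected if the object has moved on since:

```python
# Compare-and-swap store: stale writes fail loudly instead of clobbering.
class StaleWrite(Exception):
    pass

class VersionedStore:
    def __init__(self):
        self._data = {}   # key -> (version, value)

    def read(self, key):
        """Return (version, value); missing keys read as (0, None)."""
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value):
        """Write only if the stored version is still the one we read."""
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            raise StaleWrite(f"{key}: expected v{expected_version}, found v{version}")
        self._data[key] = (version + 1, new_value)

store = VersionedStore()
v, balance = store.read("acct")            # A and B both read version v
store.compare_and_set("acct", v, 100)      # B's write lands first
try:
    store.compare_and_set("acct", v, 50)   # A's write is based on stale state...
except StaleWrite:
    print("A must re-read and retry")      # ...so it is rejected, not lost
```

The rejected writer then re-reads, reapplies its change to the fresh version, and retries: exactly the "process this write only if this condition is still true" contract the comment describes.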
bsaul, almost 13 years ago

Does anyone understand how this system would deal with the CAP theorem, in the case of a regular "add $100 then remove $50 from the bank account, in that order and in one go" type of transaction? The transactor is supposed to "send" novelty to peers so that they update their live index. That's one point where I could see trouble (suppose it lags: one "add" request goes to one peer, the "read" goes to a second, and you don't find what you just added). Another place where I see it could mess things up is the "data store" tier, which uses the same traditional techniques as today to replicate data between different servers (one peer requests facts from a part of the data store that's not yet synchronized with the one a second peer requests). It seems like all those issues are addressed on his "a fact can also be a function" slide, but he skips over it very quickly, so if anyone here could tell me more...
arscan, almost 13 years ago
I recall Datomic making a bit of a splash on HN when it was announced 6+ months ago, but basically crickets since then. Anybody build something cool that took advantage of Datomic's unique design?
brlewis, almost 13 years ago
Anyone have a summary for those of us who don't want to watch an hour-long video?
danecjensen, almost 13 years ago

This reminds me a lot of "How to beat the CAP theorem": http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
sbmassey, almost 13 years ago

How would you idiomatically fix invalid data in Datomic? For example, if you needed to update a badly entered value in a record, but keep the record's timestamp the same so as not to screw up historical queries?
hobbyist, almost 13 years ago

I often wonder: is a PhD in computer science really required to do awesome work?
duck, almost 13 years ago
I'm getting "This video is currently unavailable"?