The Universal Data Structure

189 pointsby elbenshiraalmost 10 years ago

20 comments

jfealmost 10 years ago

"Hashes are always O(1) reads, inserts and writes."Maybe, once you've found a location to read, insert, or write to. The author neglects the runtime cost required for the hash algorithm itself, which may not be trivial; computing the hash of a string key is typically an O(n) operation.Furthermore, unless a suitable table size is selected, integer keys (should one use a map like an array) will eventually hash to the same value, requiring even more time to iterate through all the remaining values that match that key until the desired value is found."I don’t know why you would ever use [a linked list] over an array (or a hash)..."Here's why: because arrays take up space whether you use it or not. Linked lists don't suffer from this problem, but at the cost of 1-2 pointers per item. Has the author seriously never managed memory before? Please tell me this article is a joke.

评论 #9807152 未加载

评论 #9807739 未加载

评论 #9807240 未加载

评论 #9807161 未加载

评论 #9808386 未加载

评论 #9808114 未加载

评论 #9808745 未加载

评论 #9809537 未加载

评论 #9810902 未加载

评论 #9807188 未加载

评论 #9807118 未加载

评论 #9809811 未加载

评论 #9815565 未加载

Cushmanalmost 10 years ago

This is a perfect example of the kind of humor that belongs on HN. Actually had me nodding along in parts, then screwing up my face at others. By the time I was sure it was satire, I was committed enough to see it through to the end.Best of all, I expect sincere discussion of the merits of the argument in this thread. Well executed, and heh heh.

评论 #9808111 未加载

fasteoalmost 10 years ago

Like a well trained Pavlov dog[1], reading the title brought "Lua table"[2] to my mind, the most flexible data structure I have worked with, by far.[1] <a href="https://en.wikipedia.org/wiki/Classical_conditioning" rel="nofollow">https://en.wikipedia.org/wiki/Classical_conditioning</a>[2] <a href="http://www.lua.org/pil/2.5.html" rel="nofollow">http://www.lua.org/pil/2.5.html</a>

评论 #9807073 未加载

jerfalmost 10 years ago

Having read over the entire thing, I have only one issues with it: The Universal Hash structure is only universal if the language implementing it permits you to create cyclic structures, so you can manipulate the "pointers" or relevant language concept to create the cyclic structures. There are a handful of languages that don't permit that, such as Erlang, and some languages like Haskell that permit you to use "tying the knot" to create cyclic structures [1], but ones that can sometimes be difficult to manipulate after the fact.In those languages, you'll probably need to use a data structure well known for its simplicity, efficiency, and broad cross-language-platform compatibility: The Resource Description Framework's graph model. Graphs are also a well-known candidate for "universal data structure" [2]. Also, it's semantic, which is a clear advantage over the entirely not semantic UniversalHash, because semantic is better than not semantic when it comes to universal semantics of semanticness.Semantic.Otherwise, the article is pretty solid and all serious developers should definitely consider implementing it forthwith, as they will find it a very, very educational experience. I know I've seen people implement this model in Java before, and I can certainly vouch for the fact that it was an education for all involved.I'm going to say "semantic" one more time, because it's been my observation that the more you say it, the smarter you sound, so: semantic. Oh, yeah, that's the stuff. Mmmmmmmmm.[1]: <a href="https://wiki.haskell.org/Tying_the_Knot" rel="nofollow">https://wiki.haskell.org/Tying_the_Knot</a>[2]: Seriously, if you really do want to discuss the "universal data structure", perhaps because you want to analyze data structures very generically for some mathematical reason, graphs really are a good candidate. Not necessarily RDF graphs, which are both bizarrely complicated while lacking simple features; the three blessed collection types are a weird combination of features.

评论 #9807219 未加载

carapacealmost 10 years ago

"...We propose that many, and maybe even all, interesting organizations of information and behaviour might be built from a single primitive operation: n-way associative lookup. ..."<a href="http://www.vpri.org/pdf/tr2011003_abmdb.pdf" rel="nofollow">http://www.vpri.org/pdf/tr2011003_abmdb.pdf</a>

synthmeatalmost 10 years ago

While this may be a mockery, title probably alludes to Universal Design Pattern[1], which is not so easily dismissable idea.[1] <a href="http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html" rel="nofollow">http://steve-yegge.blogspot.com/2008/10/universal-design-pat...</a>

shkkmoalmost 10 years ago

This had me worried:"God-given axioms, which is academic lingo for a truth we accept purely by our God-given logic, like: if something is not true then it is false, something cannot exist in an empty set, something logical is logical."But then I saw this and started laughing:"Remember when everyone used tables to lay out their HTML? Well, that proved to be a horrible way to do things, because tables are inherently inflexible. It’s a strictly geometric constraint. Now we all use divs and CSS, because we get a much-more flexible CSS engine to define our layout. Postgres and her SQL friends are all table based, just like the <table> of 1999. Don’t be 1999."

评论 #9807770 未加载

Totientalmost 10 years ago

Satire, aside, I think a very short addition to the last line holds a lot of truth:"A hash is simple. A hash is fast. A hash is all you need to start with".I can think of plenty of good reasons to stop using a map/hash/associative array in code, but I can't think of very many good reasons not to start coding with associative arrays as your default data structure. If there's a performance/memory problem, fix it later. I've seen a lot more code suffer from premature optimization than I've seen suffer from using data structures that were a little too inefficient.

brudgersalmost 10 years ago

When in doubt, use brute force. -- Ken ThompsonUsing hashes as a first choice data structure is not necessarily a bad idea. 1] Until profiling a working implementation demonstrates otherwise, other data structures may be premature optimization.[1] Clearly an improvement over the Lisper's association lists.

评论 #9811653 未加载

ameliusalmost 10 years ago

Ehh, relational databases already proved that "sets" are the true universal data structures. Anything can be built upon them, including (hash)maps.

评论 #9809084 未加载

评论 #9810617 未加载

评论 #9807383 未加载

Azkaralmost 10 years ago

The Poe's law is strong in this one. It had me going.

jcwildealmost 10 years ago

I wish they had used the word "map" in place of "hash" throughout this entire article. The use of hashes is an implementation detail and wholly irrelevant.

评论 #9807080 未加载

pmelendezalmost 10 years ago

>"I hope you’re convinced. A hash is simple. A hash is fast. A hash is all you need."Hash tables are cool but they are far from being the only thing you need.The problem with this kind of subtle satire without a disclaimer at the end is that some people would fall by this and blindly follow it (collisions weren't mentioned not even once nor cache misses).

评论 #9807106 未加载

erikpukinskisalmost 10 years ago

Jonathan Blow made the point recently that academic programming language writers always make the same mistake of trying to take an idea and fully radicalize it.When you go from "Objects are super easy and useful in this language" to "Everything Is An Object" you basically doom yourself to using objects to implement a bunch of stuff that doesn't really make sense as objects and could be implemented much easier as another data structure.Big-brainded academics love the challenge of "ooh, can I make everything an object?" because they are always free to decrease the scope of their research a little to compensate for the implementation taking a long time. And the more phenomena you can contort into agreement with your thesis, the more scholarpoints you get.Blow advocates "data-driven programming" which, as a rule of thumb, I translate in my head as "don't move anything around you don't have to."For example, rather than just copying a giant array of JSON objects over the wire when you only need some image URLs each with an array of timestamped strings, you write the code that serialized that data. And if you do that a few times, write the tooling you need to make that kind of thing easy.The pitch is that it's not more work. And I'm kind of convinced. It just gets rid of so much pollution when you are debugging.Your first cut of things is often a little weird: "do I need a generator generator here?" but typically you realize that a simpler solution works just as well in the refactor.When you hack in a "wrong but easier to sketch out" solution into your code as the first try, it often just lives like that forever. Correct, confusing code often collapses into correct, simple code. Simple, functional-but-wrong code just seems less inclined to self improvement.And I am continually surprised by how many problems, when simplified down as much as possible, are best described with the basics: functions, structs, arrays. You need fancy stuff sometimes for sure, but most of our human problems are trivial enough that these blunt tools suffice. I just often won't be able to see it until I've renamed all the variables three times.What's interesting is I've been doing JavaScript programming this way, and Jonathan Blow is... shall I say... not a fan of JS. But I think the concepts translate pretty well! It's just instead of targeting raw memory, you target the DOM/js runtime which is actually a pretty powerful piece of metal if you have the patience to learn about the runtime and keep thinking about your data every step of the way.

asQuirreLalmost 10 years ago

It took me until:> Hashes are always O(1) reads, inserts and writes.To realise that this was a joke.

tempodoxalmost 10 years ago

Good tongue-in-cheek. Technically, however, the hash was outdone by the Lisp CONS. CONSes could represent every conceivable data structure in 1959 and they were eaten raw by the CPUs of the time. But then, there were not so many people available for the following-blindly part.

mjcohenalmost 10 years ago

That's one of the reasons I love awk (actually, gawk): this is the only data structure it has.

评论 #9810834 未加载

malkiaalmost 10 years ago

Basic had that. It was called arrays - yay for A$()

belovedeaglealmost 10 years ago

This... this is just very, very dedicated satire, right?Let's replace main memory with hash maps and then implement existing data structures on that.Yes. This is satire.

评论 #9806846 未加载

评论 #9807901 未加载

评论 #9808582 未加载

michaelochurchalmost 10 years ago

While this is satire, it brings to mind some bit of industry history. My first reaction was, "you want to define a type class called Associative because you're talking about an interface, and that got me thinking about OOP vs. Haskell's type classes (a superior approach) (...and then I realized that the OP was a satire.)The major historical selling point of object-oriented programming (OOP) to the Forces of Evil-- not all business people are "Forces of Evil; really, there are some great business people out there, and so I'm referring specifically to cost-cutting mini-Eichmanns who've attempted to commoditize, humiliate, and infantilize us with "user scrum stories" and a culture of mediocrity-- was that OOP (after diverging far from Alan Kay's original vision) would allow the Forces of Evil to replace high-cost experts with teams of mediocre programmers, and thereby ruin the balance of power. Culturally, it worked (the culture of mediocrity is well-established in the software industry); economically, it failed (because large teams of mediocrities are actually fucking expensive, because a "10x" engineer only costs about 1.5-2.5x an average one).The sales pitch for OOP to the Forces of Evil was that OOP would make it possible to hire a couple of low-paid body-shop programmers too stupid to recognize the OP as either (a) satire or, missing the joke but still correct, just wrong. Smart wizards in the open-source world and at companies like Google would do the actual engineering that made efficient hash-maps possible, and CommodityScrumDrones would staple shit together using topologies thereof, without really understanding any of the technologies they were gluing together, and probably ignorant of why these things are sometimes called "hashes" in the first place.The problem is that when CommodityScrumDrones grow up and become middle managers and get to start making technical choices, they often make bad ones. They reject PostgreSQL as too hard to too old and use some NoSQL JSON storage engine that was a "Show HN" project 17 days ago.Even though the CommodityScrumProgrammer phenomenon has been a massive failure in economic terms-- the Forces of Evil have won on ego terms but utterly dominating their targets, but they've lost money-- it has been a cultural success that has inflicted mediocrity, anti-intellectualism, and subordinacy to "The Business", on the software industry. And now we have people calling important technical shots who have literally no idea why the OP is either satire or wrong.

评论 #9814867 未加载