Write a hash table in C

189 pointsby jmlrover 7 years ago

16 comments

fizixerover 7 years ago

I've heard of two possibilities (as a recommendation) for using data structures in C without having to develop them:- BSD queue.h offers linked lists, queues, etc [1].- UT-hash offers hash tables [2].I've also read some people being frustrated with them, probably due to the quirky syntax/API. On the plus side (BSD at least) it's just a header file that you can download, include and start using.[1] <a href="http://bxr.su/OpenBSD/sys/sys/queue.h" rel="nofollow">http://bxr.su/OpenBSD/sys/sys/queue.h</a>[2] <a href="https://troydhanson.github.io/uthash/" rel="nofollow">https://troydhanson.github.io/uthash/</a>

评论 #15085671 未加载

评论 #15085951 未加载

评论 #15086196 未加载

评论 #15085768 未加载

评论 #15087692 未加载

评论 #15087183 未加载

msarnoffover 7 years ago

"The language doesn't come with [a hash table implementation] included"The standard library does come with a half-baked hash table implementation[1] (hcreate/hdestroy/hsearch) that only allows creation of _one_ global hash table. GNU libc[2] adds hcreate_r/hdestroy_r/hsearch_r that allows multiple tables to be created.The APIs are strange and antiquated. On the other hand, the hash table implementation described in the article presents a much nicer interface.[1] <a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/hcreate.html" rel="nofollow">http://pubs.opengroup.org/onlinepubs/009695399/functions/hcr...</a>[2] <a href="http://man7.org/linux/man-pages/man3/hsearch.3.html" rel="nofollow">http://man7.org/linux/man-pages/man3/hsearch.3.html</a>

评论 #15084173 未加载

评论 #15085531 未加载

sillysaurus3over 7 years ago

The link at the bottom of the intro is broken: <a href="https://github.com/jamesroutley/write-a-hash-table/blob/v0.1.0/hash-table" rel="nofollow">https://github.com/jamesroutley/write-a-hash-table/blob/v0.1...</a> preventing the reader from advancing.I think the whole thing would be best as a single page, to be honest. Even if it's long, people have scroll wheels. But it's a stylistic choice.

pieguyover 7 years ago

"(long)pow(a, len_s - (i+1))" is both inefficient and inaccurate. You should store the current power in a variable which you multiply by 'a' and reduce by 'm' on each iteration.Also, your double hashing scheme has a bug. You identified that it's problematic if hash_b returns 0, but adding 1 to the result doesn't actually solve anything, because now you have the same problem if hash_b returns num_buckets-1.

__sover 7 years ago

Weird there isn't a src directory collecting everything togetherHashtables are one of those data structures one can easily get away with never having to write on their own, yet still use every day. In scripting languages it's impractical to roll your own, & in the more abstracted low level languages there'll usually be a pretty good generic version. Yet there's much room for variationI had the joy to be out of reach of off-the-shelf hashtables for luwa,init: <a href="https://github.com/serprex/luwa/blob/436c7271120fcd116af30e906919afe7a7e55254/rt/alloc.lua#L156" rel="nofollow">https://github.com/serprex/luwa/blob/436c7271120fcd116af30e9...</a>get/set: <a href="https://github.com/serprex/luwa/blob/436c7271120fcd116af30e906919afe7a7e55254/rt/table.lua" rel="nofollow">https://github.com/serprex/luwa/blob/436c7271120fcd116af30e9...</a>hash: <a href="https://github.com/serprex/luwa/blob/436c7271120fcd116af30e906919afe7a7e55254/rt/obj.lua#L130" rel="nofollow">https://github.com/serprex/luwa/blob/436c7271120fcd116af30e9...</a>Still need to implement Lua's sequential-portion-as-array logic. I do like the API not having a 'remove' method, opting intead for nil by default. nil value entries get vacuumed out when rehashing. This ends up acting as tombstone. In an earlier iteration I was using 0 as a marker distinct from nil, but an implementation detail is that nil is stored at address 0.. oops

nathan_longover 7 years ago

If you just want to see hash tables work as a data structure, here's a talk-as-blog-post I made on the subject.<a href="http://nathanmlong.com/2015/10/reimplementing-rubys-hash/" rel="nofollow">http://nathanmlong.com/2015/10/reimplementing-rubys-hash/</a>

评论 #15084336 未加载

评论 #15084815 未加载

dullgiulioover 7 years ago

If your set is known at compile time, make a Perfect Hash Table with gperf instead. It's commonly used in compilers and other software with a parsing or interpreting stage.

benlorenzettiover 7 years ago

In the intro section, on extending to unicode being non-trivial...is the argument that if your input has some non-ascii data you probably have all non-ascii data and the above 127 bucket will be too full? Honest question.

baron816over 7 years ago

While the reasons for this that the author gave are valid, I think it would also be valuable to do this in a high level language. I'm sure lot of people who can read C code went through a CS program, and hopefully learned how hash tables work then. People without CS degrees and who mostly write JS/Ruby/Python/etc. and use hash tables on almost every line of code they write are much less likely to know how they work. Just saying, someone should do it for other languages.

评论 #15083618 未加载

评论 #15084342 未加载

评论 #15087403 未加载

clarryover 7 years ago

Don't allocate these items individually; don't store pointers to such items. They're tiny, and the pointer chasing + allocator bookkeeping overhead + potential randomisation make it inefficient for no gain. Just allocate an array of items. This way you'll also deal with fewer potential errors. Speaking of errors, please handle them and fix the API so that the caller can know if there's a problem.

评论 #15085414 未加载

smegelover 7 years ago

This might seem academic, but I actually had a real reason to hand-roll a hash table implementation that would work in process-shared memory (as an Apache module). It was based on the hash table implemented in mod_ldap, and was part of an LRU cache.

jay-andersonover 7 years ago

This is a good mostly simple overview. The appendix should also mention robin hood hashing as a method for addressing collisions. From my understanding it is cpu cache friendly and works well in practice.

peterbotondover 7 years ago

berkeley db 185. it has served me very good over the years and it was fast enough for real time market data and indexing and many more.

7Anonymousover 7 years ago

Just go back to the table of contents. The link works from there.

agumonkeyover 7 years ago

hah, just a few days ago I named a few files hshtbl.c trie.c and so on. All after a Raymond Hettinger talk about how they rethought Python 3 dicts.

cletuswover 7 years ago

Link should be changed to <a href="https://github.com/jamesroutley/write-a-hash-table" rel="nofollow">https://github.com/jamesroutley/write-a-hash-table</a>

评论 #15083347 未加载