TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Write a hash table in C

189 点作者 jmlr超过 7 年前

16 条评论

fizixer超过 7 年前
I&#x27;ve heard of two possibilities (as a recommendation) for using data structures in C without having to develop them:<p>- BSD queue.h offers linked lists, queues, etc [1].<p>- UT-hash offers hash tables [2].<p>I&#x27;ve also read some people being frustrated with them, probably due to the quirky syntax&#x2F;API. On the plus side (BSD at least) it&#x27;s just a header file that you can download, include and start using.<p>[1] <a href="http:&#x2F;&#x2F;bxr.su&#x2F;OpenBSD&#x2F;sys&#x2F;sys&#x2F;queue.h" rel="nofollow">http:&#x2F;&#x2F;bxr.su&#x2F;OpenBSD&#x2F;sys&#x2F;sys&#x2F;queue.h</a><p>[2] <a href="https:&#x2F;&#x2F;troydhanson.github.io&#x2F;uthash&#x2F;" rel="nofollow">https:&#x2F;&#x2F;troydhanson.github.io&#x2F;uthash&#x2F;</a>
评论 #15085671 未加载
评论 #15085951 未加载
评论 #15086196 未加载
评论 #15085768 未加载
评论 #15087692 未加载
评论 #15087183 未加载
msarnoff超过 7 年前
&quot;The language doesn&#x27;t come with [a hash table implementation] included&quot;<p>The standard library does come with a half-baked hash table implementation[1] (hcreate&#x2F;hdestroy&#x2F;hsearch) that only allows creation of _one_ global hash table. GNU libc[2] adds hcreate_r&#x2F;hdestroy_r&#x2F;hsearch_r that allows multiple tables to be created.<p>The APIs are strange and antiquated. On the other hand, the hash table implementation described in the article presents a much nicer interface.<p>[1] <a href="http:&#x2F;&#x2F;pubs.opengroup.org&#x2F;onlinepubs&#x2F;009695399&#x2F;functions&#x2F;hcreate.html" rel="nofollow">http:&#x2F;&#x2F;pubs.opengroup.org&#x2F;onlinepubs&#x2F;009695399&#x2F;functions&#x2F;hcr...</a><p>[2] <a href="http:&#x2F;&#x2F;man7.org&#x2F;linux&#x2F;man-pages&#x2F;man3&#x2F;hsearch.3.html" rel="nofollow">http:&#x2F;&#x2F;man7.org&#x2F;linux&#x2F;man-pages&#x2F;man3&#x2F;hsearch.3.html</a>
评论 #15084173 未加载
评论 #15085531 未加载
sillysaurus3超过 7 年前
The link at the bottom of the intro is broken: <a href="https:&#x2F;&#x2F;github.com&#x2F;jamesroutley&#x2F;write-a-hash-table&#x2F;blob&#x2F;v0.1.0&#x2F;hash-table" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;jamesroutley&#x2F;write-a-hash-table&#x2F;blob&#x2F;v0.1...</a> preventing the reader from advancing.<p>I think the whole thing would be best as a single page, to be honest. Even if it&#x27;s long, people have scroll wheels. But it&#x27;s a stylistic choice.
pieguy超过 7 年前
&quot;(long)pow(a, len_s - (i+1))&quot; is both inefficient and inaccurate. You should store the current power in a variable which you multiply by &#x27;a&#x27; and reduce by &#x27;m&#x27; on each iteration.<p>Also, your double hashing scheme has a bug. You identified that it&#x27;s problematic if hash_b returns 0, but adding 1 to the result doesn&#x27;t actually solve anything, because now you have the same problem if hash_b returns num_buckets-1.
__s超过 7 年前
Weird there isn&#x27;t a src directory collecting everything together<p>Hashtables are one of those data structures one can easily get away with never having to write on their own, yet still use every day. In scripting languages it&#x27;s impractical to roll your own, &amp; in the more abstracted low level languages there&#x27;ll usually be a pretty good generic version. Yet there&#x27;s much room for variation<p>I had the joy to be out of reach of off-the-shelf hashtables for luwa,<p>init: <a href="https:&#x2F;&#x2F;github.com&#x2F;serprex&#x2F;luwa&#x2F;blob&#x2F;436c7271120fcd116af30e906919afe7a7e55254&#x2F;rt&#x2F;alloc.lua#L156" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;serprex&#x2F;luwa&#x2F;blob&#x2F;436c7271120fcd116af30e9...</a><p>get&#x2F;set: <a href="https:&#x2F;&#x2F;github.com&#x2F;serprex&#x2F;luwa&#x2F;blob&#x2F;436c7271120fcd116af30e906919afe7a7e55254&#x2F;rt&#x2F;table.lua" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;serprex&#x2F;luwa&#x2F;blob&#x2F;436c7271120fcd116af30e9...</a><p>hash: <a href="https:&#x2F;&#x2F;github.com&#x2F;serprex&#x2F;luwa&#x2F;blob&#x2F;436c7271120fcd116af30e906919afe7a7e55254&#x2F;rt&#x2F;obj.lua#L130" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;serprex&#x2F;luwa&#x2F;blob&#x2F;436c7271120fcd116af30e9...</a><p>Still need to implement Lua&#x27;s sequential-portion-as-array logic. I do like the API not having a &#x27;remove&#x27; method, opting intead for nil by default. nil value entries get vacuumed out when rehashing. This ends up acting as tombstone. In an earlier iteration I was using 0 as a marker distinct from nil, but an implementation detail is that nil is stored at address 0.. oops
nathan_long超过 7 年前
If you just want to see hash tables work as a data structure, here&#x27;s a talk-as-blog-post I made on the subject.<p><a href="http:&#x2F;&#x2F;nathanmlong.com&#x2F;2015&#x2F;10&#x2F;reimplementing-rubys-hash&#x2F;" rel="nofollow">http:&#x2F;&#x2F;nathanmlong.com&#x2F;2015&#x2F;10&#x2F;reimplementing-rubys-hash&#x2F;</a>
评论 #15084336 未加载
评论 #15084815 未加载
dullgiulio超过 7 年前
If your set is known at compile time, make a Perfect Hash Table with gperf instead. It&#x27;s commonly used in compilers and other software with a parsing or interpreting stage.
benlorenzetti超过 7 年前
In the intro section, on extending to unicode being non-trivial...is the argument that if your input has some non-ascii data you probably have all non-ascii data and the above 127 bucket will be too full? Honest question.
baron816超过 7 年前
While the reasons for this that the author gave are valid, I think it would also be valuable to do this in a high level language. I&#x27;m sure lot of people who can read C code went through a CS program, and hopefully learned how hash tables work then. People without CS degrees and who mostly write JS&#x2F;Ruby&#x2F;Python&#x2F;etc. and use hash tables on almost every line of code they write are much less likely to know how they work. Just saying, someone should do it for other languages.
评论 #15083618 未加载
评论 #15084342 未加载
评论 #15087403 未加载
clarry超过 7 年前
Don&#x27;t allocate these items individually; don&#x27;t store pointers to such items. They&#x27;re tiny, and the pointer chasing + allocator bookkeeping overhead + potential randomisation make it inefficient for no gain. Just allocate an array of items. This way you&#x27;ll also deal with fewer potential errors. Speaking of errors, please handle them and fix the API so that the caller can know if there&#x27;s a problem.
评论 #15085414 未加载
smegel超过 7 年前
This might seem academic, but I actually had a real reason to hand-roll a hash table implementation that would work in process-shared memory (as an Apache module). It was based on the hash table implemented in mod_ldap, and was part of an LRU cache.
jay-anderson超过 7 年前
This is a good mostly simple overview. The appendix should also mention robin hood hashing as a method for addressing collisions. From my understanding it is cpu cache friendly and works well in practice.
peterbotond超过 7 年前
berkeley db 185. it has served me very good over the years and it was fast enough for real time market data and indexing and many more.
7Anonymous超过 7 年前
Just go back to the table of contents. The link works from there.
agumonkey超过 7 年前
hah, just a few days ago I named a few files hshtbl.c trie.c and so on. All after a Raymond Hettinger talk about how they rethought Python 3 dicts.
cletusw超过 7 年前
Link should be changed to <a href="https:&#x2F;&#x2F;github.com&#x2F;jamesroutley&#x2F;write-a-hash-table" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;jamesroutley&#x2F;write-a-hash-table</a>
评论 #15083347 未加载