TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

A visual interactive guide to Bloom filters

286 点作者 flyingsky超过 1 年前

17 条评论

nalgeon超过 1 年前
Sam&#x27;s visual tutorials are as good as ever (doggos are very important, by the way).<p>And if you want to try some code, I&#x27;ve prepared three toy Bloom filter implementations to accompany the article:<p>JavaScript: <a href="https:&#x2F;&#x2F;codapi.org&#x2F;embed&#x2F;?engine=browser&amp;sandbox=javascript&amp;src=gist:e7bde93f98c5e47ca38d359a5104fd88:bloom.js" rel="nofollow">https:&#x2F;&#x2F;codapi.org&#x2F;embed&#x2F;?engine=browser&amp;sandbox=javascript&amp;...</a><p>Python: <a href="https:&#x2F;&#x2F;codapi.org&#x2F;embed&#x2F;?sandbox=python&amp;src=gist:e7bde93f98c5e47ca38d359a5104fd88:bloom.py" rel="nofollow">https:&#x2F;&#x2F;codapi.org&#x2F;embed&#x2F;?sandbox=python&amp;src=gist:e7bde93f98...</a><p>Go: <a href="https:&#x2F;&#x2F;codapi.org&#x2F;embed&#x2F;?sandbox=go&amp;src=gist:e7bde93f98c5e47ca38d359a5104fd88:bloom.go" rel="nofollow">https:&#x2F;&#x2F;codapi.org&#x2F;embed&#x2F;?sandbox=go&amp;src=gist:e7bde93f98c5e4...</a>
samwho超过 1 年前
I&#x27;m the author of this post, and am happy to answer any questions. :)
评论 #39443118 未加载
评论 #39456271 未加载
svraghavan超过 1 年前
Thanks to the author for all the hard work. Its easy to consume and understand for beginners too.
评论 #39443170 未加载
naugtur超过 1 年前
I remember discussing bloom and cuckoo filters a while back and someone linked.to.great papers from late 2010s with much more advanced probabilistic structures that were handling modification much better or had other interesting properties. Anyone here knows where to look for these? I didn&#x27;t take notes...
评论 #39447049 未加载
评论 #39448672 未加载
the_arun超过 1 年前
What are the business use cases we use bloom filters? Some examples would be great. Thanks!
评论 #39448842 未加载
评论 #39445346 未加载
评论 #39447295 未加载
评论 #39448538 未加载
评论 #39444027 未加载
franky47超过 1 年前
A wonderful article, thank you.<p>Rather than using multiple hash functions, would it make more sense to use a single algorithm over (prefix | input), with k different prefixes? This may allow computing those hashes in parallel, using SIMD for example, and caching the prehash state of the prefixes.<p>Edit: looks like there has been some research on this: <a href="https:&#x2F;&#x2F;ieeexplore.ieee.org&#x2F;document&#x2F;8462781" rel="nofollow">https:&#x2F;&#x2F;ieeexplore.ieee.org&#x2F;document&#x2F;8462781</a>
评论 #39446827 未加载
评论 #39448493 未加载
Crier1002大约 1 年前
this is a great article! I love how you explained it. here are my takeaways from it: - Bloom filter is a space-efficient data structure - You&#x27;re trading off some accuracy (possibility of false positives) for the significantly smaller memory footprint and faster membership checks
Bloomfu超过 1 年前
Very good explanation of Bloom filters, but are they optimum from a complexity point of view given a false positive rate and a bounded number of bits? Are there other kind of filters if we relax de condición of zero false negative rate?, are there other better but more complex data structures able to provide the same effect?
评论 #39448197 未加载
lormayna超过 1 年前
For whom that are interested I suggest also to have a look on Cukoo Filter: <a href="https:&#x2F;&#x2F;www.cs.cmu.edu&#x2F;~dga&#x2F;papers&#x2F;cuckoo-conext2014.pdf" rel="nofollow">https:&#x2F;&#x2F;www.cs.cmu.edu&#x2F;~dga&#x2F;papers&#x2F;cuckoo-conext2014.pdf</a>
评论 #39466084 未加载
machekb超过 1 年前
I appreciate the author&#x27;s dedication to making animations accessible!
评论 #39439867 未加载
dndn1超过 1 年前
These are beautiful interactives along with the rest by Sam!
评论 #39442352 未加载
martinky24超过 1 年前
Awesome resource! Bloom filters are one of my favorite data structure (that I wish I got to employ more often in my line of work… but alas).
评论 #39443196 未加载
Epicism超过 1 年前
Very cool! I was going to read up on bloom filters and this popped up the next day!
评论 #39452593 未加载
who-shot-jr超过 1 年前
Amazing! Very well explained.
评论 #39444162 未加载
ctxc超过 1 年前
Flows beautifully. Looking forward to the next one Sam!
评论 #39444784 未加载
grupthink超过 1 年前
Would it be better to store the hash of each output into 3 different arrays?<p>e.g. currently, it&#x27;s doing:<p><pre><code> [input] -&gt; [3 hashes] -&gt; [single storage] </code></pre> but, I&#x27;m wondering if it&#x27;s better to do:<p><pre><code> [input] -&gt; [hash a] -&gt; [storage a] -&gt; [hash b] -&gt; [storage b] -&gt; [hash c] -&gt; [storage c] </code></pre> ...this way, the output of the 3 hashes don&#x27;t affect each other during the set and membership-check. I wonder how that would affect how much more data you can store. I&#x27;m sure someone has considered this.
评论 #39450924 未加载
nhggfu超过 1 年前
very well written and informative.
评论 #39444168 未加载