TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Scalable Bloom Filters (2007) [pdf]

126 pointsby setraover 7 years ago

7 comments

signa11about 7 years ago
this is a kind of tangential comment&#x2F;rant :<p>but to me it seems that research papers _must_ be, for lack of a better term, <i>runnnable</i>. i would, and hopefully others as well, like to, replicate all these wonderful results that are advertised in these papers. without that, they are all just advertisements of scholarship rather than scholarship themselves. a set of instructions + environment which generated these figures would be very welcome.<p>&lt;end-rant&gt;<p>on the subject of bloom filters, have a look at this one: <a href="https:&#x2F;&#x2F;www3.cs.stonybrook.edu&#x2F;~ppandey&#x2F;files&#x2F;p775-pandey.pdf" rel="nofollow">https:&#x2F;&#x2F;www3.cs.stonybrook.edu&#x2F;~ppandey&#x2F;files&#x2F;p775-pandey.pd...</a> (A General-Purpose Counting Filter: Making Every Bit Count)
评论 #16435308 未加载
opencoffabout 7 years ago
I used this paper for building a scalable bloom filter for use in an ad-tech stack. The performance was better than DJB&#x27;s CDB.<p><a href="https:&#x2F;&#x2F;github.com&#x2F;opencoff&#x2F;portable-lib" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;opencoff&#x2F;portable-lib</a><p>The bloom filter code is in src&#x2F;bloom.c; the Header file is in inc&#x2F;utils&#x2F;bloom.h<p>I implemented a serialization&#x2F;deserialization of the bloom filters as well (src&#x2F;bloom_marshal.c).<p>The tests are in test&#x2F;t_bloom.c.
esturkabout 7 years ago
Can someone help me understand the query part?<p>It says that a query is done on each BF, even on the ones that were added after the initial storage. So suppose we have only 2 iterations. In the first BF, there&#x27;s k0 hash functions and in the 2nd (iteration) BF, there&#x27;s now k1 hash functions.<p>So naturally, an item is stored using the k0 hash functions. But in order to query, I run against k1 hash functions which is a larger set. If any one of the k1-k0 extra hash functions returns 0, won&#x27;t that be a false negative?
评论 #16435408 未加载
bistro17about 7 years ago
implementation of this paper - <a href="https:&#x2F;&#x2F;metacpan.org&#x2F;pod&#x2F;Bloom::Scalable" rel="nofollow">https:&#x2F;&#x2F;metacpan.org&#x2F;pod&#x2F;Bloom::Scalable</a>
stochastic_monkabout 7 years ago
This should also be marked with a year. A cursory google search has StackOverflow answers from 2013.<p>A lack of a year label implicitly suggests that it’s new, e.g. Hacker News. Please add a tag with the appropriate year.
评论 #16435075 未加载
howitworksabout 7 years ago
This should be labeled as a PDF
评论 #16435286 未加载
bhoustonabout 7 years ago
Hacker news loves upvoting articles about bloom filters (and Bayesian probability - its on the front page again this evening.) Personally I&#x27;ve never found a use for either of them in practice.
评论 #16435521 未加载
评论 #16434963 未加载
评论 #16435145 未加载
评论 #16435078 未加载
评论 #16435214 未加载
评论 #16435047 未加载
评论 #16435728 未加载
评论 #16435195 未加载
评论 #16435150 未加载
评论 #16435698 未加载
评论 #16438355 未加载
评论 #16435040 未加载
评论 #16434969 未加载
评论 #16435082 未加载
评论 #16435118 未加载
评论 #16435172 未加载