this is kind of a tangential comment/rant:<p>but to me it seems that research papers _must_ be, for lack of a better term, <i>runnable</i>. i, and hopefully others as well, would like to replicate all the wonderful results that are advertised in these papers. without that, they are just advertisements of scholarship rather than scholarship itself. a set of instructions + the environment that generated the figures would be very welcome.<p><end-rant><p>on the subject of bloom filters, have a look at this one: <a href="https://www3.cs.stonybrook.edu/~ppandey/files/p775-pandey.pdf" rel="nofollow">https://www3.cs.stonybrook.edu/~ppandey/files/p775-pandey.pd...</a> (A General-Purpose Counting Filter: Making Every Bit Count)
I used this paper to build a scalable bloom filter for use in an ad-tech stack. The performance was better than DJB's CDB.<p><a href="https://github.com/opencoff/portable-lib" rel="nofollow">https://github.com/opencoff/portable-lib</a><p>The bloom filter code is in src/bloom.c; the header file is in inc/utils/bloom.h<p>I also implemented serialization/deserialization of the bloom filters (src/bloom_marshal.c).<p>The tests are in test/t_bloom.c.
Can someone help me understand the query part?<p>It says that a query is done on each BF, even on the ones that were added after the initial storage. So suppose we have only 2 iterations. In the first BF, there are k0 hash functions, and in the 2nd (iteration) BF, there are now k1 hash functions.<p>So naturally, an item is stored using the k0 hash functions. But in order to query, I run it against the k1 hash functions, which is a larger set. If any one of the k1-k0 extra hash functions returns 0, won't that be a false negative?
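A minimal sketch of how the query in a scalable bloom filter is usually understood, assuming each BF keeps its own hash count and is only ever probed with that same count. An item stored in the first BF with k0 hashes is tested against the first BF with k0 hashes; the k1 hashes are only applied to the second BF's bits. The query returns true if any BF reports a hit, so the extra k1-k0 hashes cannot cause a false negative. The class names, the SHA-256 double hashing, and the default parameters below are assumptions for illustration, not taken from the paper or from any of the linked libraries, and the sketch uses a plain unpartitioned filter rather than the paper's sliced layout.

```python
import hashlib
import math


class BloomFilter:
    """Plain bloom filter with its own bit count m and hash count k (hypothetical helper)."""

    def __init__(self, m, k, capacity):
        self.m = m
        self.k = k
        self.capacity = capacity
        self.count = 0
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # Double hashing: derive this filter's k positions from one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class ScalableBloomFilter:
    """Sequence of filters; each new one is larger and has a tighter error target."""

    def __init__(self, capacity=100, error=0.01, growth=2, tightening=0.5):
        self.capacity = capacity
        self.error = error
        self.growth = growth
        self.tightening = tightening
        self.filters = []
        self._grow()

    def _grow(self):
        i = len(self.filters)
        err = self.error * (self.tightening ** i)   # tighter error per stage
        cap = self.capacity * (self.growth ** i)    # larger capacity per stage
        k = max(1, math.ceil(math.log2(1 / err)))   # so k grows with i
        m = math.ceil(cap * k / math.log(2))
        self.filters.append(BloomFilter(m, k, cap))

    def add(self, item):
        # Inserts only ever go into the newest filter, using its own k.
        if self.filters[-1].count >= self.filters[-1].capacity:
            self._grow()
        self.filters[-1].add(item)

    def __contains__(self, item):
        # Each filter is probed with its own k; a hit in any filter is a hit.
        return any(item in f for f in self.filters)


sbf = ScalableBloomFilter()
for n in range(1000):
    sbf.add(f"item-{n}")

# Items stored in the first filter (k0 hashes) are still found after growth.
assert all(f"item-{n}" in sbf for n in range(1000))
```

In this sketch the first filter ends up with fewer hash functions than the later ones, yet everything inserted into it is still found, because the later filters' hashes are never applied to its bits.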
An implementation of this paper: <a href="https://metacpan.org/pod/Bloom::Scalable" rel="nofollow">https://metacpan.org/pod/Bloom::Scalable</a>
This should also be marked with a year. A cursory Google search turns up Stack Overflow answers from 2013.<p>The lack of a year label implicitly suggests that it’s new (this is Hacker News, after all). Please add a tag with the appropriate year.
Hacker News loves upvoting articles about bloom filters (and Bayesian probability, which is on the front page again this evening). Personally, I've never found a use for either of them in practice.