This is really great!<p>For those that don't know, bloom filters are really good at determining when something is not in a set. So if says "not in set" that's 100% accurate and super fast.<p>We used this to our advantage at reddit. When you load a comments page, we had to determine if you've voted on anything. So with the bloom filter in front, the query for "have they voted on anything here" was very quick if the answer was no.<p>But we still had to make the query. So another hack we did was to put a value on your user object to track the last time you interacted with the website. If the comments page was made after your last interaction, then we didn't even have to do the query.<p>With this, we wouldn't have to do the two step process -- it sounds like this would be super fast when it was time boxed. Maybe...
Aether (<a href="https://getaether.net" rel="nofollow">https://getaether.net</a>) also uses a roughly equivalent version of this I had thought, up to this point, that I ‘discovered’. I call it ‘Rolling Bloom’. (Not sure which implementation is first.)<p>Here’s the Go implementation: <a href="https://github.com/nehbit/aether/blob/master/aether-core/aether/services/rollingbloom/rollingbloom.go" rel="nofollow">https://github.com/nehbit/aether/blob/master/aether-core/aet...</a>
An interesting hack is to use a count-based bloom filter, maybe 8-bits per count instead of 1-bit.<p>As long as the "count" doesn't overflow to the max size (in this case: 255), you can always "remove" items from a Counting Bloom Filter. In effect, a traditional 1-bit bloom filter is simply a counting-bloom filter that "overflows" on the first hit.<p>If people really need deletion on their bloom filter, do realize it is possible! (as long as you select a proper bit-size).