Bloom filters explained in a single HN comment:<p>They are an efficient implementation of a Set that contains hashes of the elements.<p>bloomfilter.add("foo") will internally add hash("foo") to the Set<p>bloomfilter.has("foo") checks if the Set contains hash("foo")<p>False positives arise due to different elements hashing to the same hash. If "foo" and "bar" hash to the same value, bloomfilter.has("bar") would return true.<p>No false negatives are possible.<p>They are used when an actual check for an element in a datastructure is quite costly and the hitrate for not-in-the-datastructure is non-trivial and can therefor be skipped if the bloomfilter gives a negative.
"bloom filters explain in a single image" where 90% of the image is text. so just bloom filters explained in 2-3 paragraphs?<p>fig 3 here<p><a href="https://www.sciencedirect.com/science/article/pii/S1389128613003083#f0015" rel="nofollow">https://www.sciencedirect.com/science/article/pii/S138912861...</a><p>is a much better "single image" explanation (insofar as any single image could be sufficiently explanatory) of bloom filters.
What is it about bloom filters that makes people want to explain them to others? I think I’ve seen more blog posts about them than any other cs topic, with the possible exception of monads.
Hi, author here!<p>I have created that site to summarize concepts I find interesting, while providing examples or use cases.<p>My biggest inspiration comes from Julia Evans and her zines[0]. I started drawing the things I was learning about, and I thought the format could be useful for other people.<p>Right now I'm diving into a mix of hashing functions/cryptography, data structures and databases. I usually spend a few days learning about a concept, and then I try to summarize it in a constrained drawing frame (I use Excalidraw[1] to do it and some logos from drwn.io[2]). I'm happy to accept suggestions about interesting topics to explore, draw and summarize.<p>[0] <a href="https://wizardzines.com/" rel="nofollow">https://wizardzines.com/</a>
[1] <a href="https://excalidraw.com/" rel="nofollow">https://excalidraw.com/</a>
[2] <a href="https://drwn.io/" rel="nofollow">https://drwn.io/</a>
This is great but the title is a little misleading for me. Perhaps "bloom filters explained in a poster"...<p>This contains multiple paragraphs and figures. You couldn't screenshot a wikipedia page and claim the same!
I didn't understand a thing from this image but I was definitely intrigued and so I found a video on Youtube(1) that explains it quite simply (like for dummies) and now I fully understand it!<p>You can watch it at 2.5x speed without missing out on anything:<p><a href="https://www.youtube.com/watch?v=kfFacplFY4Y" rel="nofollow">https://www.youtube.com/watch?v=kfFacplFY4Y</a>
For a really high level that includes how people practically use one with storage, I like this simple image:<p><a href="https://academy.bit2me.com/wp-content/uploads/2020/06/como-funciona-un-filtro-bloom-1024x576.webp" rel="nofollow">https://academy.bit2me.com/wp-content/uploads/2020/06/como-f...</a>
Fun fact: the original pre-computer punch cards used for the census etc. were quite like Bloom filters with degenerate hash functions. Some of the same principles of key number/width, density, and pollution even apply. I've often thought that would be a good place to start, if I ever had to explain Bloom filters to a layman or beginner.
I've just thought about how you could also have a "Bloom Map", basically as a HashSet is to Bloom Filter, a HashMap would be to a Bloom Map. It would be able to answer lookups with either "Definitely not present" or "Might map to value XYZ".
Can someone help me understand the value of hashing? If you’re using modulus to bucket, why not just use the string length or something? Is it because the values will distribute across buckets more uniformly?
> Bloom filters test if an element is part of a set, without needing to store the entire set<p><a href="https://python.land/bloom-filter" rel="nofollow">https://python.land/bloom-filter</a><p>There.
I've used Bloom filters and found them to be memory latency bound, as queries can not be cached (big data structure, random access). Any recommendations on how to speed up queries?
To be complete something along the lines of: ‘if the answer is “maybe yes” a more computationally expensive is used to definitely decide yes or no’ should me added imo