Bloom filters explained in a single image

98 points by polyrand about 4 years ago

17 comments

eis about 4 years ago

Bloom filters explained in a single HN comment:
They are an efficient implementation of a Set that contains hashes of the elements.
bloomfilter.add("foo") will internally add hash("foo") to the Set.
bloomfilter.has("foo") checks if the Set contains hash("foo").
False positives arise due to different elements hashing to the same hash. If "foo" and "bar" hash to the same value, bloomfilter.has("bar") would return true.
No false negatives are possible.
They are used when an actual check for an element in a data structure is quite costly and the share of lookups for elements that are not in the data structure is non-trivial, so the costly check can be skipped whenever the Bloom filter gives a negative.
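
A minimal sketch of the simplified model described in this comment, not a production Bloom filter (a real one uses a bit array and several hash functions rather than a set). The class name `HashSetFilter`, the 16-bucket size and the use of SHA-256 are assumptions made for illustration only.

```python
import hashlib


class HashSetFilter:
    """Toy model from the comment: a set that keeps only small hashes."""

    def __init__(self, num_buckets=16):
        # Assumption: very few buckets, so collisions are easy to find.
        self.num_buckets = num_buckets
        self.hashes = set()

    def _hash(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def add(self, item):
        self.hashes.add(self._hash(item))

    def has(self, item):
        return self._hash(item) in self.hashes


f = HashSetFilter()
f.add("foo")
print(f.has("foo"))  # True: an added element is always reported as present

# Search for some other string that lands in the same bucket as "foo".
collider = next(s for s in (f"item{i}" for i in range(1000)) if f.has(s))
print(collider, f.has(collider))  # a false positive: never added, still "present"
```

Only the hash is stored, so the filter cannot tell "foo" apart from any other string with the same hash. That is the false positive case the comment describes, and it is also why a negative answer is always reliable.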

anon_tor_12345 about 4 years ago

"bloom filters explained in a single image" where 90% of the image is text. so just bloom filters explained in 2-3 paragraphs?
fig 3 here https://www.sciencedirect.com/science/article/pii/S1389128613003083#f0015 is a much better "single image" explanation (insofar as any single image could be sufficiently explanatory) of bloom filters.

bradleyjg about 4 years ago

What is it about bloom filters that makes people want to explain them to others? I think I’ve seen more blog posts about them than any other cs topic, with the possible exception of monads.

polyrand about 4 years ago

Hi, author here!
I have created that site to summarize concepts I find interesting, while providing examples or use cases.
My biggest inspiration comes from Julia Evans and her zines[0]. I started drawing the things I was learning about, and I thought the format could be useful for other people.
Right now I'm diving into a mix of hashing functions/cryptography, data structures and databases. I usually spend a few days learning about a concept, and then I try to summarize it in a constrained drawing frame (I use Excalidraw[1] to do it and some logos from drwn.io[2]). I'm happy to accept suggestions about interesting topics to explore, draw and summarize.
[0] https://wizardzines.com/
[1] https://excalidraw.com/
[2] https://drwn.io/

reivalc about 4 years ago

thanks for the post, it inspired this naive code:
<pre><code>class BloomFilter:
    def __init__(self, size):
        # Bit array, stored as a list of 0/1 integers.
        self.f = [0] * size

    def contains(self, s):
        h1, h2, h3 = self.hashes(s)
        # The product is 1 only if all three positions are set.
        if self.f[h1] * self.f[h2] * self.f[h3] == 1:
            return 'Value might be in the set.'
        else:
            return 'Value is definitely not in the set.'

    def hashes(self, s):
        # Three "different" hash functions, made by salting the input.
        h1 = hash(s) % len(self.f)
        h2 = hash(s + 'salt') % len(self.f)
        h3 = hash(s + 'more salt') % len(self.f)
        return h1, h2, h3

    def insert(self, s):
        h1, h2, h3 = self.hashes(s)
        self.f[h1] = self.f[h2] = self.f[h3] = 1

bf = BloomFilter(64)
bf.insert('bill')
print(f"{bf.contains('bill') = }")
print(f"{bf.contains('bob') = }")

Out:
bf.contains('bill') = 'Value might be in the set.'
bf.contains('bob') = 'Value is definitely not in the set.'</code></pre>
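
One caveat with the code above: CPython salts the built-in `hash()` for strings per process, so the bit positions change from run to run. A hedged variant, assuming SHA-256 via `hashlib` as the hash and a configurable number of hash functions (the names `SimpleBloomFilter`, `num_hashes` and `might_contain` are made up for this sketch), could look like this:

```python
import hashlib


class SimpleBloomFilter:
    def __init__(self, size, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # One position per hash function; the index i acts as the salt.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely not inserted"; True means "maybe".
        return all(self.bits[pos] for pos in self._positions(item))


bf = SimpleBloomFilter(size=64)
bf.insert("bill")
print(bf.might_contain("bill"))  # True: inserted items are never missed
```

Returning a boolean instead of a message string makes the filter easier to use as a guard in front of another lookup.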

kowlo about 4 years ago

This is great but the title is a little misleading for me. Perhaps "bloom filters explained in a poster"...
This contains multiple paragraphs and figures. You couldn't screenshot a Wikipedia page and claim the same!

superasn about 4 years ago

I didn't understand a thing from this image but I was definitely intrigued and so I found a video on YouTube (1) that explains it quite simply (like for dummies) and now I fully understand it!
You can watch it at 2.5x speed without missing out on anything:
(1) https://www.youtube.com/watch?v=kfFacplFY4Y

tyingq about 4 years ago

For a really high-level view that includes how people practically use one with storage, I like this simple image: https://academy.bit2me.com/wp-content/uploads/2020/06/como-funciona-un-filtro-bloom-1024x576.webp

notacoward about 4 years ago

Fun fact: the original pre-computer punch cards used for the census etc. were quite like Bloom filters with degenerate hash functions. Some of the same principles of key number/width, density, and pollution even apply. I've often thought that would be a good place to start, if I ever had to explain Bloom filters to a layman or beginner.

joshlemer about 4 years ago

I've just thought about how you could also have a "Bloom Map", basically as a HashSet is to Bloom Filter, a HashMap would be to a Bloom Map. It would be able to answer lookups with either "Definitely not present" or "Might map to value XYZ".
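
One possible reading of this idea, sketched under loud assumptions: a fixed-size table that stores values but never keys, so a lookup can only say "no" or "maybe this value". The class `BloomMap`, its methods and the single SHA-256 hash are all hypothetical choices for illustration, not an established data structure from the thread.

```python
import hashlib


class BloomMap:
    """Hypothetical sketch: a fixed-size table that stores values, never keys."""

    def __init__(self, size=64):
        self.size = size
        # None marks a slot that no key has hashed to yet,
        # so None itself cannot be stored as a value in this sketch.
        self.slots = [None] * size

    def _slot(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.size

    def put(self, key, value):
        # A colliding key silently overwrites the earlier value.
        self.slots[self._slot(key)] = value

    def lookup(self, key):
        value = self.slots[self._slot(key)]
        if value is None:
            return "Definitely not present"
        return f"Might map to value {value!r}"


bm = BloomMap()
print(bm.lookup("bill"))  # Definitely not present (the table is still empty)
bm.put("bill", "XYZ")
print(bm.lookup("bill"))  # Might map to value 'XYZ'
```

Because slots are never cleared, a "Definitely not present" answer stays reliable, while a "Might map to" answer can be wrong in two ways: the key was never inserted, or a colliding key overwrote its value.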

vangelis about 4 years ago

My first thought about Bloom filters is always what does this have to do with shaders.

bryzaguy about 4 years ago

Can someone help me understand the value of hashing? If you’re using modulus to bucket, why not just use the string length or something? Is it because the values will distribute across buckets more uniformly?
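
A small experiment that illustrates the uniformity point raised in the question. The word list, the bucket count of 16 and the use of SHA-256 are arbitrary assumptions for this sketch; the helper names are made up.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16
words = ["bill", "bob", "anna", "carl", "dave", "erin",
         "fred", "gina", "hank", "iris", "jack", "kate"]


def length_bucket(word):
    # "Hash" by string length: every word of equal length collides.
    return len(word) % NUM_BUCKETS


def hash_bucket(word):
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


by_length = Counter(length_bucket(w) for w in words)
by_hash = Counter(hash_bucket(w) for w in words)

print("by length:", dict(by_length))  # only buckets 3 and 4 are ever used
print("by hash:  ", dict(by_hash))    # spread over many more buckets
```

With string length as the bucket function, every four-letter name sets the same position, so after inserting "bill" the filter would answer "maybe" for "jack", "kate" and every other four-letter string. A well-mixed hash spreads keys across the whole array, which keeps the false positive rate low.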

throw14082020 about 4 years ago

> Bloom filters test if an element is part of a set, without needing to store the entire set
https://python.land/bloom-filter
There.

tonke90 about 4 years ago

I've used Bloom filters and found them to be memory latency bound, as queries can not be cached (big data structure, random access). Any recommendations on how to speed up queries?

gbrits about 4 years ago

To be complete, something along the lines of 'if the answer is "maybe yes", a more computationally expensive check is used to definitively decide yes or no' should be added, imo.
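
The "maybe yes, then do the expensive check" pattern can be sketched as follows. Everything here is an assumption for illustration: `TinyBloom` is a throwaway filter, and `GuardedStore` uses a plain dict to stand in for a slow disk or network lookup.

```python
import hashlib


class TinyBloom:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)  # one byte per bit, for simplicity

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class GuardedStore:
    """Hypothetical store: the dict stands in for an expensive lookup."""

    def __init__(self):
        self._slow_storage = {}
        self._filter = TinyBloom()
        self.slow_lookups = 0  # counts how often the expensive path runs

    def put(self, key, value):
        self._slow_storage[key] = value
        self._filter.add(key)

    def get(self, key):
        if not self._filter.might_contain(key):
            return None  # definite "no": skip the expensive check
        self.slow_lookups += 1  # "maybe yes": confirm with the real lookup
        return self._slow_storage.get(key)


store = GuardedStore()
store.put("bill", 42)
print(store.get("bill"))  # 42
```

A missing key returns `None` either way. The filter only decides whether the expensive lookup has to run, so a false positive costs time but never correctness.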

FatalLogic about 4 years ago

Can this be explained, more simply, by saying that many different strings will be represented by a single hash?

sullyj3 about 4 years ago

What's the intuition behind the name?
