Tech Echo

8 comments

colmmacc over 10 years ago

This paper is needlessly hard to read (it's 4 pages in before there's even an explanation!), so first a refresher: a traditional bloom filter sets k bits to 1, using k hash functions to determine which bits. So for input 'i', we might get something like:<pre><code>h1(i) = 0
h2(i) = 5
h3(i) = 7</code></pre> and after insertion, the bloom filter would look like:<pre><code>1 | 0 | 0 | 0 | 0 | 1 | 0 | 1</code></pre> To check whether an item is in the bloom filter, you check the corresponding bits. After you add items to a bloom filter, there's some probability that the bits corresponding to items that haven't been added are also set. So it can sometimes lie to you and say something is included even when it's not.

IBLTs replace the bits with a count of the items added, plus a sum of their keys and a sum of their values. So where Cs is the count sum, Ks is the key sum and Vs is the value sum, the filter might look like:<pre><code>Cs=1 Ks=10 Vs=20 | Cs=0 Ks=0 Vs=0 | ...</code></pre> The positions in the table are computed just as before; the count increments for every item added, and the key and value "sums" combine all of the keys and values that have been added (this might be a simple rolling sum of a small checksum, or an xor).

Items can be removed from the table individually by subtracting them from the relevant counts (just like a counting bloom filter) and rolling sums. A "dump" of the table can be derived by iterating through all possible values (you need to know what they might be) and deleting them one by one, and it takes trickery to support duplicate entries.

This seems like a big downside: if the input set is so constrained that you can just iterate over every possible input, then I think it is easier to use composite numbers as your invertible lookup table.

Here's how: for each potential input element, assign it a unique prime number, and keep this assignment in a static lookup table. Start the table with the value "1".
To store an element, multiply the current table value by the element's prime. The result is a composite number: the product of all elements inserted into the table.

To check for an element's presence, just use mod. Duplicates work: if the table value is still divisible by the element's prime (value mod p == 0), the element is still there. To remove an element, just divide by its prime. A surprisingly large number of elements can be represented this way in a 64k-sized composite number.

A good example of this technique is modeling Internet AS paths: give every AS number a unique prime number (there are only about 100k of them) and model an AS path as the composite of its component ASes. Using vectorized DIV/MOD instructions, large graph operations become incredibly fast.
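A minimal Python sketch of the prime-product scheme described above, assuming a small hypothetical element-to-prime table (`PRIMES`) and relying on Python's arbitrary-precision integers, which sidestep the fixed 64k size mentioned in the comment. `PrimeProductSet` and its method names are illustrative only; they are not from the paper or the comment.

```python
# Hypothetical element-to-prime assignment; in practice this is a static lookup table.
PRIMES = {"a": 2, "b": 3, "c": 5, "d": 7}


class PrimeProductSet:
    """A multiset encoded as the product of its elements' primes (hypothetical sketch)."""

    def __init__(self, primes):
        self.primes = primes
        self.value = 1  # the empty table

    def add(self, element):
        # Storing an element multiplies in its prime.
        self.value *= self.primes[element]

    def contains(self, element):
        # Present iff the element's prime still divides the product.
        return self.value % self.primes[element] == 0

    def remove(self, element):
        p = self.primes[element]
        if self.value % p != 0:
            raise KeyError(element)
        self.value //= p  # removing divides the prime back out

    def count(self, element):
        # Multiplicity = exponent of the element's prime in the factorization.
        p, n, v = self.primes[element], 0, self.value
        while v % p == 0:
            v //= p
            n += 1
        return n
```

For example, adding "a", "c" and "a" leaves `value == 20` (2 × 5 × 2), so `count("a")` is 2; one `remove("a")` leaves 10, and "a" is still present because 2 still divides 10.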

lqdc13 over 10 years ago

If you just want insertion, deletion, resizing and merging with better data locality and roughly the same performance as a Bloom filter, also consider the Quotient Filter: <a href="http://arxiv.org/pdf/1208.0290.pdf" rel="nofollow">http://arxiv.org/pdf/1208.0290.pdf</a>

robmccoll over 10 years ago

Why use this instead of putting a bloom filter in front of a fixed hash table? In the unlucky case that two pairs have a total hash collision in the IBLT, you can't recover anything. Iteration cost is the same. If you need deletes, just use a count in the filter instead of a hash. If the table fills up, drop entries. I guess the IBLT would be able to recover info if the number of insertions crossed the threshold and deletions then brought the count back down.
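To make the IBLT cell layout from the first comment concrete, here is a minimal Python sketch, assuming integer keys and values, XOR as the "sum" (one of the options mentioned there), SHA-256-derived hash positions, and a table split into k equal subtables so each key touches k distinct cells. It also illustrates the last point above: because counts and XOR sums cancel exactly, a table overloaded with insertions becomes decodable again once deletions bring the load back down. `list_entries` peels cells whose count is 1 (the listing approach used in the IBLT literature), so it does not need to know the possible keys in advance. The class, the `UNKNOWN` sentinel and the method names are hypothetical, and the "empty cell" and "pure cell" shortcuts assume nothing is ever deleted that was not inserted.

```python
import hashlib

UNKNOWN = object()  # sentinel: the lookup could not decide


class IBLT:
    """Hypothetical IBLT sketch: each cell holds a count plus XOR "sums" of keys and values.

    Assumes only inserted pairs are ever deleted.
    """

    def __init__(self, m=60, k=3):
        if m % k != 0:
            raise ValueError("m must be a multiple of k")
        self.k = k
        self.sub = m // k  # size of each of the k subtables
        self.count = [0] * m
        self.key_sum = [0] * m
        self.val_sum = [0] * m

    def _cells(self, key):
        # Hash function i picks one cell inside subtable i, so the k cells are distinct.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield i * self.sub + int.from_bytes(digest[:8], "big") % self.sub

    def _update(self, key, value, delta):
        for c in self._cells(key):
            self.count[c] += delta
            self.key_sum[c] ^= key  # XOR is its own inverse: insert and delete both XOR
            self.val_sum[c] ^= value

    def insert(self, key, value):
        self._update(key, value, +1)

    def delete(self, key, value):
        self._update(key, value, -1)

    def get(self, key):
        for c in self._cells(key):
            if self.count[c] == 0:
                return None  # empty cell: the key is definitely absent
            if self.count[c] == 1:
                # A "pure" cell holds exactly one pair: either ours, or proof of absence.
                return self.val_sum[c] if self.key_sum[c] == key else None
        return UNKNOWN  # every cell is shared with other keys

    def list_entries(self):
        """Destructively peel pure cells (count == 1) until none remain."""
        entries = []
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                if self.count[c] == 1:
                    key, value = self.key_sum[c], self.val_sum[c]
                    entries.append((key, value))
                    self.delete(key, value)
                    progress = True
        # If peeling stalls, some counts stay nonzero and the returned list is partial.
        return entries
```

Inserting 100 pairs into this 60-cell table and then deleting 97 of them leaves exactly the same cell contents as inserting only the remaining 3, so those 3 can be listed again. Note that `list_entries` empties the table as it goes, so call it on a copy if you still need the table afterwards.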

mhlakhani over 10 years ago

Note that there's a Python implementation here: <a href="https://github.com/jesperborgstrup/Py-IBLT" rel="nofollow">https://github.com/jesperborgstrup/Py-IBLT</a>

msane over 10 years ago

I was recently looking at bloom filters as a possible piece of a permissions system in a typical database object model. Imagine:<pre><code>- users have various permissions
- other objects in the system require one or more permissions to be viewable by those users</code></pre> You can accomplish this easily with joins alone, but it becomes a performance issue rather quickly. I have a large-scale system, and my intuition was that supplemental indexes based on bloom filters could help me solve this problem. I haven't figured out an exact way to apply them, however.

mrfusion over 10 years ago

Could this replace a Python dictionary? Assuming you're OK with losing some entries now and then?

notastartup over 10 years ago

What would be a good reason to use invertible bloom lookup tables? To my knowledge, isn't a bloom filter's purpose to store a large amount of data and give very quick answers as to whether a piece of data is in the basket, though not with 100% accuracy?
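For comparison with the IBLT, the plain Bloom filter described in the first comment can be sketched in a few lines of Python, assuming k SHA-256-derived positions over an m-slot bit array (m=1024 and k=3 are arbitrary choices; the class name is hypothetical). It shows the accuracy trade-off the question asks about: a Bloom filter stores no items at all, only bits, so `might_contain` never returns False for an item that was added, but it may return True for one that never was.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: k hashed positions in an m-slot bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # k hash functions, derived by prefixing the item with the function index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[p] for p in self._positions(item))
```

The filter keeps only m bits no matter how many items are added, which is where the space saving comes from; an IBLT spends more space per cell in exchange for being able to retrieve and list the stored pairs.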

jbyers over 10 years ago

(2011)

Invertible Bloom Lookup Tables (2011)
