TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Invertible Bloom Lookup Tables (2011)

92 点作者 mhlakhani超过 10 年前

8 条评论

colmmacc超过 10 年前
This paper is needlessly hard to read (4 pages in before there&#x27;s even an explanation!) so first a refresher: A traditional bloom filter sets k bits to 1, using k hash functions to determine which bits. So for input &#x27;i&#x27;, we might get something like:<p><pre><code> h1(i) = 0 h2(i) = 5 h3(i) = 7</code></pre> and upon insertion, the bloom filter would resemble:<p><pre><code> 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 </code></pre> to check if an item is the bloom filter, you check the corresponding bits. After you add items to a bloom filter there&#x27;s some probability that the corresponding bits for items that haven&#x27;t been added are also set. So it can sometimes lie to you and say something is included, even when it&#x27;s not.<p>IBLTs replace the bits with counts of items added, and a sum of the key and value. So where Cs is the count sum, Ks is the Key sum and Vs is the value sum, the filter might look like;<p><pre><code> Cs=1 Ks=10 Vs=20 | Cs=0 Ks=0 Vs=0 | ... </code></pre> the positions in the table are computed just as before; the count in the count increments for every item added, and the key and value &quot;sums&quot; are a union of the keys and values that have been added (this might be a simple rolling count of a small checksum, or an xor).<p>Items can be removed from the table individually, by subtracting them from the relevant counts (just like a counting bloom filter) and rolling sums. A &quot;dump&quot; of the table can be derived by iterating through all possible values (you need to know what they might be) and deleting them one by one, and it takes trickery to support duplicate entries.<p>This seems like a big downside; if the input set is constrained and there is so small a finite set of inputs that you can just iterate over them, then I think it is easier to use composite numbers as your invertible lookup table.<p>Here&#x27;s how; for each potential input element, assign it a unique prime number, keep this assignment in a static lookup table. Start the table with the value &quot;1&quot;. To store an element, multiply the current table value by the element&#x27;s value. A composite number will be generated that is the product of all elements inserted into the table.<p>To check for element presence; just use mod. Duplicates work; if the value is still congruent, then the element is still there. To remove an element, just divide by its value. A surprisingly large number of elements can be represented this way in a 64k-sized composite number.<p>A good example of this this technique is to model Internet AS paths; give every AS number a unique prime number (there are only about 100k of them) and model AS path as the composite of its component ASes; using vectorized DIV&#x2F;MOD instructions it is incredibly fast to large graph operations.
评论 #8244964 未加载
评论 #8244784 未加载
评论 #8245328 未加载
评论 #8244509 未加载
评论 #8245274 未加载
评论 #8244591 未加载
lqdc13超过 10 年前
If you just want insertion, deletion, resizing and merging with better data locality and roughly same performance as Bloom Filter, also consider Quotient Filter <a href="http://arxiv.org/pdf/1208.0290.pdf" rel="nofollow">http:&#x2F;&#x2F;arxiv.org&#x2F;pdf&#x2F;1208.0290.pdf</a>
robmccoll超过 10 年前
Why use this instead of putting a bloom filter in front of a fixed hash table? In the unlucky case that two pairs have total hash collision with the IBLT you can&#x27;t recover anything. Iteration cost is the same. If you need delete just use a count in the filter instead of a hash. If the table fills up, drop entries. I guess the IBLT would be able to recover info if the number of insertions crossed the threshold and deletions brought the count back down.
mhlakhani超过 10 年前
Note that there&#x27;s a Python implementation here:<p><a href="https://github.com/jesperborgstrup/Py-IBLT" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;jesperborgstrup&#x2F;Py-IBLT</a>
msane超过 10 年前
I was recently looking at bloom filters as a possible piece of a permissions system in a typical database object model. Imagine:<p><pre><code> - users have various permissions - other objects in the system require one or more permissions to be viewable by those users </code></pre> you can accomplish this easily with joins alone, but it becomes a performance issue rather quickly. I have a large-scale system and my intuition was that supplemental indexes based on bloom filters could help me solve this problem. i haven&#x27;t figured out an exact way to apply them however.
mrfusion超过 10 年前
Could this replace a python dictionary? Assuming you&#x27;re ok with losing some entries now and then?
评论 #8244357 未加载
评论 #8244443 未加载
notastartup超过 10 年前
When would be a good reason to use such invertible bloom lookup tables, in my knowledge, isn&#x27;t a bloom filter&#x27;s purpose is to store large amount of data and give very quick replies as to whether a piece of data is in the basket but not with 100% accuracy?
评论 #8244277 未加载
评论 #8244769 未加载
评论 #8244386 未加载
jbyers超过 10 年前
(2011)
评论 #8244445 未加载
评论 #8244430 未加载