TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Consistently faster and smaller compressed bitmaps with Roaring (2016)

54 pointsby hambandit7 months ago

4 comments

o11c6 months ago
Title is confusing: this is not about the original &quot;Roaring&quot;, but an extension of it called &quot;Roaring+Run&quot;.<p>Here, &quot;bitmap&quot; = &quot;set of sometimes-compact integers&quot;. The &quot;uncompressed&quot; and several &quot;rle&quot; implementations are obvious. Hm, except this only seems to be talking about a particularly-naive RLE approach (suitable for storage but not computation)? If you&#x27;re doing computation I expect you to use absolute offsets rather than relative ones which means you can just do binary search (the only downside of this is that you can&#x27;t use variable-length integers now).<p>Roaring is just a fixed 2-level trie, where the outer node is always an array of pointers and where the inner nodes can be either uncompressed bitvectors (if dense) or an array of low-half integers (if sparse). Also, it <i>only</i> works for 32-bit integers at a fundamental level; significant changes are needed for 64-bit integers.<p>This paper adds a third representation for the inner node, the bsearch&#x27;able absolute RLE method I mentioned earlier (before even reading the paper beyond the abstract).<p>Overall there&#x27;s neither anything novel nor anything particularly exciting about this paper, but if you ignore all the self-congratulations it might work as a decent intro to the subject? Except maybe not since there are a lot of things it fails to mention (the ping-pong problem, deduplicated tries, the approach of using a separate set for sparse values in the same range, the &quot;overall sparse but locally semi-dense&quot; representation that uses fixed-size single-word bitsets, ...)
评论 #42086074 未加载
softwaredoug6 months ago
Roaring bitmaps are really useful for doing phrase search in search engines.<p>Basically you can find cases where one term is immediately before another by left shifting the right terms roaring encoded positions in all documents and bitwise-anding the similarly roaring-encoded payload of the preceding term. All with a highly compressed representation of term positions.<p>With something like numpy you can do this in a handful of logical operations in python.<p><a href="https:&#x2F;&#x2F;softwaredoug.com&#x2F;blog&#x2F;2024&#x2F;01&#x2F;21&#x2F;search-array-phrase-algorithm" rel="nofollow">https:&#x2F;&#x2F;softwaredoug.com&#x2F;blog&#x2F;2024&#x2F;01&#x2F;21&#x2F;search-array-phrase...</a>
pncnmnp6 months ago
Hey everyone, if you&#x27;re looking for a more approachable guide on bitmap compression, I wrote a blog post on it this year: <a href="https:&#x2F;&#x2F;pncnmnp.github.io&#x2F;blogs&#x2F;roaring-bitmaps.html" rel="nofollow">https:&#x2F;&#x2F;pncnmnp.github.io&#x2F;blogs&#x2F;roaring-bitmaps.html</a>. It covers run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.
评论 #42088795 未加载
rurban6 months ago
I found them consistently slower in unary unicode tables, with sizes from 1.000 to 30.000. Binary search and hashtables were both better.<p>The reason might be that ordered search tables and perfect hashtables can be dumped to C code statically, roaring not, so there is the dynamic alloc and deserialization overhead.