Title is confusing: this is not about the original "Roaring", but an extension of it called "Roaring+Run".<p>Here, "bitmap" = "set of sometimes-compact integers". The "uncompressed" and several "rle" implementations are obvious. Hm, except this only seems to be talking about a particularly-naive RLE approach (suitable for storage but not computation)? If you're doing computation I expect you to use absolute offsets rather than relative ones which means you can just do binary search (the only downside of this is that you can't use variable-length integers now).<p>Roaring is just a fixed 2-level trie, where the outer node is always an array of pointers and where the inner nodes can be either uncompressed bitvectors (if dense) or an array of low-half integers (if sparse). Also, it <i>only</i> works for 32-bit integers at a fundamental level; significant changes are needed for 64-bit integers.<p>This paper adds a third representation for the inner node, the bsearch'able absolute RLE method I mentioned earlier (before even reading the paper beyond the abstract).<p>Overall there's neither anything novel nor anything particularly exciting about this paper, but if you ignore all the self-congratulations it might work as a decent intro to the subject? Except maybe not since there are a lot of things it fails to mention (the ping-pong problem, deduplicated tries, the approach of using a separate set for sparse values in the same range, the "overall sparse but locally semi-dense" representation that uses fixed-size single-word bitsets, ...)
Roaring bitmaps are really useful for doing phrase search in search engines.<p>Basically you can find cases where one term is immediately before another by left shifting the right terms roaring encoded positions in all documents and bitwise-anding the similarly roaring-encoded payload of the preceding term. All with a highly compressed representation of term positions.<p>With something like numpy you can do this in a handful of logical operations in python.<p><a href="https://softwaredoug.com/blog/2024/01/21/search-array-phrase-algorithm" rel="nofollow">https://softwaredoug.com/blog/2024/01/21/search-array-phrase...</a>
Hey everyone, if you're looking for a more approachable guide on bitmap compression, I wrote a blog post on it this year: <a href="https://pncnmnp.github.io/blogs/roaring-bitmaps.html" rel="nofollow">https://pncnmnp.github.io/blogs/roaring-bitmaps.html</a>. It covers run-length encoding, BBC, WAH, Concise, and even Roaring Bitmaps.
I found them consistently slower in unary unicode tables, with sizes from 1.000 to 30.000. Binary search and hashtables were both better.<p>The reason might be that ordered search tables and perfect hashtables can be dumped to C code statically, roaring not, so there is the dynamic alloc and deserialization overhead.