TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Roaring bitmaps: what they are and how they work

226 pointsby voberoiover 2 years ago

12 comments

celeritasceleryover 2 years ago
I see this common theme among very fast practical algorithms (like timsort or pdqsort) where there is not some secret math or algorithm, rather they are just a bunch of special cases based on heuristics. Often they involve specific knowledge of the hardware instead of treating software as abstract. To me this is the big difference between “computer science” and “computer engineering”.
评论 #32706466 未加载
评论 #32704674 未加载
评论 #32705383 未加载
评论 #32706554 未加载
评论 #32709631 未加载
ww520over 2 years ago
This is an excellent write up on Roaring Bitmap. The concrete examples really help to illustrate the algorithm well.<p>While Roaring Bitmap performs well on space saving with good performance, my gut feeling is there&#x27;re better ways to achieve the same goals, especially it involving sorting on insertion and binary search on lookup, both on the sorted top level list and the sorted lower-bit integers.<p>Also it works well on 32-bit integers and probably 64-bit integers but it&#x27;s not clear it can scale well beyond to bigger integers.<p>Nevertheless it&#x27;s an excellent algorithm that works well in practice.
quickthrower2over 2 years ago
Not .bmp bitmaps used to store photographic images as I would have guessed:<p>&gt; Bitmaps are arrays of bits used to store sets of integers. They work by setting the Nth bit when an integer N is in the set
评论 #32709172 未加载
TacticalCoderover 2 years ago
&gt; When a set is sparse, traditional bitmaps compress poorly.<p>They waste space. But if you were to compress them, they&#x27;d compress <i>very</i> well. As in, for example, if you were to compress them using roaring bitmaps.
评论 #32709363 未加载
评论 #32707798 未加载
pvillanoover 2 years ago
I thought the trick was going to be indirection.<p>sorted-list&#x2F;bitmap&#x2F;runs, in a two level tree. cool.<p>it&#x27;s technically missing sorted compliment-list, i.e. only the items that are missing, although worst case runs only use twice as much space, so more complexity without much savings, esp. considering real workloads<p>performs better with sequential ids than random uuids because you use fewer pages<p>further research<p>investigating the effect of adding more layers to the tree<p>using additional set representations e.g. balanced tree
评论 #32707900 未加载
nullcover 2 years ago
Those who do not know judy1 are doomed to reinvent it.<p><a href="http:&#x2F;&#x2F;judy.sourceforge.net&#x2F;" rel="nofollow">http:&#x2F;&#x2F;judy.sourceforge.net&#x2F;</a>
评论 #32705912 未加载
评论 #32705616 未加载
maycotteover 2 years ago
Bitmaps are amazing and we believe that they will replace all other data formats for analytics.<p>Today we open sourced about $25m in R&amp;D (more to come)! They take a bit of harnessing, but once you do the results are amazing. We have released an OLAP database called FeatureBase, built entirely on bitmaps (well bitmaps in a b-tree)...<p>Here is a bit (pun intended) more about our big bet on bitmaps: <a href="https:&#x2F;&#x2F;www.featurebase.com&#x2F;blog&#x2F;bitmaps-making-real-time-analytics-real" rel="nofollow">https:&#x2F;&#x2F;www.featurebase.com&#x2F;blog&#x2F;bitmaps-making-real-time-an...</a>
Xeoncrossover 2 years ago
Can someone explain how this is laid out in memory and what it would look like serialized? It&#x27;s easy to have an array of N size that fits all bits and just write to disk. You can also mmap a file as a large array of bytes and just mutate it and let the OS handle the disk syncs if you&#x27;re okay with some data loss of recent state changes if the machine crashes.<p>What would an array of pointers to X containers which are N size each look like on disk?
评论 #32708312 未加载
评论 #32706659 未加载
nattaylorover 2 years ago
I enjoyed this walkthrough. I&#x27;m interested in RB because it&#x27;s used in Lucene but never dug in and assumed (without much thought) that it had to do with consecutive runs of 1s and 0s. Wrong!
评论 #32705044 未加载
superjanover 2 years ago
It looks a bit too simplistic: Blocks with up to 4k items are stored as sorted lists. Having to insert in a sorted list up to 4k items seems very expensive to me.
评论 #32706652 未加载
评论 #32706304 未加载
评论 #32709097 未加载
评论 #32705657 未加载
jervenover 2 years ago
I love roaringbitmaps and maybe even over use this library. On the other hand it made a lot of things possible for legacy.uniprot.org over the years that would have been unaffordable otherwise. Quick faceting using permanent filters over 500+ million Documents would have been really hard otherwise.
评论 #32706504 未加载
pvillanoover 2 years ago
I&#x27;m reading this on a runaway with terrible internet and the figures slowly load to reveal... text :&#x2F;