I see a common theme among very fast practical algorithms (like timsort or pdqsort): there is no secret math or algorithm, just a bunch of special cases based on heuristics. Often they involve specific knowledge of the hardware instead of treating software as abstract. To me this is the big difference between “computer science” and “computer engineering”.
This is an excellent write-up on Roaring Bitmap. The concrete examples really help to illustrate the algorithm.

While Roaring Bitmap saves space with good performance, my gut feeling is there are better ways to achieve the same goals, especially since it involves sorting on insertion and binary search on lookup, both on the sorted top-level list and on the sorted lower-bit integers.

Also, it works well on 32-bit integers and probably on 64-bit integers, but it's not clear it can scale to bigger integers.

Nevertheless, it's an excellent algorithm that works well in practice.
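To make the concern concrete, the lookup path is roughly a binary search over the 16-bit high keys, then a search inside one small container. A toy sketch of that path (my own simplification, not the library's actual API; the ArrayContainer type and array-only containers here are assumptions):

    #include <stdint.h>
    #include <stdlib.h>

    /* Toy two-level lookup: split a 32-bit value into a 16-bit key (high
       bits) and a 16-bit payload (low bits); containers in this sketch are
       always sorted arrays. */
    typedef struct {
        uint16_t  key;   /* high 16 bits shared by all values in container */
        uint16_t *vals;  /* sorted low 16 bits */
        int       n;
    } ArrayContainer;

    static int cmp_u16(const void *a, const void *b) {
        return (int)*(const uint16_t *)a - (int)*(const uint16_t *)b;
    }

    /* Binary search the top-level container list by key, then the container. */
    int roaring_like_contains(const ArrayContainer *cs, int ncs, uint32_t x) {
        uint16_t key = x >> 16, low = x & 0xFFFF;
        int lo = 0, hi = ncs - 1;
        while (lo <= hi) {                 /* find the container, O(log n) */
            int mid = (lo + hi) / 2;
            if (cs[mid].key < key) lo = mid + 1;
            else if (cs[mid].key > key) hi = mid - 1;
            else return bsearch(&low, cs[mid].vals, cs[mid].n,
                                sizeof(uint16_t), cmp_u16) != NULL;
        }
        return 0;
    }

    /* usage: one container for key 0 holding {1, 5}, then two queries */
    int main(void) {
        uint16_t vals[] = {1, 5};
        ArrayContainer c = {0, vals, 2};
        return !(roaring_like_contains(&c, 1, 5) &&
                !roaring_like_contains(&c, 1, 4));
    }

So a lookup is two logarithmic searches over small, contiguous arrays, which is cheap in practice even if it sounds like a lot of searching.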
Not .bmp bitmaps used to store photographic images, as I would have guessed:

> Bitmaps are arrays of bits used to store sets of integers.

They work by setting the Nth bit when an integer N is in the set.
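In code, for a small universe, that is just bit twiddling; a minimal toy version (mine, not from the article):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy bitmap over the integers 0..63: bit N is set iff N is in the set. */
    int main(void) {
        uint64_t set = 0;
        set |= UINT64_C(1) << 3;                    /* add 3  */
        set |= UINT64_C(1) << 42;                   /* add 42 */
        printf("%d %d\n", (int)((set >> 42) & 1),   /* 1: 42 is in the set */
                          (int)((set >> 7)  & 1));  /* 0: 7 is not         */
        return 0;
    }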
> When a set is sparse, traditional bitmaps compress poorly.

They waste space. But if you were to compress them, they'd compress *very* well. As in, for example, if you were to compress them using roaring bitmaps.
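To put numbers on it: a sparse bitmap is almost entirely runs of zero bytes, which is exactly what run-length schemes (including roaring's run containers) eat for breakfast. A quick sketch (the example set and the byte-level RLE are mine, just to make the point):

    #include <stdint.h>
    #include <stdio.h>

    /* The sparse set {7, 1000000} as a plain bitmap: 1,000,001 bits
       is ~125 KB, almost all zero bytes. */
    enum { NBYTES = 1000001 / 8 + 1 };
    static uint8_t bm[NBYTES];

    int main(void) {
        bm[7 / 8]       |= 1 << (7 % 8);
        bm[1000000 / 8] |= 1 << (1000000 % 8);

        /* Count the runs a trivial byte-level RLE would emit. */
        long runs = 1;
        for (long i = 1; i < NBYTES; i++)
            if (bm[i] != bm[i - 1]) runs++;
        printf("%d bytes raw -> %ld RLE runs\n", NBYTES, runs);
        /* prints: 125001 bytes raw -> 3 RLE runs */
        return 0;
    }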
I thought the trick was going to be indirection.

Sorted-list/bitmap/runs, in a two-level tree. Cool. (A sketch of the choice between the three is below.)

It's technically missing a sorted complement-list, i.e. one holding only the items that are missing. But in the worst case, runs use only twice as much space, so that would be more complexity without much savings, especially considering real workloads.

It performs better with sequential ids than random uuids because you use fewer pages.

Further research:

- investigating the effect of adding more layers to the tree

- using additional set representations, e.g. a balanced tree
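For reference, the sorted-list/bitmap/runs choice falls out of the sizes: a 2^16 universe as a bitmap costs a fixed 8 KB, so 16-bit sorted arrays win below 4096 elements and runs win when there are few of them. A sketch of that decision (thresholds as described in the write-up; the enum and function are my own guess, not the library's code):

    #include <stdint.h>

    enum ContainerKind { ARRAY, BITMAP, RUNS };

    /* Pick the cheapest representation for one 2^16-value chunk.
     * array:  2 bytes/element            -> wins while n <= 4096 (<= 8 KB)
     * bitmap: fixed 65536 bits = 8 KB
     * runs:   4 bytes/run (start,length) -> wins when there are few runs
     */
    enum ContainerKind pick_container(uint32_t n_elements, uint32_t n_runs) {
        uint32_t array_bytes  = 2 * n_elements;
        uint32_t bitmap_bytes = 65536 / 8;
        uint32_t run_bytes    = 4 * n_runs;
        if (run_bytes < array_bytes && run_bytes < bitmap_bytes) return RUNS;
        return (array_bytes <= bitmap_bytes) ? ARRAY : BITMAP;  /* n <= 4096 */
    }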
Those who do not know judy1 are doomed to reinvent it.

http://judy.sourceforge.net/
Bitmaps are amazing, and we believe that they will replace all other data formats for analytics.

Today we open-sourced about $25m in R&D (more to come)! They take a bit of harnessing, but once you do, the results are amazing. We have released an OLAP database called FeatureBase, built entirely on bitmaps (well, bitmaps in a b-tree)...

Here is a bit (pun intended) more about our big bet on bitmaps: https://www.featurebase.com/blog/bitmaps-making-real-time-analytics-real
Can someone explain how this is laid out in memory and what it would look like serialized? It's easy to have an array of size N that fits all bits and just write it to disk. You can also mmap a file as a large array of bytes, mutate it, and let the OS handle the disk syncs, if you're okay with losing some recent state changes when the machine crashes.

What would an array of pointers to X containers, each of size N, look like on disk?
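My best guess at a layout would be something like the following (purely hypothetical, all field names are mine; the actual Roaring format spec should be checked before relying on any of this): a fixed-size directory of container keys and types up front, then offsets, then payloads back to back, so a reader can mmap the file and binary-search the directory without touching the payloads.

    #include <stdint.h>

    /* Hypothetical serialized layout:
     *
     *   [magic u32][n_containers u32]
     *   n * [key u16][type u8][pad u8][cardinality u32]  <- directory
     *   n * [offset u32]                                 <- payload positions
     *   payloads: arrays as raw u16[], bitmaps as raw u64[1024],
     *             runs as (start,length) u16 pairs
     *
     * Fixed-size directory entries mean a reader can binary-search by key
     * directly in the mapped file, then jump to offset[i] for the payload.
     */
    #pragma pack(push, 1)
    typedef struct {
        uint16_t key;         /* high 16 bits of the values in this container */
        uint8_t  type;        /* 0 = array, 1 = bitmap, 2 = runs */
        uint8_t  pad;
        uint32_t cardinality; /* lets readers size reads / skip containers */
    } DirEntry;
    #pragma pack(pop)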
I enjoyed this walkthrough. I'm interested in Roaring Bitmaps because they're used in Lucene, but I had never dug in and assumed (without much thought) that the trick had to do with consecutive runs of 1s and 0s. Wrong!
It looks a bit too simplistic: blocks with up to 4k items are stored as sorted lists. Having to insert into a sorted list of up to 4k items seems very expensive to me.
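Though thinking about it, 4096 16-bit values is only 8 KB, so the insert is one binary search plus one memmove within a couple of pages of contiguous, cache-friendly memory. A sketch (my code, assuming the sorted-array container the post describes):

    #include <stdint.h>
    #include <string.h>

    /* Insert into a sorted array container (capacity 4096 * 2 bytes = 8 KB).
     * Binary search finds the slot; one memmove shifts the tail. Returns 0
     * if the value was already present or the container is full. */
    int array_container_add(uint16_t *a, int *n, uint16_t v) {
        int lo = 0, hi = *n;
        while (lo < hi) {               /* lower bound, O(log n) compares */
            int mid = (lo + hi) / 2;
            if (a[mid] < v) lo = mid + 1; else hi = mid;
        }
        if (lo < *n && a[lo] == v) return 0;  /* already in the set */
        if (*n >= 4096) return 0;             /* caller converts to bitmap */
        memmove(&a[lo + 1], &a[lo], (*n - lo) * sizeof(uint16_t));
        a[lo] = v;
        (*n)++;
        return 1;
    }

It's still O(n) moves per insert in the worst case, which is presumably exactly why the format flips to a plain bitmap past 4096 elements.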
I love roaring bitmaps and maybe even overuse this library. On the other hand, it made a lot of things possible for legacy.uniprot.org over the years that would have been unaffordable otherwise. Quick faceting using permanent filters over 500+ million documents would have been really hard without it.