TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

A compressed indexable bitset

95 pointsby francoismassotalmost 2 years ago

3 comments

otalmost 2 years ago
&gt; We haven&#x27;t discussed other solutions like Partitioned Elias-Fano indexes or Tree-Encoded Bitmaps. We simply did not investigate them. There are no efficient off-the-shelf implementations of these solutions, so we had to implement one.<p>I guess they mean there are no implementations <i>in Rust</i>? The Partitioned Elias-Fano (PEF) paper links to the C++ implementation ([1], which I wrote) used for the experiments, it was efficient 9 years ago :) The library was later forked [2] and used in a few other papers, I believe it is somewhat maintained. The EF core algorithm implemented in folly [3] may be a bit faster, and implementing partitioning on top of that is relatively easy.<p>It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If very sparse (large jumps), PEF would likely be faster, if dense (visit a large fraction of the bitmap), it&#x27;d be slower.<p>It is possible to squeeze a bit more compression out of PEF by introducing a chunk type for Elias-Fano of the chunk complement (for very dense chunks), but you lose the operation of skipping to a given position, which is however not needed in inverted indexes (you only need to skip past a given id, and that can be supported efficiently). That is not mentioned in the paper because at the time I thought the skip-to-position operation was a non-negotiable.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;ot&#x2F;ds2i&#x2F;">https:&#x2F;&#x2F;github.com&#x2F;ot&#x2F;ds2i&#x2F;</a><p>[2] <a href="https:&#x2F;&#x2F;github.com&#x2F;pisa-engine&#x2F;pisa">https:&#x2F;&#x2F;github.com&#x2F;pisa-engine&#x2F;pisa</a><p>[3] <a href="https:&#x2F;&#x2F;github.com&#x2F;facebook&#x2F;folly&#x2F;blob&#x2F;main&#x2F;folly&#x2F;experimental&#x2F;EliasFanoCoding.h">https:&#x2F;&#x2F;github.com&#x2F;facebook&#x2F;folly&#x2F;blob&#x2F;main&#x2F;folly&#x2F;experiment...</a>
评论 #36556713 未加载
评论 #36556381 未加载
评论 #36557946 未加载
评论 #36560585 未加载
inciampatialmost 2 years ago
Also known as rank&#x2F;select dictionaries: <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Succinct_data_structure#Succinct_indexable_dictionaries" rel="nofollow noreferrer">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Succinct_data_structure#Succin...</a>
jacquesmalmost 2 years ago
Interesting. I once made something like this for a search engine I built. It turned out to be a very useful data structure, especially when working with large, sparse sets.