TechEcho

3 comments

otalmost 2 years ago

> We haven't discussed other solutions like Partitioned Elias-Fano indexes or Tree-Encoded Bitmaps. We simply did not investigate them. There are no efficient off-the-shelf implementations of these solutions, so we had to implement one.I guess they mean there are no implementations in Rust? The Partitioned Elias-Fano (PEF) paper links to the C++ implementation ([1], which I wrote) used for the experiments, it was efficient 9 years ago :) The library was later forked [2] and used in a few other papers, I believe it is somewhat maintained. The EF core algorithm implemented in folly [3] may be a bit faster, and implementing partitioning on top of that is relatively easy.It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If very sparse (large jumps), PEF would likely be faster, if dense (visit a large fraction of the bitmap), it'd be slower.It is possible to squeeze a bit more compression out of PEF by introducing a chunk type for Elias-Fano of the chunk complement (for very dense chunks), but you lose the operation of skipping to a given position, which is however not needed in inverted indexes (you only need to skip past a given id, and that can be supported efficiently). That is not mentioned in the paper because at the time I thought the skip-to-position operation was a non-negotiable.[1] <a href="https://github.com/ot/ds2i/">https://github.com/ot/ds2i/</a>[2] <a href="https://github.com/pisa-engine/pisa">https://github.com/pisa-engine/pisa</a>[3] <a href="https://github.com/facebook/folly/blob/main/folly/experimental/EliasFanoCoding.h">https://github.com/facebook/folly/blob/main/folly/experiment...</a>

评论 #36556713 未加载

评论 #36556381 未加载

评论 #36557946 未加载

评论 #36560585 未加载

inciampatialmost 2 years ago

Also known as rank/select dictionaries: <a href="https://en.wikipedia.org/wiki/Succinct_data_structure#Succinct_indexable_dictionaries" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Succinct_data_structure#Succin...</a>

jacquesmalmost 2 years ago

Interesting. I once made something like this for a search engine I built. It turned out to be a very useful data structure, especially when working with large, sparse sets.

3 comments

otalmost 2 years ago

评论 #36556713 未加载

评论 #36556381 未加载

评论 #36557946 未加载

评论 #36560585 未加载

inciampatialmost 2 years ago

jacquesmalmost 2 years ago

Interesting. I once made something like this for a search engine I built. It turned out to be a very useful data structure, especially when working with large, sparse sets.

A compressed indexable bitset

3 comments

A compressed indexable bitset

3 comments