I'm one of the authors of the paper, nice to see it on HN.<p>I remember when we first experimented with this, the compression improvement compared to our previous heuristic was massive, but the algorithm took a day to run on a double-digit-node Giraph cluster for a single index shard. I was very skeptical we'd ever be able to use it in production, given that we had to run it on thousands of index shards every few days.<p>Eventually we reimplemented it in C++, optimized all the data structures to make them fit in memory, and we were able to run it in a couple of hours on a single (beefy) machine. Over the years it has been optimized further.<p>The algorithm has been reproduced externally with an open-source implementation [1], which AFAIR was pretty good when I looked at it.<p>[1] <a href="https://culpepper.io/publications/mm+19-ecir.pdf" rel="nofollow">https://culpepper.io/publications/mm+19-ecir.pdf</a>
I wonder if there's any development of this, since this paper is from 2016.<p>This is an example of an NP-hard problem that surprisingly broadly applies to everyday situations, just like the bin-packing problem that has been on the front page recently. Do you know of other NP hard problems with surprising applicability to everyday situations?
This is such a nice and straightforward result. Encode the vertex ID's using locality-preserving integers, and then store the deltas between the ID's instead of the ID's themselves.