Honestly, the inverse Burrows-Wheeler transform seems like some sort of Voodoo black magic to me.<p>It reminds be of the 100 prisoners problem [0]. Yes, I understand why it works. I can see how it works mathematically. But it still feels like it shouldn't work, that we're somehow getting something for free.<p>[0]: <a href="https://en.wikipedia.org/wiki/100_prisoners_problem" rel="nofollow">https://en.wikipedia.org/wiki/100_prisoners_problem</a>
This is used in both bwa [1] and bowtie [2], two of the most popular DNA sequence aligners.<p>[1] <a href="https://github.com/lh3/bwa" rel="nofollow">https://github.com/lh3/bwa</a><p>[2] <a href="https://github.com/BenLangmead/bowtie" rel="nofollow">https://github.com/BenLangmead/bowtie</a>
This is a fun video from a series on compression that explains it well, and features Mike Burrows:<p><a href="https://youtu.be/4WRANhDiSHM" rel="nofollow">https://youtu.be/4WRANhDiSHM</a><p>He shares the origin of the algorithm as well as a story about how it was first published.<p>The Compressor Head video series is the best introduction to compression that I’ve found.
I still remember when I first read the first C implementation of this, which was IIRC by Mike Burrows.<p>Apparently, back in the days, Mike had some sort of philosophical opposition to indenting his C code.<p>I was impressed he managed to write such a complex piece of code without ever indenting anything.
For biologists reading this, I recommend the following series from a mathematical biologist in Spain (maybe Toronto by now): <a href="http://blog.thegrandlocus.com/tag/burrows-wheeler-transform" rel="nofollow">http://blog.thegrandlocus.com/tag/burrows-wheeler-transform</a>
It's fascinating to me that xz's algorithm[1] beats bzip2 (Burrows-Wheeler) in both time and space, but it's a much simpler algorithm.<p>1: <a href="https://en.m.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm" rel="nofollow">https://en.m.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93M...</a>