TechEcho

16 comments

dkbrk over 2 years ago

Honestly, the inverse Burrows-Wheeler transform seems like some sort of voodoo black magic to me.

It reminds me of the 100 prisoners problem [0]. Yes, I understand why it works. I can see how it works mathematically. But it still feels like it shouldn't work, that we're somehow getting something for free.

[0]: https://en.wikipedia.org/wiki/100_prisoners_problem
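
For anyone who wants to poke at the "magic", here is a minimal sketch of the forward transform and its inverse. It is not taken from any particular implementation: the function names and the `$` end-of-text sentinel are assumptions made for illustration, and the forward transform is the naive sort-all-rotations version rather than anything production-grade. The inverse rests on one observation: the k-th occurrence of a character in the last column is the same text character as its k-th occurrence in the sorted first column.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive forward transform: sort all rotations, keep the last column."""
    # Assumption: the sentinel does not occur in the text, so it marks the end uniquely.
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert the transform by walking the last-to-first (LF) mapping."""
    n = len(last)
    # A stable sort of the row indices by character reconstructs the first column
    # and pairs each last-column occurrence with its first-column row.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for first_row, last_row in enumerate(order):
        lf[last_row] = first_row
    # Start at the row that begins with the sentinel; its last character
    # is the final character of the original text.
    row = lf[last.index(sentinel)]
    out = []
    for _ in range(n - 1):
        out.append(last[row])  # character preceding the current row's first character
        row = lf[row]          # jump to the row that starts with that character
    return "".join(reversed(out))


encoded = bwt("banana")
print(encoded)               # annb$aa
print(inverse_bwt(encoded))  # banana
```

Nothing comes for free here: the output is the same multiset of characters as the input plus the sentinel, so the last column together with its sorted copy carries enough information to rebuild every rotation.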

tdido over 2 years ago

This is used in both bwa [1] and bowtie [2], two of the most popular DNA sequence aligners.

[1] https://github.com/lh3/bwa

[2] https://github.com/BenLangmead/bowtie

lancefisher over 2 years ago

This is a fun video from a series on compression that explains it well, and features Mike Burrows: https://youtu.be/4WRANhDiSHM

He shares the origin of the algorithm as well as a story about how it was first published.

The Compressor Head video series is the best introduction to compression that I’ve found.

ur-whale over 2 years ago

I still remember when I first read the first C implementation of this, which was IIRC by Mike Burrows.

Apparently, back in the day, Mike had some sort of philosophical opposition to indenting his C code.

I was impressed he managed to write such a complex piece of code without ever indenting anything.

dang over 2 years ago

Related:

Burrows-Wheeler Transform [video] - https://news.ycombinator.com/item?id=10721401 - Dec 2015 (5 comments)

Compression with the Burrows-Wheeler Transform - https://news.ycombinator.com/item?id=1112845 - Feb 2010 (6 comments)

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=%22burrows-wheeler%22&sort=byDate&type=comment

skrebbel over 2 years ago

Can anyone explain to me why this is said to be O(n) when it heavily relies on sorting?

lofatdairy over 2 years ago

For biologists reading this, I recommend the following series from a mathematical biologist in Spain (maybe Toronto by now): http://blog.thegrandlocus.com/tag/burrows-wheeler-transform

mcint over 2 years ago

Someone's been reading about string matching algorithms, or compression. For biology?

djha-skin over 2 years ago

It's fascinating to me that xz's algorithm [1] beats bzip2 (Burrows-Wheeler) in both time and space, but it's a much simpler algorithm.

[1]: https://en.m.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm

ducaale over 2 years ago

Fun fact: Burrows-Wheeler is one of the assignments in Coursera's algorithms course.

personjerry over 2 years ago

How is it that this doesn't violate the pigeonhole principle?

tehjoker over 2 years ago

If you were applying a first pass of compression to integer data and then applying gzip, would adding BWT in the middle still be beneficial?

beagle3 over 2 years ago

It is an incredibly elegant scheme for bringing Markov context outputs together without actually trying to figure out what those contexts are.
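
A rough way to see that grouping, as a sketch only: the helper name, the `$` sentinel, and the context length `k` below are all made up for illustration. Each sorted rotation starts with the text that follows its last-column character, so sorting the rotations lines up equal contexts and puts the characters that precede them next to each other in the output.

```python
def bwt_rows_with_context(text: str, k: int = 2, sentinel: str = "$"):
    """Return (output character, first k characters of its sorted rotation) per row."""
    # Assumption: the sentinel does not occur in the text.
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # rot[-1] is the BWT output character; rot[:k] is the context that follows it.
    return [(rot[-1], rot[:k]) for rot in rotations]


rows = bwt_rows_with_context("the theme then there")
# Every row whose context starts with "he" emits the same character.
print([char for char, context in rows if context == "he"])  # ['t', 't', 't', 't']
```

No model of the contexts is ever built; the sort does the grouping, and the runs of repeated characters are what a later stage such as move-to-front plus an entropy coder can exploit.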

billfruit over 2 years ago

Is there any analogous method for images?

frozencell over 2 years ago

So even the inventors quite misunderstand how they came up with this marvel. (we are lost ^^)

GordonS over 2 years ago

Has anyone here found uses for it, perhaps for domain-specific string compression?

Burrows–Wheeler Transform
