This is a neat JS library with some possible applications, but if you're only matching substrings without gaps, you're basically doing string searching, not alignment. Alignment is an entirely different and more complicated computational problem, with decades of algorithm development (see Smith-Waterman, Needleman-Wunsch, the Burrows-Wheeler transform) and implementation work behind it, and it is far more biologically useful and practical than substring searching. In practice, heuristic implementations like BLAST, FASTA, and lastz do most large-scale alignment searching, and that's before you get to multiple sequence alignment, which is another problem unto itself.
In other words, it's a useful JS library and interesting implementation, but it isn't alignment.
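For readers unfamiliar with the distinction: alignment allows and scores mismatches and gaps, which substring search cannot express. A minimal Smith-Waterman sketch in JavaScript (scoring only, linear gap penalty; the function name and scoring parameters are illustrative, not taken from the library under discussion):

```javascript
// Smith-Waterman local alignment *score* (no traceback), linear gap penalty.
// Uses a rolling row so memory is O(|b|) instead of O(|a|*|b|).
function smithWatermanScore(a, b, match = 2, mismatch = -1, gap = -1) {
  const cols = b.length + 1;
  let prev = new Array(cols).fill(0); // row i-1 of the DP matrix
  let best = 0;
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(cols).fill(0); // row i; column 0 stays 0 (local alignment)
    for (let j = 1; j < cols; j++) {
      const diag = prev[j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch);
      // Local alignment clamps at 0: a bad prefix never drags the score negative.
      curr[j] = Math.max(0, diag, prev[j] + gap, curr[j - 1] + gap);
      if (curr[j] > best) best = curr[j];
    }
    prev = curr;
  }
  return best;
}
```

Even this toy version is O(nm), which is exactly why the heuristic tools above exist for genome-scale work.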
My colleague (an editor at the journal Bioinformatics) and I were joking the other day about how every other paper title, especially for aligners, seems to include the word "fast". This takes it to a new level.<p>As a learning exercise, this is interesting and fine. I am trying very hard to suppress my inner reviewer right now. Walking away... Not comparing this to existing algorithms implemented in highly optimized C/C++, CUDA, or even hardware. Not asking why other authors would go to such extraordinary lengths if high-level languages were suitable. Not asking how conclusions can be drawn about the suitability of JavaScript for very computationally intensive tasks without solving the actual alignment problem rather than a subset of it, let alone without comparing to existing tools. Not asking about the intended application of the tool. It's not a paper. I'm breathing. OK.
Except this is totally reinventing the wheel.<p>If I want to do fast genome alignment, I'll use lastz, which is already blazing fast and written in C. If I just want to look for homology, I'll use BLAST.<p>There isn't much need for improved alignment algorithms here. If it's a big job, most bioinformaticians have access to clusters. If it's a small job, who cares about speed?
I liked the exposition, but it would have been nice if the author had compared his implementation with something meatier, like Sustik-Moore[1], for instance.<p>[1] <a href="http://www.cs.utexas.edu/users/moore/publications/sustik-moore.pdf" rel="nofollow">http://www.cs.utexas.edu/users/moore/publications/sustik-moo...</a>
I'm not familiar enough with the asm.js spec[0] to know how hard it would be, but I'm curious whether there would be a perf improvement if you could get it to run in that mode. Nice write-up!<p>[0] <a href="http://asmjs.org/spec/latest/" rel="nofollow">http://asmjs.org/spec/latest/</a>
Not sure why the author of the article didn't just use the well-known Bitap algorithm: <a href="http://en.wikipedia.org/wiki/Bitap_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Bitap_algorithm</a>
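For reference, a sketch of the exact-match ("shift-and") variant of Bitap in JavaScript; with JS's 32-bit bitwise operators it handles patterns up to 31 characters. The function name is mine, not from the article:

```javascript
// Bitap (shift-and) exact substring search.
// Returns the index of the first match, or -1 if none.
function bitapSearch(text, pattern) {
  const m = pattern.length;
  if (m === 0 || m > 31) return -1; // 32-bit bitwise ops limit the pattern length

  // masks[c] has bit i set iff pattern[i] === c.
  const masks = {};
  for (let i = 0; i < m; i++) {
    masks[pattern[i]] = (masks[pattern[i]] || 0) | (1 << i);
  }

  let R = 0;                 // bit i of R: pattern[0..i] matches text ending here
  const done = 1 << (m - 1); // this bit set => the whole pattern matched
  for (let j = 0; j < text.length; j++) {
    R = ((R << 1) | 1) & (masks[text[j]] || 0);
    if (R & done) return j - m + 1;
  }
  return -1;
}
```

The same state-vector idea extends to approximate matching (one extra vector per allowed error), which is where Bitap gets interesting for fuzzy search.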
With io.js coming onto the scene, I've recently revisited my thoughts on using JavaScript for scientific/statistical computing. With ES6 and V8 TurboFan coming down the pipe, and enough support for serious functional programming (e.g. Fantasy Land, transducers, Ramda, etc.), I've decided to shift from Scala/Haskell/R to JavaScript. I think there is a bright future in these areas and a lot of room for devs to make a name for themselves doing this kind of development. R is particularly easy to beat on the performance front, judging from the libs I have rewritten in JS to date.
I thought Atwood's Law [1] was a joke, but this is seriously interesting. I remember when Java was just a fun little toy language for making web pages, and I'm still surprised every time someone mentions it as a serious number-crunching platform. (I've heard that the entire Nasdaq matching engine is a single-threaded Java application.) Is JavaScript going to end up the same way?<p>[1] <a href="http://blog.codinghorror.com/the-principle-of-least-power/" rel="nofollow">http://blog.codinghorror.com/the-principle-of-least-power/</a>
Super neat JS work! I have similar code for doing GC/AT counts in Java. Using longs and popcount, one can do 16 nucleotides per cycle. What's also nice is that one can easily use a memory-mapped byte buffer as a long buffer to run through the code. To go faster than that, AVX needs to be used/JIT-ed in.<p>What it really shows is that the FASTA format is just terrible for computational efficiency :(<p><a href="https://gist.github.com/JervenBolleman/d1430d0549028861504c" rel="nofollow">https://gist.github.com/JervenBolleman/d1430d0549028861504c</a>
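A related bit trick works even on unpacked ASCII: for the bases A/C/G/T (upper- or lowercase), bit 1 of the character code happens to be set exactly for C and G, so GC counting reduces to a shift and a mask per character, with no branches. A minimal JavaScript sketch (my own, not the gist linked above, and it assumes the input contains only A/C/G/T):

```javascript
// GC count via an ASCII bit trick: for 'A' (0x41), 'C' (0x43), 'G' (0x47),
// 'T' (0x54) — and their lowercase forms — bit 1 of the char code is set
// exactly for C and G. No branching in the loop body.
function gcCount(seq) {
  let gc = 0;
  for (let i = 0; i < seq.length; i++) {
    gc += (seq.charCodeAt(i) >> 1) & 1;
  }
  return gc;
}
```

A SIMD/popcount version would apply the same mask to 8 bytes of a word at once, which is the word-at-a-time idea the Java gist uses on packed codes.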
This is the commonly used BLAST alignment tool:
<a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=blastn&BLAST_SPEC=GlobalAln&LINK_LOC=BlastHomeLink" rel="nofollow">http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearc...</a><p>for comparison.
Cool. I saw this the other day when you posted it at BioStar (I am Damian Kao at BioStar). The next step is to establish some alignment significance metrics (some kind of p-value).