This is a neat JS library with some possible applications, but if you're only matching substrings without gaps, you're basically doing string searching, not alignment. Alignment is an entirely different and more complicated computational problem, with decades of algorithm development (see Smith-Waterman, Needleman-Wunsch, the Burrows-Wheeler transform) and implementation work behind it, and it is far more biologically useful and practical than substring searching. In practice, heuristic implementations like BLAST, FASTA, and lastz do most large-scale alignment searching, and that's before you get to multiple sequence alignment, which is another problem unto itself.
In other words, it's a useful JS library and interesting implementation, but it isn't alignment.
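For readers unfamiliar with the distinction: alignment allows and scores mismatches and gaps, which substring search cannot express. A minimal Smith-Waterman sketch in JavaScript (scoring only, linear gap penalty; the function name and scoring parameters are illustrative, not taken from the library under discussion):

```javascript
// Smith-Waterman local alignment *score* (no traceback), linear gap penalty.
// Uses a rolling row so memory is O(|b|) instead of O(|a|*|b|).
function smithWatermanScore(a, b, match = 2, mismatch = -1, gap = -1) {
  const cols = b.length + 1;
  let prev = new Array(cols).fill(0); // row i-1 of the DP matrix
  let best = 0;
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(cols).fill(0); // row i; column 0 stays 0 (local alignment)
    for (let j = 1; j < cols; j++) {
      const diag = prev[j - 1] + (a[i - 1] === b[j - 1] ? match : mismatch);
      // Local alignment clamps at 0: a bad prefix never drags the score negative.
      curr[j] = Math.max(0, diag, prev[j] + gap, curr[j - 1] + gap);
      if (curr[j] > best) best = curr[j];
    }
    prev = curr;
  }
  return best;
}
```

Even this toy version is O(nm), which is exactly why the heuristic tools above exist for genome-scale work.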
My colleague (an editor at the journal Bioinformatics) and I were joking the other day about how every other paper title, especially for aligners, seems to include the word "fast". This takes it to a new level.<p>As a learning exercise, this is interesting and fine. I am trying very hard to suppress my inner reviewer right now. Walking away... Not comparing this to existing algorithms implemented in highly optimized C/C++, CUDA, or even hardware. Not asking why other authors would go to such extraordinary lengths if high-level languages were suitable. Not asking how conclusions can be drawn about the suitability of JavaScript for very computationally intensive tasks without solving the actual alignment problem rather than a subset of it, let alone without comparing to existing tools. Not asking about the intended application of the tool. It's not a paper. I'm breathing. OK.
Except this is totally reinventing the wheel.<p>If I want to do fast genome alignment, I'll use lastz, which is already blazing fast and written in C. If I just want to look for homology, I'll use BLAST.<p>There isn't much need for improved alignment algorithms here. If it's a big job, most bioinformaticians have access to clusters. If it's a small job, who cares about speed?
I liked the exposition, but it would have been nice if the author had compared his implementation with something meatier, like Sustik-Moore[1], for instance.<p>[1] <a href="http://www.cs.utexas.edu/users/moore/publications/sustik-moore.pdf" rel="nofollow">http://www.cs.utexas.edu/users/moore/publications/sustik-moo...</a>
I'm not familiar enough with the asm.js spec[0] to know how hard it would be, but I'm curious whether there would be a perf improvement if you could get it to run in that mode. Nice write-up!<p>[0] <a href="http://asmjs.org/spec/latest/" rel="nofollow">http://asmjs.org/spec/latest/</a>
Not sure why the author of the article didn't just use the well-known Bitap algorithm: <a href="http://en.wikipedia.org/wiki/Bitap_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Bitap_algorithm</a>
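For reference, a sketch of the exact-match ("shift-and") variant of Bitap in JavaScript; with JS's 32-bit bitwise operators it handles patterns up to 31 characters. The function name is mine, not from the article:

```javascript
// Bitap (shift-and) exact substring search.
// Returns the index of the first match, or -1 if none.
function bitapSearch(text, pattern) {
  const m = pattern.length;
  if (m === 0 || m > 31) return -1; // 32-bit bitwise ops limit the pattern length

  // masks[c] has bit i set iff pattern[i] === c.
  const masks = {};
  for (let i = 0; i < m; i++) {
    masks[pattern[i]] = (masks[pattern[i]] || 0) | (1 << i);
  }

  let R = 0;                 // bit i of R: pattern[0..i] matches text ending here
  const done = 1 << (m - 1); // this bit set => the whole pattern matched
  for (let j = 0; j < text.length; j++) {
    R = ((R << 1) | 1) & (masks[text[j]] || 0);
    if (R & done) return j - m + 1;
  }
  return -1;
}
```

The same state-vector idea extends to approximate matching (one extra vector per allowed error), which is where Bitap gets interesting for fuzzy search.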
With io.js coming onto the scene, I've recently revisited my thoughts on using JavaScript for scientific/statistical computing. With ES6 and V8 TurboFan coming down the pipe, and enough support for serious functional programming (e.g. Fantasy Land, transducers, Ramda, etc.), I've decided to shift from Scala/Haskell/R to JavaScript. I think there is a bright future in these areas and a lot of room for devs to make a name for themselves doing this kind of development. R is particularly easy to beat on the performance front, judging from the libs I have rewritten in JS to date.
I thought Atwood's Law [1] was a joke, but this is seriously interesting. I remember when Java was just a fun little toy language for making web pages, and I'm still surprised every time someone mentions it as a serious number-crunching platform. (I've heard that the entire Nasdaq matching engine is a single-threaded Java application.) Is JavaScript going to end up the same way?<p>[1] <a href="http://blog.codinghorror.com/the-principle-of-least-power/" rel="nofollow">http://blog.codinghorror.com/the-principle-of-least-power/</a>
Super neat JS work! I have similar code for doing GC/AT counts in Java. Using longs and popcount, one can do 16 nucleotides per cycle. What's also nice is that one can easily use a memory-mapped byte buffer as a long buffer to run through the code. To go faster than that, AVX needs to be used/JIT-ed in.<p>What it really shows is that the FASTA format is just terrible for computational efficiency :(<p><a href="https://gist.github.com/JervenBolleman/d1430d0549028861504c" rel="nofollow">https://gist.github.com/JervenBolleman/d1430d0549028861504c</a>
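A related bit trick works even on unpacked ASCII: for the bases A/C/G/T (upper- or lowercase), bit 1 of the character code happens to be set exactly for C and G, so GC counting reduces to a shift and a mask per character, with no branches. A minimal JavaScript sketch (my own, not the gist linked above, and it assumes the input contains only A/C/G/T):

```javascript
// GC count via an ASCII bit trick: for 'A' (0x41), 'C' (0x43), 'G' (0x47),
// 'T' (0x54) — and their lowercase forms — bit 1 of the char code is set
// exactly for C and G. No branching in the loop body.
function gcCount(seq) {
  let gc = 0;
  for (let i = 0; i < seq.length; i++) {
    gc += (seq.charCodeAt(i) >> 1) & 1;
  }
  return gc;
}
```

A SIMD/popcount version would apply the same mask to 8 bytes of a word at once, which is the word-at-a-time idea the Java gist uses on packed codes.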
This is the commonly used BLAST alignment tool:
<a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=blastn&BLAST_SPEC=GlobalAln&LINK_LOC=BlastHomeLink" rel="nofollow">http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearc...</a><p>for comparison.
Cool. I saw this the other day when you posted it at BioStar (I am Damian Kao at BioStar). The next step is to establish some alignment significance metrics (some kind of p-value).