I thought I would post a practical application, FWIW. I recently applied edit distance to a problem I was facing on my free dating site. Being free, we attract an inordinate number of scammers. I've been developing an arsenal of tools to help identify and track suspects in order to delete their accounts before they do too much damage. The most successful to date has used edit distance.<p>Scammers tend to post identical or slightly altered messages to a large number of users. As you might expect, this has the effect of quickly polluting the valid ecosystem. The solution I employed was to delay sending messages between users long enough to capture spammer patterns and compare the messages to one another.<p>I use the Levenshtein distance algorithm (which has O(n^2) performance). If I get a similarity over a certain threshold. I quarantine the message for further analysis. To work around the performance, I bound the problem by capping the number of messages used for analysis.<p>It has been incredibly effective.
Dick Lipton's blog has a huge collection of really amazing kiddie pool tastes and pointers within awesome nuggets of theoretical computer science, one of his posts that I want to walk through carefully when i have the time is the one on the quantum method, for proving various nifty algorithmic complexity results etc.<p><a href="http://rjlipton.wordpress.com/2009/03/28/erds-and-the-quantum-method/" rel="nofollow">http://rjlipton.wordpress.com/2009/03/28/erds-and-the-quantu...</a><p>sexy stuff
What a coincidence! I have been looking at the edit distance problem, because it can be applied to computer vision. One of the uses is if you have a stereo image pair you can figure out the depth of the objects in the scene, by matching corresponding features. The edit distance problem helps with figuring out which pixels in one stereo image correspond to pixels in the other stereo image.<p>I am hoping with a bit more effort I can understand the dynamic programming side of it better.
For a bit-parallel approach to this problem, see: <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.4342" rel="nofollow">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.4...</a>
I spent a year and a half as an undergraduate converting a custom C edit distance implementation into Java for a computer assisted language instruction system. This is a really nice article discussing the topic. I think the interesting facet of dynamic programming is how it shows the dynamic between space and time. In his creative book on algorithms, Udi Manber gives a nice description of a general approach to designing algorithms with dynamic programming.
A completely different approach to the problem is to minimize the number of necessary distance computation in the first place. There are various indexing algorithms which make finding nearest neighbours feasible in large datasets. Incidentally, I implemented two of them. :)<p><a href="http://well-adjusted.de/mspace.py" rel="nofollow">http://well-adjusted.de/mspace.py</a><p>For real usage, you would probably want to reimplement them with performance in mind (and in a different language), though. My implementation is dog slow. But on the plus side, it is very readable and heavily documented.
The edit distance and the algorithms to solve it have been of great use in my research work in neuroscience. The places where I can apply them are as surprising as 3D mesh reconstruction from a stack of outlines ( <a href="http://www.blender.org/community/blender-conference/blender-conference-2006/proceedings/albert-cardona/" rel="nofollow">http://www.blender.org/community/blender-conference/blender-...</a> ), to neuronal recognition (I'm currently finishing a paper on that; see some images here: <a href="http://albert.rierol.net/doodle_programming.html#11" rel="nofollow">http://albert.rierol.net/doodle_programming.html#11</a> ).<p>I know of other labs using edit distance to compare people's timelines (to estimate cost of shifting from one lifestyle to another). And the most widely use application of edit distances is in the comparison of any DNA/RNA/protein sequence for similarity, both for identification purposes and for the study of their evolution.