TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Spelling Corrector in 21 lines of Python

181 pointsby l0nwlfover 14 years ago

8 comments

rayvalover 14 years ago
Interesting that the shortest (and among the most readable) is the 15-line version written in old-school AWK: <a href="http://pacman.blog.br/wiki/index.php?title=Um_Corretor_Ortográfico_em_GAWK" rel="nofollow">http://pacman.blog.br/wiki/index.php?title=Um_Corretor_Ortog...</a><p>A surprise is that the Perl implementation weighs in at 63 lines. I would have expected much less. I expect a much shorter version is possible, relying on idiomatic constructs at the expense of readability.
评论 #2035901 未加载
评论 #2036088 未加载
评论 #2037006 未加载
timrobinsonover 14 years ago
I'm no Python expert, but I liked doing this in Haskell:<p><a href="http://www.partario.com/blog/2009/10/a-spelling-corrector-in-haskell.html" rel="nofollow">http://www.partario.com/blog/2009/10/a-spelling-corrector-in...</a><p><a href="https://github.com/timrobinson/spell-correct/blob/master/Correct.hs" rel="nofollow">https://github.com/timrobinson/spell-correct/blob/master/Cor...</a>
评论 #2037674 未加载
clvvover 14 years ago
I have seen this one before somewhere, and what amazes me is that how you can solve problems without a hassle if you get the "trick" right. Another case I read was that Google use(at least used) two vectors(each consists of many 0s and 1s, which in turn represent whether the web page has the keyword or not) to represent web pages, and calculate the angle between the vectors to figure out the similarity(a value) between web pages.
评论 #2036428 未加载
评论 #2035850 未加载
评论 #2035857 未加载
评论 #2036301 未加载
prsover 14 years ago
Scrolling to the bottom of the article gives you a list of similar implementations in other languages.<p>Very interesting to see how other have tackled this programming problem.
评论 #2036375 未加载
tgflynnover 14 years ago
Anyone else having trouble parsing this list comprehension ?<p><pre><code> e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS</code></pre>
评论 #2035609 未加载
评论 #2035653 未加载
评论 #2036627 未加载
vebover 14 years ago
In PHP, you can use the levenshtein() function.<p><a href="http://php.net/manual/en/function.levenshtein.php" rel="nofollow">http://php.net/manual/en/function.levenshtein.php</a>
评论 #2036119 未加载
评论 #2035672 未加载
评论 #2035545 未加载
SlyShyover 14 years ago
If anyone is interested in other applications of this technique, this is a Ruby library I wrote to do sentence tokenization: <a href="https://github.com/SlyShy/Tactful_Tokenizer" rel="nofollow">https://github.com/SlyShy/Tactful_Tokenizer</a>
singularover 14 years ago
I found it fascinating that you can actually implement something seemingly so magical in such a small amount of code.<p>I'm down as 22 lines of C#, though to be fair I am cheating vastly and the lines are huge :)<p><a href="http://www.codegrunt.co.uk/2010/11/02/C-Sharp-Norvig-Spelling-Corrector.html" rel="nofollow">http://www.codegrunt.co.uk/2010/11/02/C-Sharp-Norvig-Spellin...</a><p>C# does offer some nice features to give you some succinctness but there's no getting away from the verboseness of a java-like language.
评论 #2037141 未加载