TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

How I Trie to Make Spelling Suggestions

46 pointsby raffiover 15 years ago

8 comments

mrshoeover 15 years ago
Rather than re-implement this it might be better to use an existing library. In Python, I've used pylevenshtein:<p><a href="http://code.google.com/p/pylevenshtein/" rel="nofollow">http://code.google.com/p/pylevenshtein/</a><p>However, I've found the Jaro-Winkler distance to be more useful than the Levenshtein. You can find good implementations for Jaro out there as well.<p><a href="http://en.wikipedia.org/wiki/Jaro-Winkler_distance" rel="nofollow">http://en.wikipedia.org/wiki/Jaro-Winkler_distance</a>
评论 #1087224 未加载
elblancoover 15 years ago
Clever title. In my experience, in a practical sense, tries tend to be very memory intensive with only a marginal increase in speed over other alternatives like simple hash lookups. However, this is a very interesting and more typical use-case for a trie. I've always like it better than using Hamming distance which still seems to get recommended in undergraduate texts.<p>Oh, and I glossed over the perl code, until I reread it and realized it was sleep code. First time hearing about this language...interesting.<p><a href="http://sleep.dashnine.org/" rel="nofollow">http://sleep.dashnine.org/</a>
评论 #1087186 未加载
abecedariusover 15 years ago
Norvig posted a similar algorithm using a hashtable instead of a trie: edits() in ngrams.py at <a href="http://norvig.com/ngrams/" rel="nofollow">http://norvig.com/ngrams/</a><p>You use a table of all prefixes of all dictionary words. This might be more or less efficient than a trie, depending on implementation; but in interpreted Python the built-in hashtables are bound to win.<p>(It's descended from code I sent in response to <a href="http://norvig.com/spell-correct.html" rel="nofollow">http://norvig.com/spell-correct.html</a>, rewritten to return a dict of candidates each paired with a description of how it's different from the original word. There's a bug of sorts in that the result set misses a very few candidates his first article's code finds, unless you extend the edit-distance cutoff; I only noticed the problem after I'd mailed him the code. I'm not sure if the OP's algorithm has the same shortcoming -- I haven't read it closely.)
tom_pinckneyover 15 years ago
Careful what dictionary you use...my /usr/share/dict/words has things in it that might not be polite/acceptable to suggest.
llimllibover 15 years ago
<a href="http://en.wikipedia.org/wiki/Patricia_tree" rel="nofollow">http://en.wikipedia.org/wiki/Patricia_tree</a> would be an improvement.
评论 #1087190 未加载
yanover 15 years ago
Do people pronounce trie as 'try' or 'tree'? I pronounce it as 'tree,' which is why it took me a second to catch the pun.
评论 #1087335 未加载
评论 #1087656 未加载
评论 #1087275 未加载
评论 #1087408 未加载
indigovioletover 15 years ago
I've used Burkhard-Keller trees to do something like this:<p><a href="http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees" rel="nofollow">http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK...</a><p>I don't know off the top of my head which one is more efficient.
kristianpover 15 years ago
What is the algorithmic complexity of your trie code compared to Norvig's example corrector?