TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

How I Trie to Make Spelling Suggestions

46 点作者 raffi超过 15 年前

8 条评论

mrshoe超过 15 年前
Rather than re-implement this it might be better to use an existing library. In Python, I've used pylevenshtein:<p><a href="http://code.google.com/p/pylevenshtein/" rel="nofollow">http://code.google.com/p/pylevenshtein/</a><p>However, I've found the Jaro-Winkler distance to be more useful than the Levenshtein. You can find good implementations for Jaro out there as well.<p><a href="http://en.wikipedia.org/wiki/Jaro-Winkler_distance" rel="nofollow">http://en.wikipedia.org/wiki/Jaro-Winkler_distance</a>
评论 #1087224 未加载
elblanco超过 15 年前
Clever title. In my experience, in a practical sense, tries tend to be very memory intensive with only a marginal increase in speed over other alternatives like simple hash lookups. However, this is a very interesting and more typical use-case for a trie. I've always like it better than using Hamming distance which still seems to get recommended in undergraduate texts.<p>Oh, and I glossed over the perl code, until I reread it and realized it was sleep code. First time hearing about this language...interesting.<p><a href="http://sleep.dashnine.org/" rel="nofollow">http://sleep.dashnine.org/</a>
评论 #1087186 未加载
abecedarius超过 15 年前
Norvig posted a similar algorithm using a hashtable instead of a trie: edits() in ngrams.py at <a href="http://norvig.com/ngrams/" rel="nofollow">http://norvig.com/ngrams/</a><p>You use a table of all prefixes of all dictionary words. This might be more or less efficient than a trie, depending on implementation; but in interpreted Python the built-in hashtables are bound to win.<p>(It's descended from code I sent in response to <a href="http://norvig.com/spell-correct.html" rel="nofollow">http://norvig.com/spell-correct.html</a>, rewritten to return a dict of candidates each paired with a description of how it's different from the original word. There's a bug of sorts in that the result set misses a very few candidates his first article's code finds, unless you extend the edit-distance cutoff; I only noticed the problem after I'd mailed him the code. I'm not sure if the OP's algorithm has the same shortcoming -- I haven't read it closely.)
tom_pinckney超过 15 年前
Careful what dictionary you use...my /usr/share/dict/words has things in it that might not be polite/acceptable to suggest.
llimllib超过 15 年前
<a href="http://en.wikipedia.org/wiki/Patricia_tree" rel="nofollow">http://en.wikipedia.org/wiki/Patricia_tree</a> would be an improvement.
评论 #1087190 未加载
yan超过 15 年前
Do people pronounce trie as 'try' or 'tree'? I pronounce it as 'tree,' which is why it took me a second to catch the pun.
评论 #1087335 未加载
评论 #1087656 未加载
评论 #1087275 未加载
评论 #1087408 未加载
indigoviolet超过 15 年前
I've used Burkhard-Keller trees to do something like this:<p><a href="http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees" rel="nofollow">http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK...</a><p>I don't know off the top of my head which one is more efficient.
kristianp超过 15 年前
What is the algorithmic complexity of your trie code compared to Norvig's example corrector?