How to Write a Spelling Corrector (2007)

76 points by fybs, over 3 years ago

9 comments

Joker_vD, over 3 years ago
And after you've read that, here's a related blog post: "A Spellchecker Used to Be a Major Feat of Software Engineering" [0], because Python being "fast enough" and having enough memory for large dictionaries hasn't always been the case.

[0] https://prog21.dadgum.com/29.html
mtreis86, over 3 years ago
One of my most common spelling mistakes is a physical mistype on the keyboard, yet no spell checker seems to account for keyboard layout and the locality of keys, or for something like my hand being one position off on the board, so that every key is typed correctly relative to the others but shifted by one position.
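
The "hand one position off" case could be handled with a small layout table. The following is only a minimal sketch, assuming a US QWERTY layout and letters only; `ROWS`, `shift_word`, and `unshift_candidates` are hypothetical names made up for illustration, not part of any existing spell checker.

```python
# Letter rows of a US QWERTY keyboard (an assumption; other layouts need other tables).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def shift_word(word, offset):
    """Move every letter `offset` keys along its row.

    Returns None if any letter would fall off the end of its row.
    Characters that are not on a letter row are kept unchanged.
    """
    out = []
    for ch in word:
        for row in ROWS:
            i = row.find(ch)
            if i != -1:
                j = i + offset
                if not 0 <= j < len(row):
                    return None
                out.append(row[j])
                break
        else:
            out.append(ch)
    return "".join(out)


def unshift_candidates(typed, vocabulary):
    """Undo a one-key shift to the left or right; keep results that are known words."""
    results = []
    for offset in (-1, 1):
        candidate = shift_word(typed.lower(), offset)
        if candidate is not None and candidate in vocabulary:
            results.append(candidate)
    return results


print(unshift_candidates("eptf", {"word", "hello"}))
```

Typing "word" with the hand one key to the right produces "eptf", and shifting it back one key to the left recovers "word". A fuller version would presumably also weight single-key substitutions by how close the two keys are.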
killingtime74, over 3 years ago
Those interested in toy implementations in this area might also enjoy this blog post on FSMs: https://blog.burntsushi.net/transducers/

Also, on the data side, the NLP book: https://web.stanford.edu/~jurafsky/slp3/
pandatigox, over 3 years ago
I've had a thought and am curious how people would solve it. Sometimes, if you copy words off a PDF lecture slide, all the words are mashed together (e.g. Hello Foo bar → HelloFoobar). Is this an AI domain, or can it be solved by simple programming?
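
One plain-programming approach is dictionary-based segmentation with memoized recursion. This is a minimal sketch, assuming a non-empty set of known lowercase words is available; `segment` and the tiny vocabulary below are hypothetical and only for illustration. It prefers the split with the fewest words, which is a simplification; weighting words by their frequency would likely do better on real text.

```python
from functools import lru_cache


def segment(text, vocabulary):
    """Split `text` into the fewest words from `vocabulary`.

    Returns a list of words, or None if no full segmentation exists.
    `vocabulary` is assumed to be a non-empty set of lowercase words.
    """
    text = text.lower()
    max_len = max(map(len, vocabulary))

    @lru_cache(maxsize=None)
    def best(start):
        # Best segmentation of text[start:], as a tuple of words.
        if start == len(text):
            return ()
        options = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in vocabulary:
                rest = best(end)
                if rest is not None:
                    options.append((word,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None


print(segment("HelloFoobar", {"hello", "foo", "bar", "he", "llo"}))
```

Memoizing on the start index means each suffix is solved only once, so even long run-together strings stay cheap to split.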
eigenhombre, over 3 years ago
One of the more interesting parts of the post, for me, is the list of implementations in other languages, including: one for Clojure written by that language's author, Rich Hickey; an interesting one in R that clocks in at 2 lines (with a longer, more readable version further down in the linked post); and one written in functional Java. The first one in Awk is also interesting.
graycat, over 3 years ago
My favorite, long-standard spell checker is Aspell, long part of a TeX distribution.
the-smug-one, over 3 years ago
Could use a Hidden Markov Model. How to implement one: https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf

Here's an impl of some kind: https://github.com/crisbal/hmm-spellcheck
da39a3ee, over 3 years ago
From a pedagogical point of view, presenting that as a Bayesian model and then using the error model he does is a bit questionable. But as always, his Python style is inspiring.
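
For reference, the Bayesian framing under discussion can be written as below, where $w$ is the typed word and $c$ ranges over candidate corrections. The notation is assumed here for illustration and may not match the essay symbol for symbol.

```latex
\hat{c}
  = \arg\max_{c \in \text{candidates}} P(c \mid w)
  = \arg\max_{c \in \text{candidates}} \frac{P(w \mid c)\, P(c)}{P(w)}
  = \arg\max_{c \in \text{candidates}} P(w \mid c)\, P(c)
```

$P(c)$ is the language model and $P(w \mid c)$ is the error model; $P(w)$ drops out because it is the same for every candidate. The essay does not estimate $P(w \mid c)$ from data but ranks candidates strictly by edit distance, which is presumably the part the comment finds questionable.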
blondin, over 3 years ago
just wait for old hn to show up and tell new hn that the famous Norvig's spelling corrector is not efficient and is not teaching people how to do it right.