How to Write a Spelling Corrector (2007)

76 points by fybs over 3 years ago

9 comments

Joker_vD over 3 years ago
And after you've read that, here's a related blog post: "A Spellchecker Used to Be a Major Feat of Software Engineering" [0], because Python being "fast enough" and having enough memory for large dictionaries hasn't always been the case.

[0] https://prog21.dadgum.com/29.html
mtreis86 over 3 years ago
One of my most common spelling mistakes is a physical mistype on the keyboard, yet no spell checker seems to account for keyboard layout and the locality of keys, or for something like my hand being one position off on the board, so that every key is typed correctly relative to the others but shifted by one position.
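As a rough illustration of the idea in the comment above, here is a minimal Python sketch of an edit distance that charges less for substituting a neighboring key, plus a helper that undoes a hand shifted by one key. The QWERTY grid, the 0.5 neighbor cost, and the names `keyboard_distance` and `shift_keys` are all assumptions made for this example, not part of any existing spell checker.

```python
from typing import Optional

# Assumption: letter rows of a US QWERTY layout, treated as an unstaggered grid.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Map each key to its (row, column) position on that simplified grid.
KEY_POS = {ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)}


def are_neighbors(a: str, b: str) -> bool:
    """True if two distinct keys touch (including diagonally) on the grid."""
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


def substitution_cost(a: str, b: str) -> float:
    """Hypothetical cost model: 0 for a match, 0.5 for a neighboring key, else 1."""
    if a == b:
        return 0.0
    return 0.5 if are_neighbors(a, b) else 1.0


def keyboard_distance(typed: str, target: str) -> float:
    """Levenshtein distance with cheaper substitutions for adjacent keys."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, tc in enumerate(typed, start=1):
        curr = [float(i)]
        for j, gc in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                            # delete the typed char
                curr[j - 1] + 1.0,                        # insert the target char
                prev[j - 1] + substitution_cost(tc, gc),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def shift_keys(word: str, offset: int) -> Optional[str]:
    """Move every letter `offset` keys along its row; None if a key falls off the row."""
    shifted = []
    for ch in word:
        if ch not in KEY_POS:
            return None
        row, col = KEY_POS[ch]
        new_col = col + offset
        if not 0 <= new_col < len(QWERTY_ROWS[row]):
            return None
        shifted.append(QWERTY_ROWS[row][new_col])
    return "".join(shifted)
```

A corrector could rank candidates by `keyboard_distance` instead of plain edit distance, and also look up `shift_keys(word, -1)` and `shift_keys(word, 1)` in the dictionary to catch a whole word typed one key off.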
killingtime74 over 3 years ago
Those interested in toy implementations in this area might also enjoy this blog post on FSMs: https://blog.burntsushi.net/transducers/

Also, on the data side, the NLP book: https://web.stanford.edu/~jurafsky/slp3/
pandatigox over 3 years ago
I've had a thought and am curious how people would solve it. Sometimes, if you copy words off a PDF lecture slide, all the words are mashed together (e.g. Hello Foo bar → HelloFoobar). Is this an AI domain, or can it be solved by simple programming?
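One way this question can be approached with plain programming is dynamic programming over a word-frequency table: try every split point and keep the most probable sequence of words. The sketch below is only an illustration under stated assumptions. The tiny `WORD_COUNTS` table, the length-based penalty for unknown chunks, and the function names are all made up for the example, and a usable version would need counts from a large corpus.

```python
import math

# Assumption: toy unigram counts standing in for a real corpus-derived table.
WORD_COUNTS = {"hello": 50, "hell": 6, "he": 40, "foo": 5, "bar": 8, "of": 90, "a": 60}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in WORD_COUNTS)


def word_logprob(word: str) -> float:
    """Log probability of a chunk; unknown chunks get a hypothetical length penalty."""
    count = WORD_COUNTS.get(word)
    if count is None:
        return math.log(1.0 / TOTAL) - 10.0 * len(word)
    return math.log(count / TOTAL)


def segment(text: str) -> list:
    """Split run-together text into the most probable word sequence (lowercased)."""
    text = text.lower()
    # best[i] holds (score, words) for the best segmentation of text[:i].
    best = [(0.0, [])]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - MAX_WORD_LEN), i):
            score, words = best[j]
            chunk = text[j:i]
            candidates.append((score + word_logprob(chunk), words + [chunk]))
        best.append(max(candidates, key=lambda cand: cand[0]))
    return best[-1][1]


print(segment("HelloFoobar"))  # ['hello', 'foo', 'bar']
```

Note that this sketch lowercases its input, so the original capitalization is lost. With a realistic word list the same loop still works, but ambiguous strings will be split according to whichever words are more frequent in the table.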
eigenhombre over 3 years ago
One of the more interesting parts of the post, for me, is the list of implementations in other languages, including: one for Clojure written by that language's author, Rich Hickey; an interesting one in R that clocks in at 2 lines (with a longer, more readable version further down in the linked post); and one written in functional Java. The first one in Awk is also interesting.
graycat over 3 years ago
My favorite, long-standard spell checker is Aspell, which has long been part of a TeX distribution.
the-smug-one over 3 years ago
Could use a Hidden Markov Model. How to implement one: https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf

Here's an impl of some kind: https://github.com/crisbal/hmm-spellcheck
da39a3ee over 3 years ago
From a pedagogical point of view, presenting that as a Bayesian model and then using the error model he does is a bit questionable. But as always, his Python style is inspiring.
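For readers who want the framing this comment refers to, the essay picks the correction c for a typed word w by maximizing P(c | w). A sketch of that derivation follows; the symbol d(c, w) for edit distance is notation assumed here for illustration.

```latex
\hat{c} = \arg\max_{c \in \text{candidates}} P(c \mid w)
        = \arg\max_{c \in \text{candidates}} \frac{P(w \mid c)\, P(c)}{P(w)}
        = \arg\max_{c \in \text{candidates}} P(w \mid c)\, P(c)
```

The error model P(w | c) is not estimated from data in the essay. Instead, known words with d(c, w) = 0 are treated as infinitely more probable than those with d(c, w) = 1, which in turn are treated as infinitely more probable than those with d(c, w) = 2. Within the closest non-empty group, only the language model P(c) decides the winner, which is presumably the shortcut the comment finds questionable.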
blondin over 3 years ago
just wait for old hn to show up and tell new hn that Norvig's famous spelling corrector is not efficient and is not teaching people how to do it right.