And after you've read that, here's a related blogpost: "A Spellchecker Used to Be a Major Feat of Software Engineering" [0], because Python being "fast enough" and having enough memory for large dictionaries hasn't always been the case.<p>[0] <a href="https://prog21.dadgum.com/29.html" rel="nofollow">https://prog21.dadgum.com/29.html</a>
My most common spelling mistakes are physical mistypes on the keyboard, yet no spell checker seems to account for keyboard layout and the locality of keys, or for something like my hand being one position off on the board, so that every key is typed correctly relative to the others but the whole word is positionally shifted.
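For what it's worth, the shifted-hand case is easy to sketch. The snippet below is only an illustration, assuming a US QWERTY layout and letters only; `ROWS`, `shift_word`, and `unshift_candidates` are made-up names, not part of any existing spell checker. It undoes a one-key shift in either direction and keeps whatever lands in the vocabulary.

```python
# Assumption: US QWERTY letter rows only (no digits, punctuation, or other layouts).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def shift_word(word, offset):
    """Move every letter `offset` keys along its row.

    Returns None if a letter is not on the letter rows or would fall off its row.
    """
    out = []
    for ch in word:
        for row in ROWS:
            i = row.find(ch)
            if i != -1:
                j = i + offset
                if not 0 <= j < len(row):
                    return None  # shifted past the end of the row
                out.append(row[j])
                break
        else:
            return None  # character is not on any letter row
    return "".join(out)


def unshift_candidates(typed, vocabulary):
    """Return vocabulary words obtained by shifting `typed` one key left or right."""
    candidates = []
    for offset in (-1, 1):
        fixed = shift_word(typed, offset)
        if fixed is not None and fixed in vocabulary:
            candidates.append(fixed)
    return candidates
```

Typing "word" with the hand one key to the right gives "eptf", and `unshift_candidates("eptf", {"word"})` recovers it. A real checker would presumably fold this in as one more candidate generator next to the usual edit-distance ones.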
Those interested in toy implementations in this area might also enjoy this blog post on FSMs: <a href="https://blog.burntsushi.net/transducers/" rel="nofollow">https://blog.burntsushi.net/transducers/</a><p>Also, on the data side, the NLP book: <a href="https://web.stanford.edu/~jurafsky/slp3/" rel="nofollow">https://web.stanford.edu/~jurafsky/slp3/</a>
I've had a thought and am curious how people would solve this. Sometimes, if you copy words off a PDF lecture slide, all the words are mashed together (e.g. Hello Foo bar → HelloFoobar). Is this an AI domain, or can it be solved by simple programming?
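A plain dynamic-programming pass over a word list gets surprisingly far here. The sketch below is one possible approach, not a definitive one: it assumes you have a set of lowercase known words, and it simply prefers the split with the fewest words (a probabilistic version would score splits by word frequency instead). `segment` and `max_word_len` are illustrative names.

```python
def segment(text, vocabulary, max_word_len=20):
    """Split `text` into the fewest vocabulary words.

    `vocabulary` is assumed to be a set of lowercase words.
    Returns a list of pieces, or None if no full split exists.
    """
    n = len(text)
    # best[i] is the shortest list of words covering text[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            piece = text[start:end]
            if best[start] is not None and piece.lower() in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]
```

With a vocabulary containing "hello", "foo", and "bar", `segment("HelloFoobar", vocabulary)` splits the string back into its three words. Ambiguous strings are where it stops being simple programming and starts needing a language model.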
One of the more interesting parts of the post, for me, is the list of implementations in other languages, including: one for Clojure written by that language's author, Rich Hickey; an interesting one in R that clocks in at 2 lines (with a longer, more readable version further down in the linked post); and one written in functional Java. The first one in Awk is also interesting.
You could use a Hidden Markov Model. Here's how to implement one: <a href="https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf" rel="nofollow">https://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf</a><p>Here's an implementation of some kind: <a href="https://github.com/crisbal/hmm-spellcheck" rel="nofollow">https://github.com/crisbal/hmm-spellcheck</a>
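To make the HMM suggestion a bit more concrete, the decoding step is usually the Viterbi algorithm. Below is a minimal sketch of it, assuming the probabilities are given as plain nested dicts (for spelling, the hidden states might be intended letters and the observations typed letters, but that modelling choice is an assumption, and this is not taken from the linked repository).

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for `observations`.

    start_p[s], trans_p[prev][s], and emit_p[s][obs] are probabilities;
    missing transition or emission entries are treated as probability 0.
    """
    if not observations:
        return []

    # Work in log space so long sequences do not underflow.
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[s]: best log-probability of any path that ends in state s.
    scores = {
        s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0))
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor for s at this step.
            prev = max(states, key=lambda p: scores[p] + log(trans_p[p].get(s, 0)))
            new_scores[s] = (
                scores[prev]
                + log(trans_p[prev].get(s, 0))
                + log(emit_p[s].get(obs, 0))
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

The hard part in practice is estimating the transition and emission tables from data, which the linked tutorial covers.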
From a pedagogical point of view, presenting that as a Bayesian model and then using the error model he does is a bit questionable. But, as always, his Python style is inspiring.
Just wait for old HN to show up and tell new HN that Norvig's famous spelling corrector is not efficient and is not teaching people how to do it right.