Ask HN: N-Gram spelling correction

2 points by arthurk about 14 years ago

Hi,

I was wondering why no one has done an n-gram spelling correction yet. Nearly all the research papers take spelling correction as an example of what can be done with n-gram data, yet I see no services that make use of this.

What are the disadvantages of using n-gram data for spelling correction?

2 comments

ynn4k about 14 years ago

A general problem with n-gram models is the trade-off between data sparseness and reliability of estimation. For reliable estimates you need a higher order n, but that also increases the size of the model, which in turn requires more data and storage. Thanks to the Web as a corpus and cloud computing, models of up to 5-grams can now be computed on terabytes of data, provided you are resourceful. One problem with this approach is selecting the web data used for training: the better it is adapted to the target scenario, the better the accuracy.

> I see no services that make use of this.

Most services have proprietary spelling-correction implementations that combine several techniques, including n-grams, and they may not want to make them public.
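
To make the sparseness point concrete, here is a minimal sketch of context-sensitive correction with a bigram model. It is only an illustration under stated assumptions: the toy corpus, the names `bigram_prob` and `best_candidate`, and the choice of add-one smoothing are all hypothetical and not taken from any of the services mentioned in this thread. Smoothing is what keeps a bigram that never appeared in the training data from getting probability zero.

```python
from collections import Counter

# Toy training corpus (an assumption for illustration); a real system
# would use counts from a much larger, domain-matched corpus.
CORPUS = (
    "i went to their house . "
    "they left their keys at home . "
    "there is a house over there . "
    "is there a key in the house ."
).split()

unigrams = Counter(CORPUS)
bigrams = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(unigrams)


def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev).

    Unseen bigrams get a small non-zero probability instead of zero.
    """
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE)


def best_candidate(prev, candidates, nxt):
    """Return the candidate that fits best between prev and nxt."""
    return max(
        candidates,
        key=lambda w: bigram_prob(prev, w) * bigram_prob(w, nxt),
    )


if __name__ == "__main__":
    # "to ___ house": choose between two commonly confused words.
    print(best_candidate("to", ["their", "there"], "house"))
    # "is ___ a": same candidates, different context.
    print(best_candidate("is", ["their", "there"], "a"))
```

Even in this tiny corpus most possible word pairs never occur, so the smoothed estimates do most of the work. Moving from bigrams to higher orders makes that worse unless the training data grows with it, which is the storage and data cost described above.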

sagacity about 14 years ago

Very interesting question. We've been thinking of offering something like this at the back of:

http://www.RapiDefs.com

Will post more on this later today.