科技回声
© 2025 科技回声. All rights reserved.

Ask HN: N-Gram spelling correction

2 points by arthurk, about 14 years ago
Hi,

I was wondering why no one has built an n-gram spelling corrector yet. Nearly all the research papers use spelling correction as an example of what can be done with n-gram data, yet I see no services that make use of it.

What are the disadvantages of using n-gram data for spelling correction?
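For readers unfamiliar with the idea, here is a minimal sketch of one common way n-gram counts are used for context-sensitive ("real-word") spelling correction. It assumes a tiny, hypothetical whitespace-tokenized corpus, generates candidates that are one edit away (in the style of Peter Norvig's well-known corrector), and ranks them by how often they occur as bigrams with the left and right neighbours, falling back to unigram frequency on ties. The function names and corpus are illustrative only; a real system would use far larger tables and smoothed probabilities rather than raw counts.

```python
from collections import Counter
import string


def train(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count unigrams and bigrams in a whitespace-tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(tokens: list[str], i: int, unigrams: Counter, bigrams: Counter) -> str:
    """Pick the best known word for tokens[i] using its left and right neighbours."""
    word = tokens[i]
    # Only keep candidates seen in training; keep the original word if none are.
    candidates = {w for w in edits1(word) | {word} if w in unigrams} or {word}
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None

    def score(cand: str) -> tuple[int, int]:
        # Primary key: bigram support from both sides; tie-break: unigram count.
        context = bigrams[(prev, cand)] + bigrams[(cand, nxt)]
        return (context, unigrams[cand])

    # Sorting first makes the choice deterministic when scores tie.
    return max(sorted(candidates), key=score)


if __name__ == "__main__":
    uni, bi = train([
        "a letter from my friend",
        "fill in the form",
        "the form is long",
        "a gift from my sister",
    ])
    print(correct("a letter form my friend".split(), 2, uni, bi))  # from
    print(correct("fill in the form".split(), 3, uni, bi))         # form
```

Note that "form" is a correctly spelled word, so a dictionary lookup alone would never flag it; only the surrounding n-gram context ("letter ___ my") reveals the error.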

2 comments

ynn4k, about 14 years ago
A general problem with n-grams is the trade-off between data sparseness and reliability of estimation. A larger order n captures more context, but it also increases the size of the model, which requires much more data to estimate reliably and more storage. Thanks to the Web as a corpus and to cloud computing, we now have models of up to 5-grams computed over terabytes of data, provided you have the resources. One problem with this approach is selecting which web data to use for training: the better the data matches the target scenario, the better the accuracy.

> i see no services that make use of this.

Most services have proprietary spelling-correction implementations that combine several techniques, including n-grams, and they may not want to make them public.
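One widely cited way to cope with sparseness in very large n-gram tables is "stupid backoff" (Brants et al., 2007): if a long n-gram was never seen, drop the oldest context word and multiply the score by a fixed penalty, ending at the unigram relative frequency. The sketch below assumes a toy corpus, raw counts held in a single in-memory `Counter`, and the penalty α = 0.4 used in that paper; the helper names are hypothetical, and the resulting scores are not normalized probabilities.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty; the value suggested by Brants et al. (2007)


def count_ngrams(tokens: list[str], max_n: int) -> Counter:
    """Count every n-gram of length 1..max_n, keyed by tuple."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word: str, context: tuple, counts: Counter, total: int) -> float:
    """Score `word` after `context`, backing off to shorter contexts when unseen."""
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Relative frequency of the full n-gram given its context.
            return penalty * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest context word
        penalty *= ALPHA
    # No context left: unigram relative frequency.
    return penalty * counts[(word,)] / total


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ate".split()
    counts = count_ngrams(tokens, 3)
    print(stupid_backoff("sat", ("the", "cat"), counts, len(tokens)))  # seen trigram
    print(stupid_backoff("sat", ("the", "dog"), counts, len(tokens)))  # backs off twice
```

The fixed penalty avoids the expensive normalization of schemes such as Katz or Kneser-Ney smoothing, which is why it scales to web-sized 5-gram tables; the cost is that the scores only support ranking candidates, not computing true probabilities.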
sagacity, about 14 years ago
Very interesting question. We've been thinking of offering something like this on the back of:

http://www.RapiDefs.com

Will post more on this later today.