Hi,<p>I was wondering why no one has built an n-gram spelling corrector yet. Nearly all the research papers cite spelling correction as an example of what can be done with n-gram data, yet I see no services that make use of this.<p>What are the disadvantages of using n-gram data for spelling correction?
A general problem with n-gram models is the trade-off between data sparseness and reliability of estimation. For reliable estimates you need a larger order n, but that also increases the size of the model, which in turn requires more training data and storage. Thanks to the Web as a corpus and cloud computing, we can now compute up to 5-gram models over terabytes of data, provided you are resourceful. One remaining problem with this approach is selecting the web data used for training: the better the adaptation to the target scenario, the better the accuracy.<p><pre><code> i see no services that make use of this.
</code></pre>
Most services have proprietary spell-correction implementations that are an amalgamation of several techniques, including n-grams, and they may prefer not to make them public.
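To make the n-gram idea concrete, here's a minimal sketch of context-sensitive correction: generate edit-distance-1 candidates for a misspelled word, then rank them by bigram probability given the previous word. The corpus, vocabulary, and function names are all hypothetical toy choices, not anyone's production system; a real service would use far larger n-gram counts and a proper error model.

```python
# Toy sketch of n-gram spelling correction (hypothetical corpus and names).
# Candidates one edit away from the typo are ranked by a smoothed bigram
# probability P(candidate | previous word), so context picks the correction.
from collections import Counter

corpus = ("the quick brown fox jumps over the lazy dog "
          "the quick brown fox runs over the lazy cat").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def edits1(word):
    """All strings one edit (delete/transpose/replace/insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(prev_word, word):
    """Pick the in-vocabulary candidate maximizing P(candidate | prev_word)."""
    candidates = [c for c in edits1(word) | {word} if c in unigrams]
    if not candidates:
        return word  # nothing in vocabulary; leave the word alone
    # Bigram probability with add-one smoothing over the small vocabulary.
    def score(c):
        return (bigrams[(prev_word, c)] + 1) / (unigrams[prev_word] + len(unigrams))
    return max(candidates, key=score)

print(correct("lazy", "dng"))   # -> dog
print(correct("the", "quik"))   # -> quick
```

The sparseness problem the parent describes shows up immediately here: with only a toy corpus, most bigrams have zero counts, which is why even this sketch needs smoothing and why real systems want web-scale counts.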
Very interesting question. We've been thinking of offering something like this at the back of:<p><a href="http://www.RapiDefs.com" rel="nofollow">http://www.RapiDefs.com</a><p>Will post more on this later today.