TechEcho

3 comments

marcodiego almost 3 years ago

> The hashing function described above is too simple to do the job properly - dcd, hdb and various other non-words would all hash to 223 and be accepted - but it's possible to devise more complicated hashing functions so that hardly any non-words will be accepted. You may use more than one hashing function; you could derive, say, six numbers from the same word and check them all in the bit map (or in six separate bit maps), accepting the word only if all six bits were set.

Just described how a Bloom filter works.
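For readers who want to see the quoted scheme in code, here is a minimal sketch of a Bloom filter with six bit positions per word. The class name `BloomFilter`, the bit map size, and the trick of salting SHA-256 with an index to get six hashes are all assumptions made for illustration; the quoted text does not say which hash functions or sizes were used.

```python
import hashlib


class BloomFilter:
    """Bit map that is checked at several hash-derived positions per word."""

    def __init__(self, num_bits=1 << 16, num_hashes=6):
        # Sizes are illustrative assumptions, not values from the quoted text.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Derive num_hashes positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        # Set every bit that the word maps to.
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # Accept the word only if all of its bits are set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(word)
        )


dictionary = BloomFilter()
for word in ["cat", "dog", "bed"]:
    dictionary.add(word)

print("cat" in dictionary)  # words that were added are always accepted
print("dcd" in dictionary)  # non-words are rejected unless all six bits collide
```

A word that was added can never be rejected, while a non-word is accepted only when all six of its bits happen to be set already, which is the "hardly any non-words will be accepted" property from the quote.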

jwstarr almost 3 years ago

A more quantitative approach can be found in a pair of papers by John C. Nesbit, who analyzed ten algorithms in 1985/86 (https://archive.org/details/sim_journal-of-computer-based-instruction_summer-1985_12_3/page/n15/mode/1up ; https://archive.org/details/sim_journal-of-computer-based-instruction_summer-1986_13_3/page/n18/mode/1up). Generalized edit distance performed best, but also took the most time. The PLATO algorithm, which used a feature vector-esque approach, came in second in quality and was also efficient. Phonetic approaches came in third. Since the charts are hard to read and summarize, I converted the results into F1 scores (https://ztoz.blog/posts/nesbit/).
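As a reminder of what an F1 score is, here is a small sketch of the standard formula (the harmonic mean of precision and recall). The function name `f1_score` and the counts in the example are hypothetical; the comment does not say how the counts were read off Nesbit's charts, so nothing below reproduces the papers' numbers.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one algorithm, not taken from the papers:
# 8 correct suggestions, 2 wrong suggestions, 2 misspellings left unresolved.
print(round(f1_score(8, 2, 2), 3))
```

Collapsing precision and recall into one number makes the ten algorithms easier to rank than a set of charts, at the cost of hiding which of the two an algorithm is weak on.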

homodeus almost 3 years ago

In 2022, "state of the art" is throwing a deep net at it. It will likely pick up on all of these findings (and better ones, incomprehensible to us) by itself, given the correct architecture and enough data, but I can't help but feel a bit saddened by this: seeing the ingenuity and mastery of all these cited names be obscured and superseded so easily, in a way.

I love advancement in the field and what machine learning will enable us to do, but I don't know what to make of this. One argument is that now we have engineers who design the machine learning models, but it is still depressing to me, for some reason. I never knew I would feel like this. Am I the only one?

P.S.: I'm commenting purely on this topic, which is an ideal big data case. Of course, we still have a long way to go with machine learning, one where human minds will have to especially shine.

Computer Based Spellchecking Techniques
