TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Douglas McIlroy responds to Unix spell article with new implementation details

163 pointsby abhi9u3 months ago
I originally wrote this article explaining the data model as implemented in the original Unix &quot;spell&quot;: <a href="https:&#x2F;&#x2F;blog.codingconfessions.com&#x2F;p&#x2F;how-unix-spell-ran-in-64kb-ram" rel="nofollow">https:&#x2F;&#x2F;blog.codingconfessions.com&#x2F;p&#x2F;how-unix-spell-ran-in-6...</a>.<p>This reached Doug, who then responded to it with the follow up work he did which wasn&#x27;t published anywhere.

7 comments

re3 months ago
The text of the email (courtesy of Firefox&#x2F;macOS Text recognition):<p>___________________<p>Subject: yes, it&#x27;s me, the author of &quot;spell&quot;<p>You did a nice job of gently describing &quot;spell&quot;s data representation. But the formula x = 2^{log_2(m)}-m is a complicated way to say x=0. Did you mean that?<p>When memory got bigger, I kept the literal dictionary in memory, but compressed it on secondary memory for fast transfer. The compression was trivial: store a suffix preceded by one byte that contained the length of the prefix that the word shared with its predecessor in dictionary order.<p>We also added one byte per enry for affixing info, e.g. whether in- (or im- or ir-) is preferred over un-, whether a final consonant is doubled before adding a suffix that begins with a vowel, part of speech, etc. Although there were lots of such attributes, the number of distinct combinations of attributes that actually occurred was fewer than 256, so could be coded into one byte, and decoded by lookup in a 256-entry table. I automated the construction of that table, which would change if a dictionary revision created a new combination of attributes.<p>With affixing info we could at the same time be more aggressive about affixing, and make fewer mistakes like allowing -s on a word that cannot be used as a noun or a verb.
评论 #42995298 未加载
devin3 months ago
There used to be a page of Doug McIlroy &quot;facts&quot;. I managed to hunt this page down by searching around.<p><a href="http:&#x2F;&#x2F;git.9front.org&#x2F;plan9front&#x2F;plan9front&#x2F;c28f07b5840216c4e83e1d3b21d8a9ed7c41f8f9&#x2F;lib&#x2F;dougfacts&#x2F;f.html" rel="nofollow">http:&#x2F;&#x2F;git.9front.org&#x2F;plan9front&#x2F;plan9front&#x2F;c28f07b5840216c4...</a><p>One of the &quot;facts&quot; is: &gt; Doug McIlroy can address 8 terabytes of RAM with only 32 bits.
评论 #42998758 未加载
cb3213 months ago
&gt; The compression was trivial: store a suffix preceded by one byte that contained the length of the prefix that the word shared with its predecessor in dictionary order<p>This is more or less what was speculated upon as compatible with his stemming ideas [1], but I (and partly McIlroy1982) doubt you ever really needed the fancy hashing-compression in the first place (though I&#x27;m sure it was fun to figure out!). I got better perf (3x faster on a 5500 word document with an 82,000 word &quot;unabridged&quot; dictionary that prefix-compresses to only 278K which would fit on a 360K floppy of the era) than v7spell on v7 with just a merge algorithm somewhat more finely tuned against that format. Of course, on a shared computer getting either the streaming disk access (without seeks spoiling the party) or all that RAM to yourself (without swapping spoiling the party) is a guess. And, yeah, probably the background Doug mentions is all in the combined-unix-git-history by now { and it would not have been as attractive an academic paper :-) }.<p>[1] <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=42931145">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=42931145</a>
dang3 months ago
Recent and related:<p><i>How Unix spell ran in 64kb RAM</i> - <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=42752604">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=42752604</a> - Jan 2025 (51 comments)
jll293 months ago
Slightly related:<p>In the paper Leidner and Plachouras (2017), we reported an observation initially due to Bentley, namely that the McIllroy spell(1) implementation emailed a list of unknown words when it was run over a document to the author. While technically, this is a neat way to increase dictionary size by &quot;mining user data&quot;, and certain versions of Microsoft Word and Microsoft Edge (see <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=35208333">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=35208333</a> ) had the same behavior, it is privacy-violating at least if users are not informed beforehand [1]. Of course this has to be seen in the cultural context of the UNIX community at the time (people were even weary of using passwords then to protect their accounts), but still harm could be done if an email tasked about &quot;malinoma&quot; when that term was still absent from the dictionary, possibly revealing a sensitive condition of the email author or their circle of friends and family to a third party, the software engineer of the speel checker.<p>[1] <a href="https:&#x2F;&#x2F;aclanthology.org&#x2F;W17-1604.pdf" rel="nofollow">https:&#x2F;&#x2F;aclanthology.org&#x2F;W17-1604.pdf</a>
评论 #42998986 未加载
评论 #42997173 未加载
LeoPanthera3 months ago
No login required: <a href="https:&#x2F;&#x2F;nitter.net&#x2F;abhi9u&#x2F;status&#x2F;1887010136155414602" rel="nofollow">https:&#x2F;&#x2F;nitter.net&#x2F;abhi9u&#x2F;status&#x2F;1887010136155414602</a>
npalli3 months ago
Wow, Doug McIlroy is 92 and still sharp, working through the implementation. Too many times, people complain of being over the hill when they hit 30!. Mental capability doesn&#x27;t slow that dramatically (as physical) good perspective to have.<p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Douglas_McIlroy" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Douglas_McIlroy</a>
评论 #43003786 未加载