Donald Knuth was framed (2020)

358 points by goranmoomin about 3 years ago

26 comments

svat about 3 years ago

I guess this was posted after someone saw this come up on HN yesterday: <a href="https://news.ycombinator.com/item?id=31295427" rel="nofollow">https://news.ycombinator.com/item?id=31295427</a>

I've been planning (for 2+ years now, so it probably won't happen) to write a large blog post about this matter, which would be called “Tries, Packed Tries, and the Bentley–Knuth–McIlroy story”:

• Part 1 would introduce the trie data structure both abstractly (the tree structure) and concretely (how exactly to represent the children of a node), with the trade-offs: as a linked list (space-efficient, slow lookup), as an array (fast lookup, takes space), or as some more nontrivial tree structure itself (“yo dawg…”). Then we'd discuss the really cool idea of “packing” the array, as nicely described in Frank Liang's thesis <a href="https://tug.org/docs/liang/" rel="nofollow">https://tug.org/docs/liang/</a> (also mentioned in TAOCP Exercise 6.3:4 and answer, which incidentally refers to the CACM Bentley–Knuth–McIlroy article that we're talking about). We'd also illustrate how hyphenation is done in the TeX program using packed tries (maybe this can be a separate post).

• Part 2 would go into the Bentley–Knuth–McIlroy story:

• Part 2a ("Before"): Bentley's thesis/book on writing efficient programs, and the Programming Pearls column. Knuth writing TeX first for himself in SAIL in 1977–78. The instant interest and ports/rewrites it led to. Knuth responding by rewriting TeX into its current form in 1980–82, and how the goals of portable software (thus Pascal, and its limitations! see BWK rant) and of eventually publishing it as a book led him to the idea of literate programming, which he still considers the most important outcome of his years on the TeX project. Meanwhile, at Bell Labs, McIlroy's invention of Unix pipes, and the instant excitement of "pipe day". (“It was absorbed instantly into one's outlook on programming. And by the end of the week secretaries were piping the output of NROFF into the printer”) The Unix philosophy, and how it took a while to spread. How these three threads came together in 1986, when Bentley read Knuth's TeX, wrote about LP in his Pearls column, and published one program of Knuth's (the random-numbers one) and asked him to write another (the frequent words one).

• Part 2b (the program): How Knuth ingeniously combined the idea of packed tries with that of hashing with linear probing, to create the data structure especially for this problem (maybe we can call it a hash-packed trie?). Another look at his very nice idea/program, presented differently (maybe some illustrations of how the data structure works), and some design choices he made.

• Part 2c (the review and later): How Bentley got McIlroy to review it, and a close reading of the review: its actual content and points about LP and Knuth's style, and finally the (first sentence and) last part where McIlroy used the review to advertise his own Unix philosophy and the short shell pipeline version. The further reactions to this, including Bentley's remarks in the same column, and later articles and reactions like the "prefabs" comment (<a href="https://news.ycombinator.com/item?id=22406070" rel="nofollow">https://news.ycombinator.com/item?id=22406070</a>). The short-lived LP column in CACM that it spawned (<a href="https://shreevatsa.net/post/programming-pearls/" rel="nofollow">https://shreevatsa.net/post/programming-pearls/</a>) and how it fizzled out, and how it turns out everyone wants to do LP in their own system. How the story got bastardized over time (the misleading "More shell less egg" blog post for example, and some other examples of people misunderstanding the history entirely).

• Part 3 would compare the two approaches (even though they are not meant to be compared!), mention how at <a href="https://codegolf.stackexchange.com/questions/188133/" rel="nofollow">https://codegolf.stackexchange.com/questions/188133/</a> I simply translated Knuth's program into C++ and beat the then-fastest Rust solution (since beaten again by translation from C++ back into Rust: still based on the trie idea though!), and add some words about cache misses and 32-bit “pointers” in a 64-bit world (and Knuth's rant about it).

Something like that: it's probably too long for me to ever get around to writing it (or for anyone to be interested in reading), though…

andi999 about 3 years ago

Is there more discussion about the 'framed' part? I mean, how was the counter perceived? The whole 'setup' looks like:

- show how to write a spreadsheet application
- here you go, couple of hundred pages it is though
- ahh, silly person, why not type 'excel'

So what happened next? Everybody rolled their eyes, or people said 'yeah, typing excel is pure genius'?

tlarkworthy about 3 years ago

I am super excited by <a href="https://observablehq.com" rel="nofollow">https://observablehq.com</a>, which has made an out-of-the-box, zero-configuration literate programming environment for JavaScript (like Knuth's, it has a non-linear execution order). Since switching to literate programming I have found myself adopting a documentation-driven-development methodology, where I ponder the purpose of the notebook in the introduction, and because I reread it every time I open the notebook, the clarity of the software goes up, which also feeds back into the ongoing development process.

It takes a little while before habits change, but I think I have learnt more in the last year at age 40 than I did in my first year of undergraduate aged 19.

An aspect of Observable that is particularly compelling, and new since Knuth, is that inter-notebook dependencies are hyperlinks, so these are not standalone literate programming artifacts but a graph of explanation too. You can learn a lot by surfing notebook dependencies. Bundling code with documentation is such a win.

The other thing about Observable is that the cells are reactive, so your documentation is not necessarily static either. You can provide animated interactive explanations too.

More info on the different types of content you can embed: <a href="https://observablehq.com/@observablehq/cell-modes" rel="nofollow">https://observablehq.com/@observablehq/cell-modes</a>

More info on the spreadsheet-like execution ordering: <a href="https://observablehq.com/@observablehq/observables-not-javascript" rel="nofollow">https://observablehq.com/@observablehq/observables-not-javas...</a>

More info on the literate programming support

rendall about 3 years ago

When I first started in this industry, I thought it was essentially immune to fads. "Does it work? Yes? Ship it!" Seems so naïve. Whole paradigms rise and fall for arbitrary reasons: a viral blog post, or an open letter, or the tech stack of that cool startup. Perhaps this blog post will lead to the reformation of the literate programming approach.

moomin about 3 years ago

If you want to read a thoughtful and valid criticism of Knuth’s ideas, I recommend <a href="https://www.cs.tufts.edu/~nr/pubs/lpsimp-abstract.html" rel="nofollow">https://www.cs.tufts.edu/~nr/pubs/lpsimp-abstract.html</a> by Norman Ramsey. The observation that rang most true for me was: literate programming should be written in the style of a car manual, not a novel.

Semaphor about 3 years ago

180 comments 2 years ago: <a href="https://news.ycombinator.com/item?id=22406070" rel="nofollow">https://news.ycombinator.com/item?id=22406070</a>

pdpi about 3 years ago

(2020)

It seems to me that both McIlroy’s original critique and this blog post miss the point by a mile. It’s completely meaningless to compare the relative merits of the two solutions, because Knuth’s ultimate goal isn’t to produce a solution.

Doing a presentation on literate programming has to deal with two more or less contradictory concerns: you need a sufficiently simple problem that your audience can follow along, but you need your solution to be complex enough that you can actually illustrate LP.

A sorted frequency table is a simple enough problem statement, and a trie is a sufficiently elaborate solution that doesn’t feel too contrived while also being familiar enough that the audience can follow along. Knuth’s approach was pretty much the perfect way to hit both of those requirements!

scotty79 about 3 years ago

McIlroy’s solution was to just use a few programs that the Donald Knuths of the world had previously written to hack together a suboptimal solution to his current problem.

Ultimately his approach won, and today >90% of programming is just stitching together code someone else wrote so it sorta works for your problem.

The Node.js ecosystem, much maligned here, is an implementation of his approach to web development.

jzdziarski about 3 years ago

I don’t even understand what we’re comparing here. Is it lines of code? In that case, you must include all of the code from tr, sort, and any other shell commands used to perform the task at hand. By that standard, Knuth still wins by a long shot. Were this an actual coding contest (if such things exist anymore), one does not simply argue that instead of coding the solution in the given language, I’m going to just use five other people’s solutions and claim my trophy. Had the argument been who could most efficiently reuse all of the resources of an operating system to produce the laziest solution that will likely produce the highest number of obscure edge cases in the future, create unexpected dependencies, and likely scale the worst, then perhaps the shell script wins. But I think Knuth was trying to demonstrate quite the opposite - good coding praxis that can be maintainable, debuggable, and made to scale. I guess what I’m saying is that it’s difficult to see any comparison here, whatsoever, since the purposes were so markedly different. The shell script is not a solution any professional would ship in a product, and would only appeal to lazy one off tasks. Who would have been daft enough firstly, to not read Knuth’s paper, but secondly to think there is anything worth comparing in the first place? The script might be the quickest solution. Knuth’s code is the proper solution.

daly about 3 years ago

If you didn't know what McIlroy's program did, would you be able to figure out what it was supposed to do?

I'd like to see the TeX program written in a shell script.

amirathi about 3 years ago

Jupyter Notebook is THE modern literate programming environment.

It's a shame that it's primarily used only for Data Science experimentation & teaching.

thread_id about 3 years ago

It occurs to me that part of this debate could be expressed in more modern terms as declarative vs imperative programming.

In languages with first-class functions, a complex problem can be solved by chaining functions together, and the solution can be written in a single statement (this depends on the implementation of the functions and the types they return), sometimes two or three statements.

Versus using core language statements and packages to implement conditional logic, control structures, and operations: relying less on packages that extend the language, and having full control over the implementation.

Personally I prefer declarative, wrapped in verbose inline documentation.

One downside to this is maintaining the build environment. Packages are versioned and functions change over time (deprecated in favor of new implementations that replace the old function). Sometimes declarative implementations must be refactored to support newer versions of packages.

Imperative implementations tend to be more durable and have fewer dependencies.

lynguist about 3 years ago

LP is in essence live coding, what we have now with Twitch and YouTube.

And it has its merits: you see how a problem is tackled.

I believe this should be obvious.

kkfx about 3 years ago

The issue was simply a bad one, because we do not use computers as sports cars in a race (at least we shouldn't); we use them as daily drivers, which constantly change and adapt to changing needs and desires.

So the classic Unix approach of McIlroy is good for a quick run, but it is a daily driver that can't really evolve much, at least not cheaply. Knuth, for his part, fell into the same trap at the opposite end of the spectrum.

The real outcome is that the Unix model is wrong, and that is the well-known point behind The UNIX-HATERS Handbook. It shows simply in how Unix chose to throw its principles in the bin when making GUIs, from the first CDE onward, where there are no small programs and no composability via efficient IPC. The sole IPC available is cut/copy/paste. The right choice had been made before, with the Smalltalk systems at Xerox and the Lisp-based systems after them: a moderately literate and discoverable environment where anything can easily be integrated in code, so that shell scripting is actually the same as system programming, and the literate part, based on literate code, is just literate composition, not much different from the classic composition of human notes from the Mundaneum to Zettelkasten. That's it. Unfortunately, since NOBODY wants to admit mistakes, especially if they were made in the past and involve large areas of development, nearly no one wants to talk in those terms...

Animats about 3 years ago

"He has fashioned a sort of industrial-strength Faberge egg – intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start."

That's what computer science in the 1970s and the 1980s was all about. Clever algorithms. Especially at MIT. Read the classic HAKMEM from 1972.[1]

[1] <a href="https://en.wikipedia.org/wiki/HAKMEM" rel="nofollow">https://en.wikipedia.org/wiki/HAKMEM</a>

richard_todd about 3 years ago

Sometimes I do "semi-literate" programming where I don't even bother with the commentary. I just use a tangler so I can write the code in the order I want to see it. I really think writing in "human-order" and tangling into "compiler-order" is the bigger benefit to the methodology. It stops you from (for example) writing a function just to break up the logic into manageable parts.

henning about 3 years ago

McIlroy would have killed it here on Hacker News if it existed back then. "Show HN: 6-line shell script beats 8-page Don Knuth program"

victor9000 about 3 years ago

LP sounds like a Jupyter Notebook

ChristopherDrum about 3 years ago

This seems like as good a thread as any to point out that Inform, one of the largest literate programs to date, was recently published on GitHub. <a href="https://github.com/ganelson/inform" rel="nofollow">https://github.com/ganelson/inform</a>

jvandonsel about 3 years ago

What’s the runtime complexity of Knuth’s solution vs the script solution?

olliej about 3 years ago

I always disliked these comparisons to "shell" scripting that invariably end up comparing a collection of C (or whatever) tools and claiming to have solved the problem in shell script.

It's nonsense: the various shell scripting languages are more than capable of implementing those various tasks without simply shelling out to C, and so should be required to. Otherwise a C programmer has access to system() and start(), and those even support pipes.

smcl about 3 years ago

Yeah, the comparison between the shell script and the Pascal/LP version is a little bit apples/oranges. Maybe a nice counter to that would have been to make an LP version of the script itself. Here's a quickly hacked together mashup of the Knuth/McIlroy solutions:

Given some text input via stdin, we want to find the word that appears most often. The main body of the shell script is as follows. Each section uses plain shell utilities and has been developed and tested with (blah blah, maybe mention the version of the utils we used in case GNU versions work but BSD don't or something)

<pre><code> <<makeLines>> | <<convertToLowerCase>> | <<sortAndCountLines>> | <<sortLinesByFrequency>> | <<takeFirst>> </code></pre>

The first job is to isolate "words", defined here as groups of Latin alphabet characters (sorry to our international friends out there!) using `tr`. Importantly, anything hyphenated will be treated as separate words ("short-term" will be "short" and "term", for example), so modify the pattern if this isn't what you need.

<pre><code> <<makeLines>> = tr -cs A-Za-z '\n' </code></pre>

Next we'll make everything in the input lower-case:

<pre><code> <<convertToLowerCase>> = tr A-Z a-z </code></pre>

The next step is to sort the lines so identical words are next to each other, then pipe that into `uniq -c` to get counts for each run of adjacent identical lines. This is a fairly common pattern so I've bunched these guys together.

<pre><code> <<sortAndCountLines>> = sort | uniq -c </code></pre>

Since we're interested in the most frequent, we sort descending (-r aka --reverse. IMPORTANT: not -R aka --random-sort) and numerically (-n)

<pre><code> <<sortLinesByFrequency>> = sort -rn </code></pre>

And finally we can take the first line and print it. NOTE: I didn't use `head -n 1` here because blah stupid reason whatever

<pre><code> <<takeFirst>> = sed ${1}q </code></pre>

(fin)

This is the first time I'd ever written anything approaching LP (I just copied the style in the article) so I just quickly dashed it out with placeholder comments, though I'd maybe include a worked example of the program in action too, showing the output at each step. Now obviously it's longer than the little shell snippet that was posted, but even though it is a relatively tiny problem to apply LP to, you can see that there is still some value to it: I've been able to make it clear that only the Latin alphabet is considered, I've highlighted a potential oopsie in case someone inexperienced tries to re-use it (-r/-R), and I was able to state that there's an alternative approach to the final step and give a reason why I chose my way.
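
For readers who want to run it, the chunks in the comment above can be tangled by hand into a single pipeline. This is only a sketch: the function name `most_frequent_words` and its optional count argument are assumptions made here, standing in for the script argument that `sed ${1}q` reads in the comment's last chunk.

```shell
#!/bin/sh
# Hand-tangled sketch of the chunks above.
# most_frequent_words is a hypothetical name; it reads text on stdin.
most_frequent_words() {
    count="${1:-1}"             # assumed argument: lines to keep, default 1
    tr -cs 'A-Za-z' '\n' |      # makeLines: one word per line
    tr 'A-Z' 'a-z' |            # convertToLowerCase
    sort | uniq -c |            # sortAndCountLines
    sort -rn |                  # sortLinesByFrequency
    sed "${count}q"             # takeFirst
}

# Example use (prints a count followed by the word):
#   printf 'The cat and the dog and THE bird\n' | most_frequent_words
```

The exact padding in front of the count depends on the `uniq` implementation, so anything that parses the output should split on whitespace rather than rely on fixed columns.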

gnufx about 3 years ago

I haven't gone back and read the original after all these years to check the specification, but it's perhaps worth noting that at least the McIlroy solution is only valid in the C locale. At least these days you can use character classes. In my experience locales regularly bite people -- even en_GB ones.
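
A minimal sketch of the two options mentioned above, for the word-splitting step only. The helper names are hypothetical, and how the character-class form treats non-ASCII text depends on the `tr` implementation and the active locale, so take it as an illustration rather than a portable fix.

```shell
#!/bin/sh
# Hypothetical helpers; both read text on stdin and print one word per line.

# Option 1: pin the C locale so the A-Za-z ranges mean exactly the
# 52 ASCII letters, whatever locale the caller is running in.
split_words_ascii() {
    LC_ALL=C tr -cs 'A-Za-z' '\n'
}

# Option 2: use a POSIX character class so that "letter" is defined by
# the current locale instead of by a byte range.
split_words_locale() {
    tr -cs '[:alpha:]' '\n'
}

# Example use:
#   printf 'short-term plan\n' | split_words_ascii
```

Pinning the locale keeps the behaviour reproducible across machines; the character class is the friendlier choice when the input is expected to be in the user's own language.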

pplonski86 about 3 years ago

Literate programming in Jupyter Notebook is very underrated. It is mainly used for experiments. I hope that in the future there will be more applications for notebooks, for example in automation. Can you imagine notebooks replacing services like Zapier?

pkrumins about 3 years ago

I illustrated this epic story: <a href="https://comic.browserling.com/knuth-vs-mcilroy.png" rel="nofollow">https://comic.browserling.com/knuth-vs-mcilroy.png</a>

riksucks about 3 years ago

I didn't even know such a thing as literate programming existed. I wonder if IPython notebooks count as literate programming?
