Brian Kernighan sent Gawk maintainer Arnold Robbins an email linking to this blog post with the comment "Hindsight has a lot of benefits, it would appear."<p>Peter Weinberger (quoted with permission) responded:<p>> That's interesting, Here's some thoughts/recollections. (remember that human memory is fallible.)<p>> 1. Using whitespace for string concatenation, in retrospect, was probably not the ideal choice (but '+' would not have worked).<p>> 2. Syntax choices were in part driven by the desire for our local C programmers to find it familiar.<p>> 3. As creatures of a specific time and place awk shared with C the (then endearing, now irritating) property of being underspecified.<p>> I think that collectively we understood YACC reasonably well. We tortured the grammar until the parser came close to doing what we wanted, and then we stopped. The tools then were more primitive, but they did
fit in 64K of memory.<p>Al Aho also replied (quoted with permission):<p>> Peter's observation about torturing the grammar is apt!
As awk grew in its early years, the grammar evolved with it
and I remember agonizing to make changes to the grammar
to keep it under control (understanding and minimizing the
number of yacc-generated parsing-action conflicts) as awk
evolved. I found yacc's ability to point out parsing-action
conflicts very helpful during awk's development. Good
grammar design was very much an art in those days
(maybe even today).<p>It's fun to hear the perspectives of the original AWK creators. I've had some correspondence with Kernighan and Weinberger before, but I think that's the first time I've been on an email thread with all three of A, W, and K.
Awk is something that I think every programmer and especially every sysadmin should learn. I like the comparison at the end and have never heard of nnawk or bbawk before.<p>I recently made a dashboard to compare the output of four versions of awk side by side, since not all awk scripts will run the same on each version: <a href="https://megamansec.github.io/awk-compare/" rel="nofollow">https://megamansec.github.io/awk-compare/</a> I'll have to add those :)
I think this is a good illustration of why parser-generator middleware like yacc is fundamentally misguided: it creates <i>totally unnecessary gaps</i> between design intent and the action of the parser. In a hand-rolled recursive descent parser, or even a set of PEG productions, ambiguities and complex lookahead or backtracking leap out at the programmer immediately.
If you think AWK is hard to parse, then try C++. C++ is so hard to parse, and therefore so slow to compile, that it most probably inspired this programmer joke, one of the most popular XKCDs of all time [1].<p>Then along came modern, fast-compiling languages like Go and D. D is a breath of fresh air: even though it's a complex language like C++ and Rust, it manages to compile very fast. Heck, it even has the rdmd tool, which compiles and runs a program in one step, so the workflow feels similar to interpreted languages like Python and Matlab.<p>According to its author, the main reason D compiles so fast (as long as you avoid CTFE) is that its design avoids the notorious constructs that complicate the symbol table, as happened in C++ with the popular << and >> overloading for I/O and shifting. But the fact that Rust came much later than C++ and D and is still slow to compile is bewildering, to say the least.<p>[1] Compiling:<p><a href="https://xkcd.com/303/" rel="nofollow">https://xkcd.com/303/</a>
If you are parsing awk, you must treat any run of whitespace that contains a newline as a visible token, which you have to reference in various places in the grammar. Your implementation will likely benefit from a switch, in the lexical analyzer, which sometimes turns off the visible newline.
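To make the newline switch concrete, here is a minimal sketch in Python. It is not taken from any real awk implementation: the `Lexer` class, the `newline_visible` flag, and the catch-all `WORD` token are all hypothetical names, and a real lexer would split operators, strings, and so on. It only shows the idea: a run of whitespace containing a newline becomes one `NEWLINE` token, unless the parser has flipped the switch off.

```python
import re

# Hypothetical token patterns, for illustration only.
WS_RE = re.compile(r"[ \t\r\n]+")      # any run of whitespace
WORD_RE = re.compile(r"[^ \t\r\n]+")   # everything else, lumped together


class Lexer:
    def __init__(self, src):
        self.src = src
        self.pos = 0
        # The "switch": the parser sets this to False in contexts where
        # a newline should be treated as plain whitespace.
        self.newline_visible = True

    def next_token(self):
        while self.pos < len(self.src):
            m = WS_RE.match(self.src, self.pos)
            if m:
                self.pos = m.end()
                # A whitespace run containing a newline is one visible token,
                # but only while the switch is on.
                if "\n" in m.group() and self.newline_visible:
                    return ("NEWLINE", "\n")
                continue
            m = WORD_RE.match(self.src, self.pos)
            self.pos = m.end()
            return ("WORD", m.group())
        return ("EOF", "")
```

The parser would flip `newline_visible` off in places where it wants a statement to continue across lines, and back on afterwards. Which places those are depends on the grammar, so that part is left out here.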
Another tricky bit is deciding whether "/" is the division operator or the start of a regular expression.<p>IIRC, awk does this in a context sensitive manner, by looking at the previous token.
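A rough sketch of the previous-token trick, in Python. This is an assumption-laden toy, not how any particular awk does it: the token kinds, the `OPERAND_END` set, and the `tokenize` function are made up for illustration, and the token set is far smaller than real awk's. The rule is: if the previous token could end an operand, `/` is division; otherwise it starts a regular expression.

```python
import re

# Token kinds after which "/" is read as division, because an operand
# has just ended. This set is an illustrative assumption.
OPERAND_END = {"NUMBER", "NAME", "RPAREN", "RBRACKET"}

TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<RPAREN>\))
  | (?P<RBRACKET>\])
  | (?P<OP>[-+*%(\[~!=<>,;{}$])
""", re.VERBOSE)


def tokenize(src):
    tokens = []
    prev_kind = None  # kind of the most recent token, None at the start
    i = 0
    while i < len(src):
        ch = src[i]
        if ch in " \t":
            i += 1
            continue
        if ch == "/":
            if prev_kind in OPERAND_END:
                tokens.append(("DIV", "/"))
                prev_kind = "DIV"
                i += 1
            else:
                # Scan to the closing slash, skipping backslash escapes.
                j = i + 1
                while j < len(src) and src[j] != "/":
                    if src[j] == "\\":
                        j += 1
                    j += 1
                if j >= len(src):
                    raise SyntaxError("unterminated regular expression")
                tokens.append(("REGEX", src[i + 1:j]))
                prev_kind = "REGEX"
                i = j + 1
            continue
        m = TOKEN_RE.match(src, i)
        if not m:
            raise SyntaxError(f"unexpected character {ch!r}")
        prev_kind = m.lastgroup
        tokens.append((prev_kind, m.group()))
        i = m.end()
    return tokens
```

With this, `tokenize("a / b")` yields a `DIV` token between two names, while in `tokenize("x ~ /ab/")` the slash after `~` opens a `REGEX` token.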
Reading awk as a human is hard too. And awk's performance is crap: a lot slower than most interpreted languages out there. I replaced all the awk scripts with Python and everything is a lot faster.