I get that this timeline is designed to promote the author's own parsing algorithm, and doesn't claim to be exhaustive, but I'd be curious to know how PEGs / packrat parsers and Hutton/Meijer's work on monadic parser combinators fit into this - both chronologically, and in terms of tradeoffs.
Parser generators are an interesting academic exercise, but in practice it seems most language implementers have concluded that hand coding a recursive descent parser is the way to go: GCC being a prime example as the author mentions. I'd be interested to know if Coverity is still using McPeeks elsa/elkhound glr parser generator from years ago.<p>As antlr creator Terence Parr says "Why Program by Hand in Five Days what You Can Spend Five Years of Your Life Automating?" It really is almost trivial to implement recursive descent by hand.
Failing to mention yacc or bison is a bit much.<p>Some languages (notably Pascal, and now, I think, Go) are designed for LL parsing without lookahead. C and its descendants require lookahead.<p>The error reporting problem for syntax-directed parsers comes mostly from the difficulty of recovering from errors in batch compilations. If you simply stop at the first error, reasonable error reporting is possible. Getting back on track is a heuristic problem dependent on the kinds of errors users make. There have been syntax-directed systems with error clauses in the syntax definition to hint how to get back on track, but that never caught on.
Personally, I really like specifying grammars in LALR (or, more specifically, the Menhir parser generator, which consumer LR(1) with some enhancements); LALR makes me feel secure that my grammars are unambiguous. The errors are a problem, yes (I haven't quite figured them out yet), but I imagine they would be in most types of parsers - the problematic part is figuring out the places where errors can appear and where you can give a sensible error message!
An algorithm not listed is that using "derivatives:" <a href="http://matt.might.net/articles/parsing-with-derivatives/" rel="nofollow">http://matt.might.net/articles/parsing-with-derivatives/</a>. I don't know if it can be implemented efficiently enough to displace simpler (less powerful) parsing algorithms, but it's certainly interesting, especially if you like inductive algorithms.
Marpa looks interesting. But I am not sure about this claim in the paper:<p>> Despite the promise of general context-free parsing, and the strong academic literature behind it, it has never been incorporated into a highly available tool like those that exist for LALR[6] or regular expressions.<p>I think this leaves out several such tools (and there are probably more):<p><pre><code> 1. Bison (supports GLR since at least 2009)
2. Elkhound (GLR parser generator, since 2002)
3. ANTLR (its top-down ALL(*) algorithm is general)
</code></pre>
But let’s step back a second. Is generalized context-free parsing really the holy grail that some people think it is?<p>For non-parser-geeks, “generalized” means “can handle all grammars.” That sure <i>seems</i> like a feature, especially for people suffering PTSD from unhelpful Bison error messages like “shift-reduce conflict.” The idea of never having to see a message like that again can sure make generalized parsing seem pretty damn attractive (btw: this is the same selling point for PEGs).<p>But the dark side of generalized parsing is ambiguity. It is undecidable whether a given grammar is ambiguous or not.<p>Here’s what this means, in practical terms. Generalized tools might save you from “shift-reduce conflict,” but they cannot save you from “this grammar might be ambiguous, and it’s impossible to say.”<p>So what? Well this means ambiguities can be hiding in your grammar that you don’t know about. Ambiguities mean that certain syntactical constructs could have two possible meanings, and which one the parser chooses is totally arbitrary.<p>The best real-life example of this is the “dangling else” ambiguity: <a href="http://en.wikipedia.org/wiki/Dangling_else" rel="nofollow">http://en.wikipedia.org/wiki/Dangling_else</a> Everybody knows about it now, but when it was originally introduced into ALGOL 60, it went totally unnoticed. The language had even been published in a technical report before anyone was aware that the ambiguity existed.<p>Now I agree that “shift-reduce conflict” sucks, but to me “parser tools should accept any grammar” is an overreaction. That’s like saying “syntax errors in JavaScript suck, we should make the parser accept anything and try to do something reasonable.” If that idea gives you the heebie jeebies, you’ll know how I feel about generalized parsing.<p>To me the answer isn’t generalized parsing, it’s a parsing formalism and tool that, when it gives you an error, gives you enough information to know <i>exactly</i> what the issue is. The tool can be your helper as you develop your grammar/language, helping you understand whether your language is ambiguous or not and how to fix your ambiguities. When it accepts your grammar, you can have confidence that your language and grammar are unambiguous.<p>Now at least generalized CFG algorithms (like GLR and Marpa) can tell you at runtime that the input is ambiguous. PEGs can’t even do that: they just define the ambiguity away by saying “in cases of ambiguity, the first alternative wins by definition.” Sure it makes the <i>formalism</i> unambiguous, but the language as your users experience it is still just as ambiguous.<p>I wrote about this all in more detail here: <a href="http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why-parsing-tools.html" rel="nofollow">http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why...</a>
2010: People realize the folly of buffering input until the user hits a button and trying to parse it after the fact. They start integrating tokenizers and auto-complete right into the input method itself.
Note that the author has requested comments be posted in the Marpa Google Group at <a href="https://groups.google.com/d/msg/marpa-parser/5p0IgFqjkqg/cfzx3WkiD8YJ" rel="nofollow">https://groups.google.com/d/msg/marpa-parser/5p0IgFqjkqg/cfz...</a>