
Parsing: a timeline

73 points by kencausey over 10 years ago

10 comments

samstokes over 10 years ago
I get that this timeline is designed to promote the author's own parsing algorithm, and doesn't claim to be exhaustive, but I'd be curious to know how PEGs / packrat parsers and Hutton/Meijer's work on monadic parser combinators fit into this - both chronologically, and in terms of tradeoffs.
marktangotango over 10 years ago
Parser generators are an interesting academic exercise, but in practice it seems most language implementers have concluded that hand coding a recursive descent parser is the way to go: GCC being a prime example, as the author mentions. I'd be interested to know if Coverity is still using McPeak's Elsa/Elkhound GLR parser generator from years ago.

As ANTLR creator Terence Parr says, "Why Program by Hand in Five Days what You Can Spend Five Years of Your Life Automating?" It really is almost trivial to implement recursive descent by hand.
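For readers who haven't tried it, here is a minimal sketch of what hand-written recursive descent looks like in Python. The toy grammar and all names are my own, not from the article: one method per nonterminal, consuming the token stream left to right.

    import re

    def tokenize(src):
        """Split source into integer literals and single-character operators."""
        return re.findall(r"\d+|[+\-*/()]", src)

    class Parser:
        # Grammar (one method per nonterminal):
        #   expr   -> term (('+' | '-') term)*
        #   term   -> factor (('*' | '/') factor)*
        #   factor -> NUMBER | '(' expr ')'
        def __init__(self, toks):
            self.toks, self.pos = toks, 0

        def peek(self):
            return self.toks[self.pos] if self.pos < len(self.toks) else None

        def eat(self, tok):
            if self.peek() != tok:
                raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
            self.pos += 1

        def expr(self):
            v = self.term()
            while self.peek() in ("+", "-"):
                op = self.peek(); self.eat(op)
                v = v + self.term() if op == "+" else v - self.term()
            return v

        def term(self):
            v = self.factor()
            while self.peek() in ("*", "/"):
                op = self.peek(); self.eat(op)
                v = v * self.factor() if op == "*" else v / self.factor()
            return v

        def factor(self):
            t = self.peek()
            if t == "(":
                self.eat("("); v = self.expr(); self.eat(")")
                return v
            if t is not None and t.isdigit():
                self.eat(t)
                return int(t)
            raise SyntaxError(f"unexpected token {t!r}")

    print(Parser(tokenize("2 + 3 * (4 - 1)")).expr())  # prints 11

Note that this evaluates as it parses and raises on the first bad token, which is one reason hand-written parsers find it natural to stop at the first error.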
Animats over 10 years ago
Failing to mention yacc or bison is a bit much.

Some languages (notably Pascal, and now, I think, Go) are designed for LL parsing without lookahead. C and its descendants require lookahead.

The error reporting problem for syntax-directed parsers comes mostly from the difficulty of recovering from errors in batch compilations. If you simply stop at the first error, reasonable error reporting is possible. Getting back on track is a heuristic problem dependent on the kinds of errors users make. There have been syntax-directed systems with error clauses in the syntax definition to hint how to get back on track, but that never caught on.
master_latch over 10 years ago
It doesn't mention lex or yacc at all. I think those are noteworthy. I think it's interesting that lex was written by Eric Schmidt.
tomp over 10 years ago
Personally, I really like specifying grammars in LALR (or, more specifically, for the Menhir parser generator, which accepts LR(1) grammars with some enhancements); LALR makes me feel secure that my grammars are unambiguous. The errors are a problem, yes (I haven't quite figured them out yet), but I imagine they would be in most types of parsers - the problematic part is figuring out the places where errors can appear and where you can give a sensible error message!
sjolsen over 10 years ago
An algorithm not listed is that using "derivatives": http://matt.might.net/articles/parsing-with-derivatives/. I don't know if it can be implemented efficiently enough to displace simpler (less powerful) parsing algorithms, but it's certainly interesting, especially if you like inductive algorithms.
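For the curious, here is a minimal sketch of the derivative idea in its original regular-expression setting (Brzozowski); Might's article extends the same operation to context-free grammars with laziness, memoization, and fixed points. The types and names below are my own:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Empty:          # the empty language: matches nothing
        pass

    @dataclass(frozen=True)
    class Eps:            # matches only the empty string
        pass

    @dataclass(frozen=True)
    class Chr:            # matches a single character
        c: str

    @dataclass(frozen=True)
    class Alt:            # union of two languages
        l: object
        r: object

    @dataclass(frozen=True)
    class Seq:            # concatenation
        l: object
        r: object

    @dataclass(frozen=True)
    class Star:           # Kleene star
        r: object

    def nullable(r):
        """Does r accept the empty string?"""
        match r:
            case Eps() | Star(_):
                return True
            case Alt(a, b):
                return nullable(a) or nullable(b)
            case Seq(a, b):
                return nullable(a) and nullable(b)
            case _:
                return False

    def deriv(r, c):
        """The derivative of r with respect to c: what remains to match after c."""
        match r:
            case Empty() | Eps():
                return Empty()
            case Chr(d):
                return Eps() if d == c else Empty()
            case Alt(a, b):
                return Alt(deriv(a, c), deriv(b, c))
            case Seq(a, b):
                d = Seq(deriv(a, c), b)
                return Alt(d, deriv(b, c)) if nullable(a) else d
            case Star(a):
                return Seq(deriv(a, c), r)

    def matches(r, s):
        for c in s:         # differentiate once per input character
            r = deriv(r, c)
        return nullable(r)  # accept iff what remains is nullable

    regex = Seq(Star(Seq(Chr("a"), Chr("b"))), Chr("a"))  # (ab)*a
    print(matches(regex, "aba"), matches(regex, "ab"))    # True False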
haberman over 10 years ago
Marpa looks interesting. But I am not sure about this claim in the paper:

> Despite the promise of general context-free parsing, and the strong academic literature behind it, it has never been incorporated into a highly available tool like those that exist for LALR[6] or regular expressions.

I think this leaves out several such tools (and there are probably more):

    1. Bison (supports GLR since at least 2009)
    2. Elkhound (GLR parser generator, since 2002)
    3. ANTLR (its top-down ALL(*) algorithm is general)

But let's step back a second. Is generalized context-free parsing really the holy grail that some people think it is?

For non-parser-geeks, "generalized" means "can handle all grammars." That sure *seems* like a feature, especially for people suffering PTSD from unhelpful Bison error messages like "shift-reduce conflict." The idea of never having to see a message like that again can sure make generalized parsing seem pretty damn attractive (btw: this is the same selling point for PEGs).

But the dark side of generalized parsing is ambiguity. It is undecidable whether a given grammar is ambiguous or not.

Here's what this means, in practical terms. Generalized tools might save you from "shift-reduce conflict," but they cannot save you from "this grammar might be ambiguous, and it's impossible to say."

So what? Well this means ambiguities can be hiding in your grammar that you don't know about. Ambiguities mean that certain syntactical constructs could have two possible meanings, and which one the parser chooses is totally arbitrary.

The best real-life example of this is the "dangling else" ambiguity: http://en.wikipedia.org/wiki/Dangling_else Everybody knows about it now, but when it was originally introduced into ALGOL 60, it went totally unnoticed. The language had even been published in a technical report before anyone was aware that the ambiguity existed.

Now I agree that "shift-reduce conflict" sucks, but to me "parser tools should accept any grammar" is an overreaction. That's like saying "syntax errors in JavaScript suck, we should make the parser accept anything and try to do something reasonable." If that idea gives you the heebie jeebies, you'll know how I feel about generalized parsing.

To me the answer isn't generalized parsing, it's a parsing formalism and tool that, when it gives you an error, gives you enough information to know *exactly* what the issue is. The tool can be your helper as you develop your grammar/language, helping you understand whether your language is ambiguous or not and how to fix your ambiguities. When it accepts your grammar, you can have confidence that your language and grammar are unambiguous.

Now at least generalized CFG algorithms (like GLR and Marpa) can tell you at runtime that the input is ambiguous. PEGs can't even do that: they just define the ambiguity away by saying "in cases of ambiguity, the first alternative wins by definition." Sure it makes the *formalism* unambiguous, but the language as your users experience it is still just as ambiguous.

I wrote about this all in more detail here: http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why-parsing-tools.html
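To make the dangling-else point concrete, here is a small sketch, with a toy grammar and code of my own, that enumerates every parse of an if/then/else token stream by brute force and finds exactly two:

    # Toy grammar: a statement is an if/then, an if/then/else, or a plain
    # statement token. "E" and "STMT" stand in for expressions and statements.
    GRAMMAR = {
        "S": [
            ["if", "E", "then", "S"],
            ["if", "E", "then", "S", "else", "S"],
            ["STMT"],
        ],
    }

    def parses(symbol, toks, i):
        """Yield (next_index, tree) for every way symbol can match toks[i:]."""
        if symbol not in GRAMMAR:  # terminal: must match the next token
            if i < len(toks) and toks[i] == symbol:
                yield i + 1, symbol
            return
        for rhs in GRAMMAR[symbol]:
            for j, children in seq_parses(rhs, toks, i):
                yield j, (symbol, children)

    def seq_parses(rhs, toks, i):
        """Yield (next_index, child_trees) for every parse of the sequence rhs."""
        if not rhs:
            yield i, []
            return
        for j, tree in parses(rhs[0], toks, i):
            for k, rest in seq_parses(rhs[1:], toks, j):
                yield k, [tree] + rest

    toks = ["if", "E", "then", "if", "E", "then", "STMT", "else", "STMT"]
    trees = [t for j, t in parses("S", toks, 0) if j == len(toks)]
    print(len(trees))  # 2: the else can attach to either if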
pshc over 10 years ago
2010: People realize the folly of buffering input until the user hits a button and trying to parse it after the fact. They start integrating tokenizers and auto-complete right into the input method itself.
wtetzner over 10 years ago
I wonder how Earley compares to GLL.
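For reference, here is a compact sketch of an Earley recognizer (the family Marpa builds on). The toy grammar and code are mine, and it omits the nullable-rule fix-ups and tree building that a production implementation like Marpa needs:

    # An item is (head, rhs, dot, origin): a rule, how far we have got in
    # it, and the input position where it started.
    GRAMMAR = {
        "S": [["S", "+", "S"], ["a"]],  # deliberately ambiguous
    }

    def recognize(start, toks):
        chart = [set() for _ in range(len(toks) + 1)]
        for rhs in GRAMMAR[start]:
            chart[0].add((start, tuple(rhs), 0, 0))
        for i in range(len(toks) + 1):
            agenda = list(chart[i])
            while agenda:
                head, rhs, dot, origin = agenda.pop()
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in GRAMMAR:  # predict: expect sym to start here
                        for alt in GRAMMAR[sym]:
                            new = (sym, tuple(alt), 0, i)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
                    elif i < len(toks) and toks[i] == sym:  # scan a terminal
                        chart[i + 1].add((head, rhs, dot + 1, origin))
                else:  # complete: advance every item waiting on head
                    for h2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == head:
                            new = (h2, r2, d2 + 1, o2)
                            if new not in chart[i]:
                                chart[i].add(new); agenda.append(new)
        return any(h == start and d == len(r) and o == 0
                   for h, r, d, o in chart[len(toks)])

    print(recognize("S", ["a", "+", "a", "+", "a"]))  # True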
kencausey over 10 years ago
Note that the author has requested comments be posted in the Marpa Google Group at https://groups.google.com/d/msg/marpa-parser/5p0IgFqjkqg/cfzx3WkiD8YJ