TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Parsing: The Solved Problem That Isn't (2011)

117 pointsby contr-errorover 1 year ago

8 comments

pcfwikover 1 year ago
There was an interesting discussion two years ago regarding nonobvious issues with PEGs:<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=30414683">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=30414683</a><p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=30414879">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=30414879</a><p>I spent a year or two working with PEGs, and ran into similar issues multiple times. Adding a new production could totally screw up seemingly unrelated parses that worked fine before.<p>As the author points out, Earley parsing with some disambiguation rules (production precedence, etc.) has been much less finicky&#x2F;annoying to work with. It&#x27;s also reasonably fast for small parses even with a dumb implementation. Would suggest for prototyping&#x2F;settings when runtime ambiguity is not a showstopper, despite the remaining issues described in the article re: having a separate lexer.
评论 #39460246 未加载
kazinatorover 1 year ago
Parsing computer languages is an entirely self-inflicted problem. You can easily design a language so it doesn&#x27;t require any parsing techniques that were not known and practical in 1965, and it will greatly benefit the readability also.
评论 #39460286 未加载
评论 #39462307 未加载
评论 #39463294 未加载
评论 #39458571 未加载
评论 #39458095 未加载
评论 #39462172 未加载
评论 #39458755 未加载
dangover 1 year ago
Related:<p><i>Parsing: The Solved Problem That Isn&#x27;t (2011)</i> - <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=8505382">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=8505382</a> - Oct 2014 (70 comments)<p><i>Parsing: the solved problem that isn&#x27;t</i> - <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=2327313">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=2327313</a> - March 2011 (47 comments)
jolt42over 1 year ago
After using Instaparse at least it felt like a solved problem: <a href="https:&#x2F;&#x2F;github.com&#x2F;Engelberg&#x2F;instaparse">https:&#x2F;&#x2F;github.com&#x2F;Engelberg&#x2F;instaparse</a>
EdSchoutenover 1 year ago
What I find annoying about using parser generators is that it always feels messy integrating the resulting parser into your application. So you write a file that contains the grammar and generate a parser out of that. Now you build it into your app and call into it to parse some input file, but that ends up giving you some poorly typed AST that is cluttered&#x2F;hard to work with.<p>Certain parser generators make life easier by supporting actions on parser&#x2F;lexer rules. This is great and all, but it has the downside that the grammar you provide is no longer reusable. There&#x27;s no way for others to import that grammar and provide custom actions for them.<p>I don&#x27;t know. In my opinion parsing theory is already solved. Whether it&#x27;s PEG, LL, LR, LALR, whatever. One of those is certainly good enough for the kind of data you&#x27;re trying to parse. I think the biggest annoyance is the tooling.
评论 #39458523 未加载
temp123789246over 1 year ago
Have there been any notable innovations in parsing since this was written?
评论 #39457321 未加载
评论 #39456980 未加载
评论 #39458183 未加载
评论 #39459728 未加载
评论 #39457555 未加载
评论 #39457287 未加载
评论 #39457182 未加载
评论 #39457666 未加载
评论 #39457430 未加载
Nevermarkover 1 year ago
Common example of complications of two grammars being combined: C code and character strings.<p>Double quotes in C code mean begin and end of a string. But strings contain quotes too. And newlines. Etc.<p>So we got the cumbersome invention of escape codes, and so characters strings in source (itself a character string) are not literally the strings they represent.
评论 #39459077 未加载
taericover 1 year ago
My current view of what makes parsing so difficult is that people want to jump straight over a ton of intermediate things from parsing to execution. That is, we often know what we want to happen at the end. And we know what we are given. It is hoped that it is a trivially mechanical problem to go from one to the other.<p>But this ignores all sorts of other steps you can take. Targeting multiple execution environments is an obvious step. Optimization is another. Trivial local optimizations like shifts over multiplications by 2 and fusing operations to take advantage of the machine that is executing it. Less trivial full program optimizations that can propagate constants across source files.<p>And preemptive execution is a huge consideration, of course. Very little code runs in a way that can&#x27;t be interrupted for some other code to run in the meantime. To the point that we don&#x27;t even think of what this implies anymore. Despite accumulators being a very basic execution unit on most every computer. (Though, I think I&#x27;m thankful that reentrancy is the norm nowadays in functions.)
评论 #39457284 未加载