Practical parsing with Flex and Bison (2021)

69 points by gnubison over 1 year ago

9 comments

EdwardCoffin over 1 year ago
I used Lex and Yacc (as the article mentions, the direct predecessors of the Flex and Bison it talks about) a bit a long time ago, but more recently when I've written parsers I've just used recursive descent - but I was parsing a well-defined grammar. It did leave me with the knowledge of how to describe grammars in Yacc though, which I have found useful.

Guy Steele said something about a use of Yacc for language design (rather than implementation) that stuck with me:

"Be sure that your language will parse. It seems stupid to sit down and start designing constructs and not worry how they will fit together. You can get a language that's difficult if not impossible to parse, not only for a computer, but for a person. I use YACC constantly as a check of all my language designs, but I very seldom use YACC in the implementation. I use it as a tester, to be sure that it's LR(1) ... because if a language is LR(1) it's more likely that a person can deal with it."

From the Dynamic Languages Wizards series (in 2001), in the panel on language design (1:09:05) [1]

I've not yet employed Yacc in this fashion, but it did give me a tool for thinking about object models. A while ago, when I was puzzling over how some classes in an entity relationship diagram should be related, I considered it from the point of view of how I would design a grammar for serializing an instance of the model into text. This essentially made my decision for me in a principled way, though I never reached the point of writing up a grammar for the whole model; I just considered the implications for the local bit that was troubling me.

[1] https://youtu.be/agw-wlHGi0E?si=n-ann0TYjvZ45ie5&t=4145

edit: added a few clarifying notes
beeforpork over 1 year ago
OK, the basics. But do not stop reading here if you want to write a parser. There are more modern tools to look at (e.g., ANTLR).

Warning 1: parsing Unicode streams well is awkward with flex -- it's from an age where ASCII ruled. Handling multiple input encodings may get weird. If it is only UTF-8, maybe it works, because that's essentially bytes. But I find a hand-written scanner more convenient (the grammar is seldom too complex for that). And regexps based on General_Category or ID_Start etc.? Difficult...

Warning 2: for various reasons, usually flexibility, conflict resolution, error reporting, and/or error recovery, many projects move from bison to something else, even a handwritten recursive descent parser. It's longer, but not that difficult.
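
For illustration, here is a minimal sketch of the kind of hand-written scanner the comment above describes, in Python (the thread itself names no language). It assumes the input has already been decoded into a `str`, and it only approximates ID_Start and ID_Continue through General_Category; the real Unicode properties cover a few extra code points. The function name `scan` and the token kinds are hypothetical.

```python
import unicodedata

# Assumption: approximate ID_Start / ID_Continue by General_Category.
# The real properties include some additional code points.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE = ID_START | {"Mn", "Mc", "Nd", "Pc"}


def scan(text):
    """Split decoded text into (kind, lexeme) pairs: IDENT, NUMBER or PUNCT."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        category = unicodedata.category(ch)
        if category in ID_START or ch == "_":
            # Identifier: one start character, then any continue characters.
            i += 1
            while i < len(text) and unicodedata.category(text[i]) in ID_CONTINUE:
                i += 1
            tokens.append(("IDENT", text[start:i]))
        elif category == "Nd":
            # Number: a run of decimal digits.
            while i < len(text) and unicodedata.category(text[i]) == "Nd":
                i += 1
            tokens.append(("NUMBER", text[start:i]))
        else:
            # Anything else is a single-character punctuation token.
            i += 1
            tokens.append(("PUNCT", ch))
    return tokens
```

Calling `scan("größe = λ1 + 42")` yields identifier tokens for `größe` and `λ1` without any byte-level handling, because decoding happens before scanning.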
HaoZeke over 1 year ago
I have to say though, most compilers courses I've seen have an inordinate emphasis on parsing and little else. Still a great post.
Verdex over 1 year ago
If lex/yacc style parsing works for you, then great. However, I suspect most people are going to get more mileage out of just hand writing a recursive descent parser and moving on with their lives.

The benefit of recursive descent parsers is that they're easy to write, modify, and understand. You don't need any new paradigms, just write code like you typically do. If something goes wrong, your standard debugging skills will serve you well.

There are also a lot of other relatively easy parsing technologies out there. For example, you can also consider monadic parsing, parser combinators, or PEG libraries.

I spent a year trying to figure out which parser technique worked best for me, and I'm glad I didn't just stick with my starting point of lex/yacc. So again, if this guide allows parsing to just work for you, then great, stick with it. But if you find yourself encountering a lot of problems, then it might be worth it to look around because other options exist and work just fine.
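
As a rough illustration of the point above, here is a small hand-written recursive descent parser that evaluates integer arithmetic. It is a sketch in Python under assumed conditions: the grammar (expr, term, factor) and the names `Parser` and `evaluate` are made up for this example and do not come from the article or the comment.

```python
import re


class Parser:
    """Grammar (assumed for this sketch):
    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')' | '-' factor
    """

    def __init__(self, src):
        # Tokens are runs of ASCII digits or single non-space characters.
        self.tokens = re.findall(r"[0-9]+|\S", src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            want = expected or "a token"
            raise SyntaxError(f"expected {want}, got {tok!r} at token {self.pos}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            self.take(")")
            return value
        if tok == "-":
            return -self.factor()
        if tok.isascii() and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(src):
    """Parse and evaluate src, rejecting any leftover input."""
    parser = Parser(src)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input starting at {parser.peek()!r}")
    return value
```

Each grammar rule maps to one method, so a wrong result can be traced with an ordinary debugger or a print statement in the relevant method, with no generated tables in between.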
davidhs over 1 year ago
After taking a compiler course in uni I found the emphasis on dealing with syntax mostly a waste of time. To begin with, do yourself a favor and use S-expression syntax (like Lisp) for your language. They're dead simple to parse. With the syntax out of the way, you can get to the meat and potatoes of implementing a language. Later on you can always define a "look" for your language, and you can spend an inordinate amount of time on that.
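
To give a sense of how little code S-expression parsing can take, here is a minimal sketch in Python. It assumes a deliberately tiny dialect (parentheses, integers, and bare symbols; no strings, quotes, or comments), and the names `parse_sexpr` and `to_atom` are hypothetical.

```python
def to_atom(token):
    """Turn a token into an int when possible, otherwise keep it as a symbol."""
    try:
        return int(token)
    except ValueError:
        return token


def parse_sexpr(src):
    """Parse one S-expression into nested Python lists of ints and strings."""
    # Pad parentheses with spaces so that split() isolates them as tokens.
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume the closing parenthesis
            return items
        if token == ")":
            raise SyntaxError("unexpected ')'")
        return to_atom(token)

    result = read()
    if pos != len(tokens):
        raise SyntaxError("trailing input after the first expression")
    return result
```

For example, `parse_sexpr("(+ 1 (* 2 3))")` returns `["+", 1, ["*", 2, 3]]`, a tree that an evaluator or compiler pass can walk directly.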
firtoz over 1 year ago
Kind of related, for anyone curious about parsing and JS: I have to recommend peggy for writing simple parsers for files to be consumed by JavaScript. Pretty niche, but it does it so well. I have developed a few packages using it so far.
ladberg over 1 year ago
Also see: https://langcc.io/
frabert over 1 year ago
re2c is a better alternative to flex imo. Also lemon (from the sqlite project) in place of bison/yacc.
arnon over 1 year ago
I learned a lot from the first few pages, but it escalated very quickly at some point.