TechEcho

9 comments

EdwardCoffin over 1 year ago

I used Lex and Yacc (as the article mentions, the direct predecessors of the Flex and Bison it talks about) a bit a long time ago, but more recently when I've written parsers I've just used recursive descent, though in those cases I was parsing a well-defined grammar. It did leave me with the knowledge of how to describe grammars in Yacc though, which I have found useful.

Guy Steele said something about a use of Yacc for language design (rather than implementation) that stuck with me:

"Be sure that your language will parse. It seems stupid to sit down and start designing constructs and not worry how they will fit together. You can get a language that's difficult if not impossible to parse, not only for a computer, but for a person. I use YACC constantly as a check of all my language designs, but I very seldom use YACC in the implementation. I use it as a tester, to be sure that it's LR(1) ... because if a language is LR(1) it's more likely that a person can deal with it."

From the Dynamic Languages Wizards series (in 2001), in the panel on language design (1:09:05) [1]

I've not yet employed Yacc in this fashion, but it did give me a tool for thinking about object models. A while ago, when I was puzzling over how some classes in an entity relationship diagram should be related, I considered it from the point of view of how I would design a grammar for serializing an instance of the model into text. This essentially made my decision for me in a principled way, though I never reached the point of writing up a grammar for the whole model; I just considered the implications for the local bit that was troubling me.

[1] https://youtu.be/agw-wlHGi0E?si=n-ann0TYjvZ45ie5&t=4145

edit: added a few clarifying notes

beeforpork over 1 year ago

OK, the basics. But do not stop reading here if you want to write a parser. There are more modern tools to look at (e.g., antlr).

Warning 1: parsing Unicode streams well is awkward with flex; it's from an age where ASCII ruled. Handling multiple input encodings may get weird. If it is only UTF-8, maybe it works, because that's essentially bytes. But I find a hand-written scanner more convenient (the grammar is seldom too complex for that). But regexps based on General_Category or ID_Start etc.? Difficult...

Warning 2: for various reasons, usually flexibility, conflict resolving, error reporting, and/or error recovery, many projects move from bison to something else, even a handwritten recursive descent parser. It's longer, but not that difficult.
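
To make the hand-written scanner idea concrete, here is a minimal sketch in Python. It is an illustration rather than anything from the article: the token kinds, the `scan` function, and the category sets are all assumptions, and the sets only approximate ID_Start and ID_Continue through General_Category (the real Unicode properties add and remove a few code points). It assumes the input has already been decoded to a string.

```python
import unicodedata

# General_Category values accepted at the start of an identifier.
# Approximation of ID_Start, not the exact Unicode property.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Categories accepted after the first character (approximates ID_Continue).
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    return ch == "_" or unicodedata.category(ch) in START_CATEGORIES


def is_id_continue(ch):
    return unicodedata.category(ch) in CONTINUE_CATEGORIES


def scan(text):
    """Yield (kind, lexeme) pairs; whitespace is skipped."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif is_id_start(ch):
            start = i
            i += 1
            while i < len(text) and is_id_continue(text[i]):
                i += 1
            yield ("IDENT", text[start:i])
        elif unicodedata.category(ch) == "Nd":
            start = i
            while i < len(text) and unicodedata.category(text[i]) == "Nd":
                i += 1
            yield ("NUMBER", text[start:i])
        else:
            # Anything else is emitted as a one-character punctuation token.
            yield ("PUNCT", ch)
            i += 1
```

With this sketch, `list(scan("größe = λ1 + 42"))` produces identifier tokens for `größe` and `λ1`, punctuation tokens for `=` and `+`, and a number token for `42`. The category test is an ordinary function call here, which is the part that is hard to express as a flex regexp.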

HaoZeke over 1 year ago

I have to say though, most compilers courses I've seen have an inordinate emphasis on parsing and little else. Still a great post.

Verdex over 1 year ago

If lex/yacc style parsing works for you, then great. However, I suspect most people are going to get more mileage out of just hand writing a recursive descent parser and moving on with their lives.

The benefit of recursive descent parsers is that they're easy to write, modify, and understand. You don't need any new paradigms, just write code like you typically do. If something goes wrong, your standard debugging skills will serve you well.

There are also a lot of other relatively easy parsing technologies out there. For example, you can also consider monadic parsing, parser combinators, PEG libraries.

I spent a year trying to figure out which parser technique worked best for me, and I'm glad I didn't just stick with my starting point of lex/yacc. So again, if this guide allows parsing to just work for you, then great, stick with it. But if you find yourself encountering a lot of problems, then it might be worth it to look around because other options exist and work just fine.
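
For readers who have not seen one, a hand-written recursive descent parser can be sketched in a few dozen lines of Python. The grammar, the `Parser` class, and the `parse` helper below are illustrative assumptions, not something from the article. Each grammar rule becomes one method, and for brevity the sketch evaluates the expression directly instead of building a syntax tree.

```python
import re

# Grammar assumed for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')' | '-' factor


class Parser:
    def __init__(self, src):
        # Tokens are runs of ASCII digits or any single non-space character.
        self.tokens = re.findall(r"[0-9]+|\S", src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} but found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isascii() and tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            self.expect(")")
            return value
        if tok == "-":
            return -self.factor()
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    parser = Parser(src)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

Here `parse("1 + 2 * 3")` returns 7 and `parse("(1 + 2) * 3")` returns 9, while `parse("2 * (3")` raises a `SyntaxError` from `expect`. Since the errors are raised by plain method calls, a stack trace or a breakpoint shows exactly which rule was being parsed.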

davidhs over 1 year ago

After taking a compiler course in uni I found the emphasis on dealing with syntax mostly a waste of time. To begin with, do yourself a favor and use S-expression syntax (like Lisp) for your language. They're dead simple to parse. With the syntax out of the way, you can get to the meat and potatoes of implementing a language. Later on you can always define a "look" for your language, and you can spend an inordinate amount of time on that.
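
As a rough illustration of how little code an S-expression reader needs, here is a minimal Python sketch. The function names and the choice to represent lists as Python lists and symbols as strings are assumptions made for this example; string literals, quoting, and comments are deliberately left out.

```python
def tokenize(src):
    # Pad parentheses with spaces so that str.split() separates every token.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def atom(tok):
    # Try integer, then float; anything else is kept as a symbol (a plain string).
    try:
        return int(tok)
    except ValueError:
        try:
            return float(tok)
        except ValueError:
            return tok


def read(tokens):
    """Consume one expression from the front of the token list (mutated in place)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ')'
        return items
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    return atom(tok)


def parse(src):
    tokens = tokenize(src)
    expr = read(tokens)
    if tokens:
        raise SyntaxError("unexpected input after the first expression")
    return expr
```

For example, `parse("(define (square x) (* x x))")` returns the nested list `["define", ["square", "x"], ["*", "x", "x"]]`, which an evaluator can walk directly.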

firtoz over 1 year ago

Kind of related, for anyone curious about parsing and JS: I have to recommend peggy for writing simple parsers for files to be consumed by JavaScript. Pretty niche, but it does it so well. I have developed a few packages using it so far.

ladberg over 1 year ago

Also see: https://langcc.io/

frabert over 1 year ago

re2c is a better alternative to flex imo. Also lemon (from the sqlite project) in place of bison/yacc.

arnon over 1 year ago

I learned a lot from the first few pages but it really escalated very quickly at some point

Practical parsing with Flex and Bison (2021)