ANTLR Mega Tutorial

253 points by ftomassetti about 8 years ago

12 comments

jasode about 8 years ago

I last played around with ANTLR in 2012 (when it was version 3) and I discovered that there's a "bigger picture" to the parser generator universe that most tutorials don't talk about:

1) ANTLR is a good tool for generating "happy path" parsers. With a grammar specification, it easily generates a parser that accepts or rejects a piece of source code. However, it's not easy to use the hooks to generate high quality diagnostic error messages.

2) ANTLR was not good for speculative parsing or probabilistic parsing, which would be the basis of today's generation of tools such as "Intellisense" not giving up on parsing when there's an unclosed brace or missing variable declaration.

The common theme to the 2 bullet points above is that a high quality compiler written by hand will hold multiple "states" of information, and an ANTLR grammar file doesn't really have an obvious way to express that knowledge. A pathological example would be the numerous "broken HTML" pages being successfully parsed by browsers. It would be very hard to replicate how Chrome/Firefox/Safari/IE don't choke on broken HTML by using ANTLR to generate an HTML parser.

In short, ANTLR is great for prototyping a parser, but any industrial-grade parser released into the wild with programmers' expectations of helpful error messages would require a hand-written parser.

Lastly, the lexing (creating the tokens) and parsing (creating the AST) are a very small percentage of the total development of a quality compiler. Therefore, ANTLR doesn't save as much time as one might think.

I welcome any comments about v4 that make those findings obsolete.
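
To make the point about "not giving up on parsing" concrete, here is a minimal sketch of panic-mode error recovery in a hand-written parser. It is a toy: the `name = number;` statement language and the function `parse_statements` are invented for illustration, and nothing here reflects how ANTLR or any IDE actually recovers from errors.

```python
import re


def parse_statements(source):
    """Parse `name = number;` statements, collecting errors instead of stopping."""
    # Names, numbers, and any other single non-space character as its own token.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    assignments, errors = {}, []
    pos = 0

    def expect(pattern, what):
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else None
        if tok is None or not re.fullmatch(pattern, tok):
            raise SyntaxError(f"token {pos}: expected {what}, found {tok!r}")
        pos += 1
        return tok

    while pos < len(tokens):
        try:
            name = expect(r"[A-Za-z_]\w*", "a name")
            expect(r"=", "'='")
            value = expect(r"\d+", "a number")
            expect(r";", "';'")
            assignments[name] = int(value)
        except SyntaxError as err:
            errors.append(str(err))
            # Panic-mode recovery: skip to just past the next ';' and carry on.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return assignments, errors
```

Calling `parse_statements("a = 1; b = ; c = 3;")` records one diagnostic for the broken middle statement and still parses the statements on either side of it.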

CalChris about 8 years ago

I switched over to ANTLR 4. It is strictly superior to ANTLR 3. The listener approach, rather than embedding code in the grammar, is very natural. Separating them leads to clean grammars and clean action code. Odd thing is that I was stuck on 3 because 4 didn't support C yet, and then I just switched to the Java target in an anti-C pique. Shoulda done that a while ago.

TParr's The Definitive ANTLR 4 Reference is quite good. And so's this mega tutorial.

https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference

ANTLR is my goto tool for DSLs.
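
The separation described above can be sketched without any ANTLR machinery. The code below is a plain-Python analogue, not ANTLR's generated API: `Node`, `Listener`, `walk`, and `SumListener` are hypothetical names. The idea is only that the tree walk is generic and the application logic lives in a listener class rather than in the grammar.

```python
class Node:
    """A parse-tree node: a rule name, child nodes, and optional matched text."""

    def __init__(self, rule, children=(), text=None):
        self.rule = rule
        self.children = list(children)
        self.text = text


class Listener:
    """Base listener with no-op callbacks; subclasses override what they need."""

    def enter(self, node):
        pass

    def exit(self, node):
        pass


def walk(node, listener):
    # Generic depth-first walk; it knows nothing about what the listener does.
    listener.enter(node)
    for child in node.children:
        walk(child, listener)
    listener.exit(node)


class SumListener(Listener):
    """Application logic kept apart from the tree shape: add up all numbers."""

    def __init__(self):
        self.total = 0

    def exit(self, node):
        if node.rule == "number":
            self.total += int(node.text)
```

A second listener (say, a pretty-printer) could be run over the same tree without touching the grammar or the walker.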

ttd about 8 years ago

I think everyone should manually implement a simple recursive descent parser at least once in their careers. It's surprisingly easy, and really (in my experience) helps to break through the mental barrier of parsers being magical black boxes.

Plus, once you have an understanding of recursive descent parsing, it's a relatively small leap to recursive descent code generation. And once you're there, you have a pretty good high-level understanding of the entire compilation pipeline (minus optimization).

Then all of a sudden, compilers are a whole lot less impenetrable.
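
For anyone who wants to try this, a recursive descent parser for arithmetic fits in a few dozen lines. The following is a minimal sketch under assumed choices (integer literals, the four basic operators, parentheses); the names `tokenize`, `Parser`, and `evaluate` are invented for the example. Each grammar rule becomes one method, and it evaluates as it parses instead of building a tree.

```python
import re


def tokenize(text):
    # Numbers, operators, and parentheses; any other character is silently dropped.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Swapping the arithmetic in `expr` and `term` for code that emits instructions is the "small leap" to code generation mentioned above.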

raverbashing about 8 years ago

Just a note on "Why not to use regular expression": because, depending on the language's complexity, it's impossible.

Regular expressions sit at Type-3 (regular languages) in the Chomsky hierarchy: https://en.wikipedia.org/wiki/Chomsky_hierarchy
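
A small illustration of that limit, assuming balanced parentheses as the test language: a classical regular expression can only be written for a fixed nesting depth, while a one-line counter handles any depth. The names `DEPTH_2` and `is_balanced` are made up for this sketch.

```python
import re

# A pattern without recursion can only hard-code a nesting depth.
# This one accepts balanced parentheses nested at most two levels deep.
DEPTH_2 = re.compile(r"(?:\((?:\(\))*\))*")


def is_balanced(text):
    """Check balanced parentheses with a counter, which works at any depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

`DEPTH_2.fullmatch("(()())")` succeeds, but `DEPTH_2.fullmatch("((()))")` does not, even though `is_balanced` accepts both. Supporting one more level means writing a longer pattern, and no finite pattern covers every depth.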

nradov about 8 years ago

I used ANTLR to write a fuzz testing tool which parses an ABNF grammar (like in an IETF RFC) and then generates random output which matches the grammar. Worked great!

https://github.com/nradov/abnffuzzer
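
The general idea of grammar-based generation can be sketched in a few lines. This is not how abnffuzzer is implemented; it is a toy under stated assumptions: the grammar is a hand-written Python dict (`GRAMMAR`, hypothetical) rather than parsed ABNF, and the first alternative of every rule is assumed to be the one that terminates soonest.

```python
import random

# Hypothetical toy grammar: each rule maps to a list of alternatives, and each
# alternative is a sequence of rule names or literal strings.
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"]],
    "term": [["digit"], ["(", "expr", ")"]],
    "digit": [[d] for d in "0123456789"],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` into a random string that the grammar accepts."""
    if symbol not in GRAMMAR:
        return symbol  # Anything that is not a rule name is a literal.
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth budget, take the first alternative, which by
        # assumption is non-recursive, so that expansion terminates.
        choice = alternatives[0]
    else:
        choice = rng.choice(alternatives)
    return "".join(generate(part, rng, depth + 1, max_depth) for part in choice)
```

Passing a seeded `random.Random` instance, as in `generate("expr", random.Random(0))`, keeps a failing fuzz case reproducible.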

pjmlp about 8 years ago

Great tutorial, ANTLR is one of the best tools for prototyping languages and compilers.

I wasn't aware it supports JavaScript nowadays.

In any case, good selection of languages.

intrasight about 8 years ago

For .Net projects, I've used Irony. From the CodePlex site:

"Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs."
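
The same style of "grammar as host-language expressions" can be imitated in Python. The sketch below is only an analogue and does not reproduce Irony's API: the `Rule` class and the helpers `lit` and `ref` are invented here, with `+` overloaded for sequence and `|` for ordered choice.

```python
class Rule:
    """A grammar rule wrapping fn(text, pos) -> end position, or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __add__(self, other):
        # Sequence: match self, then other from where self stopped.
        def seq(text, pos):
            mid = self.fn(text, pos)
            return None if mid is None else other.fn(text, mid)
        return Rule(seq)

    def __or__(self, other):
        # Ordered choice: try self first, fall back to other.
        def alt(text, pos):
            end = self.fn(text, pos)
            return end if end is not None else other.fn(text, pos)
        return Rule(alt)

    def matches(self, text):
        return self.fn(text, 0) == len(text)


def lit(s):
    """Match the literal string s."""
    return Rule(lambda text, pos: pos + len(s) if text.startswith(s, pos) else None)


def ref(get):
    """Lazy reference, so a rule can mention one that is defined later."""
    return Rule(lambda text, pos: get().fn(text, pos))


digit = lit("0")
for d in "123456789":
    digit = digit | lit(d)

# Python's '+' binds tighter than '|', so sequences group before alternatives.
atom = digit | lit("(") + ref(lambda: expr) + lit(")")
expr = atom + lit("+") + ref(lambda: expr) | atom
```

With these definitions, `expr.matches("(1+2)+3")` is true and `expr.matches("1+")` is false. The grammar is ordinary code, so it can be built, composed, and debugged with the host language's own tools.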

musesum about 8 years ago

I used Antlr v3 to create an NLP parser for calendar events for iOS and Android. It took longer than expected. iOS + C runtime was opaque, so had to write a tool for debugging. Android + Java runtime overran memory, so had to break into separate grammars. Of course, NLP is not a natural fit. Don't know what problems are fixed by v4.

> The most obvious is the lack of recursion: you can’t find a (regular) expression inside another one ...

PCRE has some recursion. Here is an example for parsing anything between { }, with counting of inner brackets:

'(?>\{(?:[^{}]|(?R))\})|\w+'

A C++11 constexpr can make hand-coded parsers a lot more readable, allowing token names in case statements. For example, search on "str2int" in the following island parser: https://github.com/musesum/par

betenoire about 8 years ago

These types of tutorials always start out explaining the problems with regular expressions and why not to use them... then immediately proceed into lexing via regular expressions.

Perhaps the tutorials should start with the strengths of regular expressions, and how we can harness that for getting started with a lexer.
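
As an illustration of that strength, a workable lexer is little more than a list of named regular expressions. The sketch below follows the common named-group tokenizer pattern using Python's standard `re` module; the token names in `TOKEN_SPEC` and the function `lex` are choices made for this example.

```python
import re

# Order matters: earlier entries win when more than one could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(text):
    """Turn source text into a list of (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # Whitespace separates tokens but is not one itself.
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens
```

Regular expressions are a good fit here because individual tokens are flat. The nesting that they cannot express is left to the parser that consumes the token list.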

destructaball about 8 years ago

What are the advantages of ANTLR over something like Haskell's Parsec?

https://github.com/aslatter/parsec

closed about 8 years ago

This tutorial looks great. I picked up Antlr4 a few months ago, and hadn't done any parsing before then. The first week was basically me, The Definitive Antlr4 Reference, and extreme confusion with how different targets worked. Compounding the problem was the fact that a lot of the antlr4 example grammars only work for a specific target. The use of different language implementations as part of this tutorial seems really useful!

(Antlr4 is awesome :)

poppingtonic about 8 years ago

I used ANTLR to write a Python parser for the SNOMED expression language last year, and testing it was one of the weirder parts of the experience. I was up and running in a few days, which was largely thanks to the ANTLR book. I love this project. It made doing what I did a lot more fun than I thought it would be. Hand-rolling an ABNF parser from scratch would be a nice hobby project, but not when one has a deadline.