I last played around with ANTLR in 2012 (when it was version 3) and I discovered that there's a "bigger picture" to the parser generator universe that most tutorials don't talk about:<p>1) ANTLR is a good tool for generating "happy path" parsers. With a grammar specification, it easily generates a parser that accepts or rejects a piece of source code. However, it's not easy to use the hooks to generate high quality diagnostic error messages.<p>2) ANTLR was not good for speculative or probabilistic parsing, which is the basis of today's generation of tools such as "Intellisense" that don't give up on parsing when there's an unclosed brace or missing variable declaration.<p>The common theme of the two points above is that a high quality compiler written by hand will hold multiple "states" of information, and an ANTLR grammar file doesn't really have an obvious way to express that knowledge. A pathological example would be the numerous "broken HTML" pages being successfully parsed by browsers. It would be very hard to replicate how Chrome/Firefox/Safari/IE don't choke on broken HTML by using ANTLR to generate an HTML parser.<p>In short, ANTLR is great for <i>prototyping</i> a parser, but any industrial-grade parser released into the wild with programmers' expectations of helpful error messages would require a hand-written parser.<p>Lastly, the lexing (creating the tokens) and parsing (creating the AST) are a <i>very tiny percentage</i> of the total development of a quality compiler. Therefore, ANTLR doesn't save as much time as one might think.<p>I welcome any comments about v4 that make those findings obsolete.
I switched over to ANTLR 4. It is strictly superior to ANTLR 3. The listener approach, rather than embedding code in the grammar, is very natural. Separating the two leads to clean grammars and clean action code. Odd thing is that I was stuck on 3 because 4 didn't support C yet, and then I just switched to the Java target in an anti-C pique. Shoulda done that a while ago.<p>TParr's <i>The Definitive ANTLR 4 Reference</i> is quite good. And so's this mega tutorial.<p><a href="https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference" rel="nofollow">https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-re...</a><p>ANTLR is my go-to tool for DSLs.
I think everyone should manually implement a simple recursive descent parser at least once in their careers. It's surprisingly easy, and really (in my experience) helps to break through the mental barrier of parsers being magical black boxes.<p>Plus, once you have an understanding of recursive descent parsing, it's a relatively small leap to recursive descent code generation. And once you're there, you have a pretty good high-level understanding of the entire compilation pipeline (minus optimization).<p>Then all of a sudden, compilers are a whole lot less impenetrable.
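To give a feel for how small such a parser can be, here is a minimal sketch in Python. The grammar, the `tokenize` helper, and the `Parser` class are all made up for illustration (nothing here comes from ANTLR), and it evaluates the expression directly instead of building an AST or emitting code. Each grammar rule becomes one method, which is the whole trick.

```python
import re

# Hypothetical grammar, chosen only for illustration:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into integer and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it (None at end of input).
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Calling `evaluate("(1 + 2) * 3")` walks `expr`, `term`, and `factor` in exactly the order the grammar describes. Swapping the arithmetic for "emit an instruction" is roughly the leap to code generation mentioned above.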
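Just a note on "Why not to use regular expressions": depending on the complexity of the language, it's <i>impossible</i>.<p>REs describe Type-3 (regular) languages, the most restricted class in the Chomsky hierarchy, so they cannot match arbitrarily nested constructs: <a href="https://en.wikipedia.org/wiki/Chomsky_hierarchy" rel="nofollow">https://en.wikipedia.org/wiki/Chomsky_hierarchy</a>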
I used ANTLR to write a fuzz testing tool which parses an ABNF grammar (like in an IETF RFC) and then generates random output which matches the grammar. Worked great!<p><a href="https://github.com/nradov/abnffuzzer" rel="nofollow">https://github.com/nradov/abnffuzzer</a>
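For anyone curious what the "generate random output which matches the grammar" half looks like in principle, here is a rough Python sketch. It is not how abnffuzzer works internally (I haven't mirrored its code); the `GRAMMAR` dict and `generate` function are hypothetical, and the grammar is written by hand instead of being parsed from ABNF.

```python
import random

# Hypothetical toy grammar: each rule maps to a list of alternatives,
# and each alternative is a sequence of rule names or literal strings.
GRAMMAR = {
    "greeting": [["salutation", " ", "name", "punct"]],
    "salutation": [["hello"], ["hi"]],
    "name": [["world"], ["there"]],
    "punct": [["!"], ["."], []],
}


def generate(symbol, rng, grammar=GRAMMAR):
    """Expand a symbol by picking one alternative at random."""
    if symbol not in grammar:
        return symbol  # not a rule name, so treat it as literal text
    alternative = rng.choice(grammar[symbol])
    return "".join(generate(part, rng, grammar) for part in alternative)
```

Something like `generate("greeting", random.Random())` yields a random string the grammar accepts. A real fuzzer would also need a depth limit, since a recursive rule could otherwise expand forever.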
Great tutorial, ANTLR is one of the best tools for prototyping languages and compilers.<p>I wasn't aware it supports JavaScript nowadays.<p>In any case, good selection of languages.
For .NET projects, I've used Irony. From the CodePlex site:<p>"Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs."
I used Antlr v3 to create an NLP parser for calendar events for iOS and Android. It took longer than expected. iOS + C runtime was opaque, so had to write a tool for debugging. Android + Java runtime overran memory, so had to break into separate grammars. Of course, NLP is not a natural fit. Don't know what problems are fixed by v4.<p>> The most obvious is the lack of recursion: you can’t find a (regular) expression inside another one ...<p>PCRE has some recursion. Here is an example for parsing anything between { }, with counting of inner brackets:<p>'(?>\{(?:[^{}]*|(?R))*\})|\w+'<p>A C++11 constexpr can make hand-coded parsers a lot more readable, allowing token names in case statements. For example, search for "str2int" in the following island parser: <a href="https://github.com/musesum/par" rel="nofollow">https://github.com/musesum/par</a>
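For comparison, the brace-counting half of that pattern is easy to hand-code when recursion isn't available (as far as I know, Python's standard `re` module has no `(?R)`). This is only a sketch: `balanced_groups` is a made-up name, it ignores the `\w+` alternative, and it silently skips unbalanced braces.

```python
def balanced_groups(text):
    """Return each outermost {...} span, counting nested braces."""
    groups, depth, start = [], 0, None
    for index, char in enumerate(text):
        if char == "{":
            if depth == 0:
                start = index  # remember where the outermost group opens
            depth += 1
        elif char == "}":
            if depth == 0:
                continue  # stray closing brace: skip it
            depth -= 1
            if depth == 0:
                groups.append(text[start:index + 1])
    return groups
```

The explicit `depth` counter is the bit of state that a strictly regular expression has no way to keep.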
These types of tutorials always start out explaining the problems with regular expressions and why not to use them... then immediately proceed to lexing via regular expressions.<p>Perhaps the tutorials should start with the strengths of regular expressions, and how we can harness them to get started with a lexer.
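As a sketch of what "harnessing regular expressions for a lexer" can look like, here is a small Python tokenizer built from one named group per token kind. The token set (`TOKEN_SPEC`) and the `lex` function are hypothetical and only meant to show the shape of the idea.

```python
import re

# Hypothetical token set for a tiny expression language.
# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def lex(text):
    """Yield (kind, text) pairs; each token kind is one regular expression."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one itself
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at {match.start()}"
            )
        yield kind, match.group()
```

Each token is a flat pattern with no nesting, which is exactly where regular expressions are strong. The nesting is left to the parser that consumes these tokens.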
What are the advantages of ANTLR over something like Haskell's Parsec?<p><a href="https://github.com/aslatter/parsec" rel="nofollow">https://github.com/aslatter/parsec</a>
This tutorial looks great. I picked up Antlr4 a few months ago, and hadn't done any parsing before then. The first week was basically me, The Definitive Antlr4 Reference, and extreme confusion with how different targets worked. Compounding the problem was the fact that a lot of the antlr4 example grammars only work for a specific target. The use of different language implementations as part of this tutorial seems really useful!<p>(Antlr4 is awesome :)
I used ANTLR to write a Python parser for the SNOMED expression language last year, and testing it was one of the weirder parts of the experience. I was up and running in a few days, which was largely thanks to the ANTLR book. I love this project. It made doing what I did a lot more fun than I thought it would be. Hand-rolling an ABNF parser from scratch would be a nice hobby project, but not when one has a deadline.