Good introduction to Backus-Naur Form (BNF), Extended BNF (EBNF), Augmented BNF (ABNF, a la RFC 5234) and the typical extensions. Not earth shattering, but a good reference.
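For anyone who hasn't seen the notations side by side, here is roughly the same toy rule written three ways (a made-up example, not from the article):

BNF, where repetition has to be expressed through recursion:

    <number> ::= <digit> | <digit> <number>
    <digit>  ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

EBNF, where { } expresses repetition directly:

    number = digit , { digit } ;
    digit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

ABNF in the RFC 5234 style, using the core DIGIT rule:

    number = 1*DIGIT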
If you want to learn to apply grammars and parsers in practice, I recommend the ANTLR parser generator:

http://www.antlr.org/

It can not only do the basic text-to-tree parsing from a file describing the grammar, but also lets you specify additional grammars for *traversing* the generated tree and executing arbitrary code in your language of choice as particular nodes are recognized (a small sketch of what a grammar file looks like is at the end of this comment). I built a little compiler with it some years ago: an Xpl.g grammar file for parsing the program text and building the abstract syntax tree, a SemanticAnalysis.g grammar file for a first pass over the tree (annotating it with additional information, filling the symbol table, checking semantic correctness), and finally CodeGeneration.g for emitting JVM bytecode from the annotated tree. The code is still on GitHub:

https://github.com/jaroslawr/xpl/tree/509120e66e23aac8493414ce5aa7e8079f6ff7ce

There is a simpler example using the same functionality on the ANTLR wiki:

http://www.antlr.org/wiki/display/ANTLR3/Simple+tree-based+interpeter

I used the ANTLR reference book when learning it, but now there is also a real introductory manual:

http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference
<a href="http://pragprog.com/book/tpdsl/language-implementation-patterns" rel="nofollow">http://pragprog.com/book/tpdsl/language-implementation-patte...</a><p>I also used the Dragon Book and "Programming Language Pragmatics" for theory, both great books and PLP certainly deserves to be better known.
I was thrown off by Might stating that "a grammar defines a language," which is not nearly as useful or as accurate as saying that it "describes" a language, the wording he relies on throughout the rest of the article. That is the difference between being able to make a dog and being able to identify a dog from a set of characteristics.

Grammars are only one part of understanding a language, hardly the "language of languages". In natural languages, grammar is one subset of linguistics; it would be just as valid to say vocabulary or phonology is the language of languages as it is to say grammar is.

Other than these overly broad claims and the attempt to define natural languages the same way formal languages can be defined, this is a nice general introduction to some specific notation techniques for computer languages.

Of course, I might not have read it at all if it were titled "An Introduction to Backus-Naur Form, Extended BNF, and Augmented BNF Notation Techniques".
Cool tutorial and introduction. In my toy languages I prefer PEGs to CFGs because of ordered (prioritized) choice -- alternatives are tried in order and the first match wins, which means the grammar cannot be ambiguous. I also enjoy that you can skip a separate tokenization step.
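A tiny, hypothetical PEG fragment to show what ordered choice buys you; '/' tries alternatives left to right and commits to the first match, so overlapping alternatives are never ambiguous, you just list the longer one first:

    # ordered choice: "<=" is tried before "<", so it can never be
    # mis-read as "<" followed by "="
    RelOp   <- "<=" / ">=" / "==" / "!=" / "<" / ">"

    # scannerless: whitespace handling is just another parsing rule
    Spacing <- (" " / "\t" / "\n")*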
Why is there no machine-processable language of languages?

I mean something like a knowledge-representation system that could read BNF (or whatever) and translate it directly into another level, such as operations defined by the operating system.

Why can't we describe a language declaratively and semantically, so that its low-level details wouldn't have to be specified manually, and so that it could be related to other languages and reasoning could be done about its effects?

The lowest levels of the system would probably have to be described as part of the same representation system.
I just worked with a mathematician who understood grammars, and could write one, but could not use the theory in any practical way. Parsing is *not* just recognition. You need to pass information up the parse tree so that after parsing you have a new data structure with everything nicely organized, ready to ACT UPON. People write in specific languages to achieve something, not simply to create syntactically correct bodies of text. So understanding grammars is only 50% of being able to use them as tools.

The other thing you need in order to put them to use is a mental model of how a bottom-up parser processes the tokens. You need to be able to insert actions for each grammar rule at the appropriate places, which lets the information flow bottom-up too and be processed as it goes. I have found this second part of language processing to be the harder part, and it's worth rewriting your grammar to make it simpler.
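For the concrete mechanics, here is a minimal yacc/bison-style sketch (hypothetical, with made-up make_node/make_leaf helpers) of what "actions at each grammar rule" means: every time the parser reduces by a rule, the action runs, and whatever it assigns to $$ is the value that flows up to the rule that uses this one:

    /* each reduction builds a piece of the tree and passes it up via $$;
       $1..$n are the values of the symbols on the right-hand side */
    expr   : expr '+' term    { $$ = make_node(ADD, $1, $3); }
           | term             { $$ = $1; }
           ;
    term   : term '*' factor  { $$ = make_node(MUL, $1, $3); }
           | factor           { $$ = $1; }
           ;
    factor : NUMBER           { $$ = make_leaf($1); }
           | '(' expr ')'     { $$ = $2; }
           ;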
BNFC is a compiler front-end generator for Labeled BNF (LBNF) grammars that can generate parsers and traversal code for various languages. I'm not sure exactly how labeled grammars differ from regular ones, since I only used this and it was a while ago, but they add some expressiveness anyway:

http://bnfc.digitalgrammars.com/
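If I recall the format correctly, the "labeled" part is that every production carries a constructor name, which is what lets BNFC generate the abstract-syntax types and traversal skeletons for you. A small, hypothetical LBNF fragment:

    -- each label (EAdd, EMul, ...) becomes a constructor in the
    -- generated abstract syntax
    EAdd.  Exp    ::= Exp "+" Term ;
    ESub.  Exp    ::= Exp "-" Term ;
    ETerm. Exp    ::= Term ;
    EMul.  Term   ::= Term "*" Factor ;
    EFac.  Term   ::= Factor ;
    EInt.  Factor ::= Integer ;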