(Article author here.)<p>I'm delighted to see this get some attention here.<p>I absolutely love these techniques for parsing, but as my primary research area is static analysis, I haven't had time to revise this paper and resubmit.<p>As it stands, I may never get the time to do so. :(<p>I posted it on arXiv so that David could reference it for his Ph.D. school apps.<p>Since some of you have asked, here are the reviews:<p><a href="http://matt.might.net/papers/reviews/esop2010-derivatives.txt" rel="nofollow">http://matt.might.net/papers/reviews/esop2010-derivatives.txt</a><p>I do have an updated implementation that's much cleaner and faster, and I've been planning to write it up in a blog post. (Alas, no opportunity yet.)<p>David's also done another implementation in Haskell that's screaming fast and efficient on many restricted classes of grammars (like LL(k)). I'll encourage him to post that as well.<p>If you're interested in getting your name on a scientific publication and helping this get the attention of the scientific community, you can help us by creating an implementation of either technique in your favorite language and beating on it to help find the inefficiencies.<p>(For instance, the original Scala version linked from this paper has memory leaks from the way it caches derivatives. We got around them by rolling top-level repetition by hand.)<p>Please email me if that's something you're interested in doing.<p>Can HN do science? I'd love to find out.
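If you want a feel for the core idea before diving into the paper, here is a minimal sketch of the Brzozowski derivative for plain regular languages in Scala (just the textbook recognizer, not the parser from the paper; the paper extends derivatives to arbitrary context-free grammars, which additionally requires laziness, memoization, and a fixed-point computation for nullability):<p><pre><code> // Regular languages as an algebraic data type.
 sealed trait Lang
 case object Empty extends Lang                 // the empty language
 case object Eps extends Lang                   // the language containing only ""
 case class Chr(c: Char) extends Lang           // a single character
 case class Alt(l: Lang, r: Lang) extends Lang  // union
 case class Cat(l: Lang, r: Lang) extends Lang  // concatenation
 case class Star(l: Lang) extends Lang          // Kleene star

 // Does the language accept the empty string?
 def nullable(l: Lang): Boolean = l match {
   case Empty     => false
   case Eps       => true
   case Chr(_)    => false
   case Alt(a, b) => nullable(a) || nullable(b)
   case Cat(a, b) => nullable(a) && nullable(b)
   case Star(_)   => true
 }

 // The derivative of l with respect to c: the suffixes w such that
 // c followed by w is in l.
 def derive(l: Lang, c: Char): Lang = l match {
   case Empty     => Empty
   case Eps       => Empty
   case Chr(d)    => if (c == d) Eps else Empty
   case Alt(a, b) => Alt(derive(a, c), derive(b, c))
   case Cat(a, b) => if (nullable(a)) Alt(Cat(derive(a, c), b), derive(b, c))
                     else Cat(derive(a, c), b)
   case Star(a)   => Cat(derive(a, c), Star(a))
 }

 // Recognition is repeated derivation followed by a nullability check, e.g.
 // matches(Star(Alt(Chr('x'), Chr('s'))), "xsxsx") == true
 def matches(l: Lang, s: String): Boolean = nullable(s.foldLeft(l)(derive))</code></pre>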
Here's what a grammar actually looks like in their Scala version. This grammar is for arithmetic over the language where x represents 1, and s represents +. This only generates the parse tree, not the final answer. Comments are added by me:<p><pre><code> // The Nodes in the parse tree
abstract class Exp
case object One extends Exp
case class Sum(e1 : Exp, e2 : Exp) extends Exp
// Terminals
lazy val S : Parser[Char,Char] = new EqT[Char] ('s')
lazy val X : Parser[Char,Char] = new EqT[Char] ('x')
// Definition of an expression
// I'm pretty sure all the asInstanceOfs are avoidable/unnecessary.
lazy val EXP : Parser[Char,Exp] =
  rule(X) ==> { case x => One.asInstanceOf[Exp] } ||
  rule(EXP ~ S ~ EXP) ==> { case e1 ~ s ~ e2 => Sum(e1,e2).asInstanceOf[Exp] } ||
  rule(EXP ~ S ~ X) ==> { case e1 ~ s ~ e2 => Sum(e1,One).asInstanceOf[Exp] } ||
  rule(X ~ S ~ EXP) ==> { case e1 ~ s ~ e2 => Sum(One,e2).asInstanceOf[Exp] } ||
  rule(X) ==> { case x => One.asInstanceOf[Exp] } ||
  rule(EXP) ==> { case e => e } ||
  rule(Epsilon[Char]) ==> { case () => One }
// Actually run the rule
val xin = Stream.fromIterator("xsxsxsxsx".elements)
EXP.parseFull(xin)
// return value => Stream(Sum(One,Sum(Sum(One,One),Sum(One,One))), ?)</code></pre>
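Since this only builds the parse tree, turning the tree into a number is a separate step. A small fold over Exp would do it (my addition, not part of their library):<p><pre><code> // Evaluate the parse tree produced above.
 def eval(e: Exp): Int = e match {
   case One         => 1
   case Sum(e1, e2) => eval(e1) + eval(e2)
 }

 // eval(Sum(One, Sum(Sum(One, One), Sum(One, One)))) == 5</code></pre>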
What is actually the state of the art in parsing?<p>When I, as an amateur, last looked into it, PEG[1] and extensions like OMeta[2] seemed to be the best options. I've heard good things about Parsec[3], too.<p>[1](<a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar" rel="nofollow">http://en.wikipedia.org/wiki/Parsing_expression_grammar</a>)
[2](<a href="http://www.tinlizzie.org/ometa/" rel="nofollow">http://www.tinlizzie.org/ometa/</a>)
[3](<a href="http://legacy.cs.uu.nl/daan/parsec.html" rel="nofollow">http://legacy.cs.uu.nl/daan/parsec.html</a>)
Well, ANTLR (<a href="http://www.antlr.org/" rel="nofollow">http://www.antlr.org/</a>) has replaced yacc for most of the practical work already. Daniel Spiewak also mentions parser combinators and how GLL parsers can be much easier to use and yet fast enough most of the time (<a href="http://www.codecommit.com/blog/scala/unveiling-the-mysteries-of-gll-part-1" rel="nofollow">http://www.codecommit.com/blog/scala/unveiling-the-mysteries-of-gll-part-1</a>). He describes parser combinators as domain-specific languages for creating parsers (<a href="http://www.codecommit.com/blog/scala/the-magic-behind-parser-combinators" rel="nofollow">http://www.codecommit.com/blog/scala/the-magic-behind-parser-combinators</a>).<p>Despite all of these, adoption of better techniques is really slow. As usual.
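For a sense of the combinator DSL style, here is roughly the x/s grammar from the example above rewritten with Scala's standard-library parser combinators (a sketch of mine; these are plain recursive-descent combinators, so the left-recursive rule has to become a repetition, and you get a single left-associated tree rather than every parse of the ambiguous grammar):<p><pre><code> import scala.util.parsing.combinator.RegexParsers

 // Same parse-tree types as in the derivatives example above.
 abstract class Exp
 case object One extends Exp
 case class Sum(e1: Exp, e2: Exp) extends Exp

 object ArithParser extends RegexParsers {
   // "x" is 1; "s"-separated repetition replaces the left-recursive rule.
   def one: Parser[Exp]  = "x" ^^ { _ => One }
   def expr: Parser[Exp] =
     one ~ rep("s" ~> one) ^^ { case first ~ rest =>
       rest.foldLeft(first: Exp)(Sum(_, _))
     }
 }

 // ArithParser.parseAll(ArithParser.expr, "xsxsxsxsx") succeeds with
 // Sum(Sum(Sum(Sum(One,One),One),One),One)</code></pre>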
Fascinating. I don't understand it all yet - I'm reading it carefully, but it'll be weeks before I really get it.<p>However ...<p>I'm getting the feeling that this properly captures an intuition I've had about parsing for some time: that there should be a way of parsing the entire text at once, with the parse settling onto the text all at the same time. I don't know if this is what it's saying, but that's the sense I get from it.<p>But even if it isn't, it looks intriguing.
There are so many useful links in this discussion that I consolidated them (and a few extras) here: <a href="http://post.b3k.us/modern-parsing-bibliography" rel="nofollow">http://post.b3k.us/modern-parsing-bibliography</a>