…if your regexes are, you know, <i>regular</i>. Which Perl-style regular expressions most definitely are not. So it's not a matter of just implementing the regexes you know faster, but limiting them to a subset of features first.<p>This is a perfectly valid thing to do, but it <i>is</i> changing the power of regexes significantly. I feel the article makes this fact easy to overlook—I certainly did until I took a class on automata.<p>But yes, presumably Perl could use an NFA-based algorithm if given a regex that is actually regular. One of my professors attributed the fact that it doesn't to old patent issues, but I have not been able to find a good source for that. It's also possible that it's just a case of worse is better: backtracking is fast enough 99% of the time, and if performance really matters, perhaps you shouldn't be using regexes anyhow.
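To make the "not actually regular" point concrete, here is a small sketch (Python's re module standing in for a generic Perl-style backtracking engine; the pattern is just an illustration, not something from the article). Backreferences force the matcher to remember an arbitrarily long piece of the input, which no finite automaton can do, so any engine that supports them cannot be purely NFA/DFA-based:

    import re

    # A backreference: match any word, a space, then the *same* word again.
    # The language { "w w" : w is any word } is not regular, so no DFA/NFA
    # can recognize it; a backtracking engine captures the group and compares.
    pattern = re.compile(r"^(\w+) \1$")

    print(bool(pattern.match("hello hello")))  # True  (the group repeats)
    print(bool(pattern.match("hello world")))  # False (no repetition)

Backreferences (along with features like recursive subpatterns) are exactly what you would have to drop to get back to a truly regular subset.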
To be honest, I've never been much of a fan of this article, because it neglects the fastest way to implement regexes <i>in practice</i>: use a fast JIT to emit code that performs recursive backtracking. The Thompson NFA is a better algorithm for pathological regexes that people don't frequently write (e.g. "(a?)*a*"), but its constant factors are much worse for common cases.<p>The Thompson NFA has its place as a fallback to avoid exponential blowup on pathological regexes, but if I were implementing a performance-critical regex engine, I would start with a JIT for the recursive backtracking approach and only add the Thompson NFA later, because in practice constant factors dominate algorithmic worst-case bounds.
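For a feel of how bad the pathological case gets, a minimal timing sketch (again using Python's re module as a stand-in for any recursive-backtracking engine; exact times will vary by machine). The classic example is n optional a's followed by n literal a's, matched against a string of n a's: the match succeeds, but a backtracker first lets every a? consume a character and then has to unwind, so its running time grows exponentially with n, while a Thompson-NFA simulation stays linear:

    import re
    import time

    # Pathological pattern: n optional a's followed by n literal a's.
    # It matches "a" * n (all the a? groups match empty), but a backtracking
    # engine explores exponentially many ways to split the input first.
    for n in (16, 18, 20, 22):
        pattern = re.compile("a?" * n + "a" * n)
        text = "a" * n
        start = time.perf_counter()
        assert pattern.match(text)           # the match succeeds...
        elapsed = time.perf_counter() - start
        print(f"n={n}: {elapsed:.4f}s")      # ...but the time grows exponentially in n

The hybrid approach described above would run the fast backtracking path by default and only fall back to the linear-time NFA when a pattern starts behaving like this.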
And Ken Thompson's original 1968 paper:<p><a href="http://www.fing.edu.uy/inco/cursos/intropln/material/p419-thompson.pdf" rel="nofollow">http://www.fing.edu.uy/inco/cursos/intropln/material/p419-th...</a>