科技回声

5 条评论

tikhonj大约 10 年前

…if your regexes are, you know, regular. Which Perl-style regular expressions most definitely are not. So it's not a matter of just implementing the regexes you know faster, but limiting them to a subset of features first.This is a perfectly valid thing to do, but it is changing the power of regexes significantly. I feel the article makes this fact easy to overlook—I certainly did until I took a class on automata.But yes, presumably Perl could use an NFA-based algorithm if given a regex that is actually regular. One of my professors attributed the fact that it doesn't to old patent issues, but I have not been able to find a good source for that. It's also possible that it's just a case of worse is better: backtracking is fast enough 99% of the time, and if performance really matters, perhaps you shouldn't be using regexes anyhow.

评论 #9378566 未加载

评论 #9378690 未加载

评论 #9378074 未加载

pcwalton大约 10 年前

To be honest, I've never been much of a fan of this article, because it neglects the fastest way to implement regexes in practice: use a fast JIT to emit code that performs recursive backtracking. The Thompson NFA is a better algorithm for pathological regexes that people don't frequently write (e.g. "(a?)﹡a﹡)"), but the constant factors are much worse for common cases.The Thompson NFA has its place as a fallback to avoid exponential blowup on pathological regexes, but if I were implementing a performance-critical regex engine, I would start with a JIT for the recursive backtracking approach, because in practice constant factors dominate over algorithmic worst-time bounds, and only add the Thompson NFA later.

评论 #9378699 未加载

humanarity大约 10 年前

And Ken Thompson's original 1968 paper,<a href="http://www.fing.edu.uy/inco/cursos/intropln/material/p419-thompson.pdf" rel="nofollow">http://www.fing.edu.uy/inco/cursos/intropln/material/p419-th...</a>

aioprisan大约 10 年前

Any good web-based regex tools that you use daily?

评论 #9378129 未加载

评论 #9383532 未加载

评论 #9378137 未加载

评论 #9388293 未加载

评论 #9379517 未加载

smegel大约 10 年前

And when was the last time anyone used grouping in a regex!

评论 #9378082 未加载

评论 #9377754 未加载

5 条评论

tikhonj大约 10 年前

评论 #9378566 未加载

评论 #9378690 未加载

评论 #9378074 未加载

pcwalton大约 10 年前

评论 #9378699 未加载

humanarity大约 10 年前

aioprisan大约 10 年前

Any good web-based regex tools that you use daily?

评论 #9378129 未加载

评论 #9383532 未加载

评论 #9378137 未加载

评论 #9388293 未加载

评论 #9379517 未加载

smegel大约 10 年前

And when was the last time anyone used grouping in a regex!

评论 #9378082 未加载

评论 #9377754 未加载

Regular Expression Matching Can Be Simple and Fast (2007)

5 条评论

Regular Expression Matching Can Be Simple and Fast (2007)

5 条评论