TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Regular Expression Matching Can Be Simple and Fast (2007)

31 pointsby humanarityabout 10 years ago

5 comments

tikhonjabout 10 years ago
…if your regexes are, you know, <i>regular</i>. Which Perl-style regular expressions most definitely are not. So it&#x27;s not a matter of just implementing the regexes you know faster, but limiting them to a subset of features first.<p>This is a perfectly valid thing to do, but it <i>is</i> changing the power of regexes significantly. I feel the article makes this fact easy to overlook—I certainly did until I took a class on automata.<p>But yes, presumably Perl could use an NFA-based algorithm if given a regex that is actually regular. One of my professors attributed the fact that it doesn&#x27;t to old patent issues, but I have not been able to find a good source for that. It&#x27;s also possible that it&#x27;s just a case of worse is better: backtracking is fast enough 99% of the time, and if performance really matters, perhaps you shouldn&#x27;t be using regexes anyhow.
评论 #9378566 未加载
评论 #9378690 未加载
评论 #9378074 未加载
pcwaltonabout 10 years ago
To be honest, I&#x27;ve never been much of a fan of this article, because it neglects the fastest way to implement regexes <i>in practice</i>: use a fast JIT to emit code that performs recursive backtracking. The Thompson NFA is a better algorithm for pathological regexes that people don&#x27;t frequently write (e.g. &quot;(a?)﹡a﹡)&quot;), but the constant factors are much worse for common cases.<p>The Thompson NFA has its place as a fallback to avoid exponential blowup on pathological regexes, but if I were implementing a performance-critical regex engine, I would start with a JIT for the recursive backtracking approach, because in practice constant factors dominate over algorithmic worst-time bounds, and only add the Thompson NFA later.
评论 #9378699 未加载
humanarityabout 10 years ago
And Ken Thompson&#x27;s original 1968 paper,<p><a href="http:&#x2F;&#x2F;www.fing.edu.uy&#x2F;inco&#x2F;cursos&#x2F;intropln&#x2F;material&#x2F;p419-thompson.pdf" rel="nofollow">http:&#x2F;&#x2F;www.fing.edu.uy&#x2F;inco&#x2F;cursos&#x2F;intropln&#x2F;material&#x2F;p419-th...</a>
aioprisanabout 10 years ago
Any good web-based regex tools that you use daily?
评论 #9378129 未加载
评论 #9383532 未加载
评论 #9378137 未加载
评论 #9388293 未加载
评论 #9379517 未加载
smegelabout 10 years ago
And when was the last time anyone used grouping in a regex!
评论 #9378082 未加载
评论 #9377754 未加载