TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Regular expression matching can be simple and fast (2007)

141 点作者 momonga大约 1 年前

8 条评论

jjice大约 1 年前
This and the associated series of regex posts from Russ Cox are probably the best content on practical regular expression engine internals you can get. If you have a cursory understanding of NFAs and DFAs, it&#x27;s some of the best technical writing I&#x27;ve ever had the pleasure to read.<p>These posts are responsible for my love of regular expressions :)
评论 #40428797 未加载
评论 #40424531 未加载
评论 #40443669 未加载
chx大约 1 年前
&gt; As mentioned earlier, no one knows how to implement regular expressions with backreferences efficiently, though no one can prove that it&#x27;s impossible either. (Specifically, the problem is NP-complete, meaning that if someone did find an efficient implementation, that would be major news to computer scientists and would win a million dollar prize.)<p>It really needs to be noted how NP-complete and an algorithm efficient <i>in practice</i> have nothing to do with each other. If an algorithm takes C * 1.000...eighty zeroes...1^n time it is exponential but for any n you will encounter in real life the time it takes will be indistinguishable from C.
评论 #40426784 未加载
评论 #40428143 未加载
plesner大约 1 年前
This is a fine introduction to automata-based regexps but the framing of comparing &quot;good&quot; automata with &quot;bad&quot; perl-style is misguided. Automata and perl-style regexps are different beasts and solve different problems. The problem seems to be one of terminology: the perl style should never have been called regexps. That&#x27;s not what they are. It&#x27;s a pattern language that happens to have a variant of regexps as a subset.
Aerbil313大约 1 年前
I&#x27;ve recently discovered Lua Parsing Expression Grammars (LPEG)[1]. They are built on a different approach to parsing and are much more capable than conventional regexes, able to handle recursive sequences, able to include code, have debugging utilities, are often much more performant than regexes and they are an absolute delight to work with. It also has a module called re [2] which uses a similar syntax to regexes.<p>1: <a href="https:&#x2F;&#x2F;www.inf.puc-rio.br&#x2F;~roberto&#x2F;lpeg&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.inf.puc-rio.br&#x2F;~roberto&#x2F;lpeg&#x2F;</a><p>2: <a href="https:&#x2F;&#x2F;www.inf.puc-rio.br&#x2F;~roberto&#x2F;lpeg&#x2F;re.html" rel="nofollow">https:&#x2F;&#x2F;www.inf.puc-rio.br&#x2F;~roberto&#x2F;lpeg&#x2F;re.html</a>
kayodelycaon大约 1 年前
It&#x27;s easy to miss in the article, but the reason most regex engines exhibit Perl&#x27;s behavior is due to features like back referencing. (And I assume look behinds and look aheads.)<p>PCRE and theoretically pure regular expressions are different things.
评论 #40429339 未加载
adrianN大约 1 年前
Regexes can be fast and simple, but sometimes it’s faster and simpler to use normal string search. I recently replaced a regex with two finds for 10x speedup.
评论 #40427598 未加载
评论 #40427825 未加载
评论 #40426730 未加载
评论 #40425861 未加载
rurban大约 1 年前
Perl needed a couple of years to catchup. At first they had no idea what Russ was talking about
ChrisArchitect大约 1 年前
Some discussion from 5 years ago:<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=20310669">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=20310669</a>
评论 #40424456 未加载