TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

An additional non-backtracking RegExp engine

46 点作者 0xedb超过 4 年前

4 条评论

ufo超过 4 年前
&gt; For now it remains experimental because Irregexp is orders of magnitude faster than the new engine on most common patterns<p>I&#x27;m sure that an important reason for this is that thousands of man-hours have been invested optimizing Irregexp. However, is that the only explanation?<p>In my experience, while an NFA-based engine has better asymptotic complexity, a backtracking-based engine can be hard to beat for patterns that don&#x27;t do much backtracking. In a textbook NFA automaton, the time spent maintaining a set of parallel states may be &quot;wasted&quot; if it turns out that the first state in the list is already going to succeed anyway.<p>Regex experts of Hacker News, do you know if there are any engines and&#x2F;or algorithms that try to get the best of both worlds? Fast execution for patterns that don&#x27;t need much backtracking, but avoiding exponential run time in the worst case?
评论 #25742008 未加载
评论 #25746395 未加载
评论 #25741885 未加载
IlliOnato超过 4 年前
&gt; Unfortunately, backreferences, lookahead and lookbehind cannot be supported without major changes that alter asymptotic worst-case complexity.<p>I live and die by regexes, and regexes without backreferences and lookahead would be pretty useless to me.<p>I guess such regexes are useful for _something_, but I am not sure what it would be.
评论 #25741436 未加载
评论 #25741414 未加载
评论 #25742110 未加载
recursive超过 4 年前
&gt; For the fallback mechanism to kick in, the RegExp must: &gt; * ... &gt; * not have the u (Unicode) or i (case insensitive) flags set.<p>I wonder why these flags are significant. Case-insensitivity and unicode support don&#x27;t seem like they should really affect whatever algorithm they&#x27;re using.<p>Am I underthinking this, or can you convert &#x2F;abc&#x2F;i into &#x2F;[aA][bB][cC]&#x2F; ?
评论 #25741068 未加载
评论 #25742018 未加载
评论 #25746000 未加载
yunruse超过 4 年前
It seems more developer-friendly to me to, at Regex construction, try to parse as linear, and fallback to a backtracking regex if not. The engine is an implementation detail that programmers probably care very little about; a linear engine being default but not raising errors would be the best-of-all-worlds scenario.<p>(It’s a little weird to see the i flag not working with linear regexes, mind. Surely that’s easily implemented when the regex is compiled?)
评论 #25741138 未加载
评论 #25741282 未加载