TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

An additional non-backtracking RegExp engine

46 pointsby 0xedbover 4 years ago

4 comments

ufoover 4 years ago
&gt; For now it remains experimental because Irregexp is orders of magnitude faster than the new engine on most common patterns<p>I&#x27;m sure that an important reason for this is that thousands of man-hours have been invested optimizing Irregexp. However, is that the only explanation?<p>In my experience, while an NFA-based engine has better asymptotic complexity, a backtracking-based engine can be hard to beat for patterns that don&#x27;t do much backtracking. In a textbook NFA automaton, the time spent maintaining a set of parallel states may be &quot;wasted&quot; if it turns out that the first state in the list is already going to succeed anyway.<p>Regex experts of Hacker News, do you know if there are any engines and&#x2F;or algorithms that try to get the best of both worlds? Fast execution for patterns that don&#x27;t need much backtracking, but avoiding exponential run time in the worst case?
评论 #25742008 未加载
评论 #25746395 未加载
评论 #25741885 未加载
IlliOnatoover 4 years ago
&gt; Unfortunately, backreferences, lookahead and lookbehind cannot be supported without major changes that alter asymptotic worst-case complexity.<p>I live and die by regexes, and regexes without backreferences and lookahead would be pretty useless to me.<p>I guess such regexes are useful for _something_, but I am not sure what it would be.
评论 #25741436 未加载
评论 #25741414 未加载
评论 #25742110 未加载
recursiveover 4 years ago
&gt; For the fallback mechanism to kick in, the RegExp must: &gt; * ... &gt; * not have the u (Unicode) or i (case insensitive) flags set.<p>I wonder why these flags are significant. Case-insensitivity and unicode support don&#x27;t seem like they should really affect whatever algorithm they&#x27;re using.<p>Am I underthinking this, or can you convert &#x2F;abc&#x2F;i into &#x2F;[aA][bB][cC]&#x2F; ?
评论 #25741068 未加载
评论 #25742018 未加载
评论 #25746000 未加载
yunruseover 4 years ago
It seems more developer-friendly to me to, at Regex construction, try to parse as linear, and fallback to a backtracking regex if not. The engine is an implementation detail that programmers probably care very little about; a linear engine being default but not raising errors would be the best-of-all-worlds scenario.<p>(It’s a little weird to see the i flag not working with linear regexes, mind. Surely that’s easily implemented when the regex is compiled?)
评论 #25741138 未加载
评论 #25741282 未加载