TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

The 1960's elegance behind Go's regexp

32 pointsby Dawny33about 8 years ago

9 comments

harpocratesabout 8 years ago
The article by Russ Cox [0] (from which the initial Perl vs. Thompson NFA graph is taken) is a much more informative read. Also, I&#x27;m not sure why this always gets brought up as being a big deal: CS is all about making the right tradeoffs and I agree that sometimes there are cool tricks that let you _almost_ have your cake an eat it. But here, there is no such trick: backtracking has always been exponential and finite state automata have always been linear.<p>No surprises.<p>[0] <a href="https:&#x2F;&#x2F;swtch.com&#x2F;~rsc&#x2F;regexp&#x2F;regexp1.html" rel="nofollow">https:&#x2F;&#x2F;swtch.com&#x2F;~rsc&#x2F;regexp&#x2F;regexp1.html</a>
评论 #13926569 未加载
评论 #13925848 未加载
评论 #13926432 未加载
评论 #13925729 未加载
cestithabout 8 years ago
I stopped reading when a post in 2017 compared anything to perl 5.8.7 which is older than what CentOS 5 shipped.<p>Perl 5.8.x was EOL in 2008. 5.10 was EOL in 2009.<p>The currently maintained versions are 5.22.3 and 5.24.1 and major redesign, refactoring, and rework has been done in the system including the regex parser. It went in the meantime from being a primarily recursive, backtracking NFA to using a DFA when possible, being primarily iterative, and only recursing when necessary.<p>Anyone intentionally picking a more than decade old version whose major version line was end-of-life nearly a decade ago to compare to their new hotness has completely invalidated the evidence for their arguments. All the great things the slides might say may still be true, but they are unsupported by these charts.
评论 #13926116 未加载
stymaarabout 8 years ago
Slightly off-topic: BurntSushi wrote a Go binding[1] for the Rust regex engine[2] (he is also the author) which shares the same approach (finite automata) but has a more polished, highly optimized implementation. It could be useful in some scenario for people facing performance issues with Go&#x27;s regex engine.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;BurntSushi&#x2F;rure-go" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;BurntSushi&#x2F;rure-go</a> [2] <a href="https:&#x2F;&#x2F;github.com&#x2F;rust-lang&#x2F;regex" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;rust-lang&#x2F;regex</a>
ggambettaabout 8 years ago
<i>&quot;[A regex is] a style of describing character strings. If a string successfully describes a regex, then it is called a match&quot;</i><p>Shouldn&#x27;t that be &quot;if a regex successfully describes a string&quot;? I&#x27;m confused.
jhallenworldabout 8 years ago
I think the Thompson NFA matcher could be pushed further if someone would bother to do it. One example is that since it finds all solutions to a given regular expression, it could be used within a larger backtracking matcher to support back-references.<p>Consider:<p><pre><code> (a+)(a+)=\1 </code></pre> With an input like: aaaaa=aa<p>The matcher will find these strings by the time it gets to the &#x27;=&#x27;:<p><pre><code> (a)(aaaa)= (aa)(aaa)= (aaa)(aa)= (aaaa)(a)= </code></pre> Each match will end up as a separate still-existing thread in the Thompson matcher. Instead of pruning them as soon as possible, keep them and recursively match the rest of the string with the back-reference set in turn to each until a match is found.
评论 #13953043 未加载
gumbyabout 8 years ago
TL;DR we use a subset of today&#x27;s common regexps which allowed us to go lightning fast (as long as you don&#x27;t need lookahead). Oh, and I&#x27;ll neglect to be clear about that.<p>Nothing wrong in making the tradeoff but I didn&#x27;t see it in the text.
legulereabout 8 years ago
I do not get why there still is so much of a hype for regular expressions. They are hard to read and maintain. They have a relatively large syntax. Edge-cases are easy to get wrong with them.
davidwabout 8 years ago
Looks like Tcl has been doing it right for a while...
moominabout 8 years ago
Seriously, can we stop fetishising old approaches and papers? Yes, there&#x27;s some tricks we&#x27;ve missed along the way, but there&#x27;s a thread of alchemical thinking in our community that is just plain unproductive.
评论 #13926047 未加载
评论 #13926138 未加载
评论 #13926165 未加载
评论 #13925860 未加载
评论 #13925946 未加载