The article by Russ Cox [0] (from which the initial Perl vs. Thompson NFA graph is taken) is a much more informative read. Also, I'm not sure why this always gets brought up as being a big deal: CS is all about making the right tradeoffs, and I agree that sometimes there are cool tricks that let you _almost_ have your cake and eat it too. But here, there is no such trick: backtracking has always been exponential in the worst case and finite automata have always been linear.<p>No surprises.<p>[0] <a href="https://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">https://swtch.com/~rsc/regexp/regexp1.html</a>
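If you want to see that asymmetry concretely, here's a quick sketch of mine (not from the article): it times Go's automaton-based regexp package on the classic pathological pattern from Cox's piece, (a?){n}a{n} against a^n, where a backtracking engine needs roughly 2^n steps but an automaton stays near-linear:<p><pre><code> package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

func main() {
	// (a?){n}a{n} matched against "a"*n: a backtracker explores ~2^n
	// paths, while an automaton-based engine like Go's regexp runs in
	// time roughly linear in n.
	for _, n := range []int{10, 20, 30} {
		pattern := "^" + strings.Repeat("a?", n) + strings.Repeat("a", n) + "$"
		re := regexp.MustCompile(pattern)
		input := strings.Repeat("a", n)

		start := time.Now()
		matched := re.MatchString(input)
		fmt.Printf("n=%d matched=%v in %v\n", n, matched, time.Since(start))
	}
}
</code></pre>
On a backtracking engine the n=30 case would take minutes; here all three finish in microseconds.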
I stopped reading when a post in 2017 compared anything to Perl 5.8.7, which is older than what CentOS 5 shipped.<p>Perl 5.8.x was EOL in 2008. 5.10 was EOL in 2009.<p>The currently maintained versions are 5.22.3 and 5.24.1, and in the meantime major redesign, refactoring, and rework have gone into the interpreter, including the regex engine: it went from being a primarily recursive, backtracking matcher to one that uses a DFA when possible, is primarily iterative, and recurses only when necessary.<p>Anyone intentionally picking a version more than a decade old, whose major version line was end-of-lifed nearly a decade ago, to compare against their new hotness has completely invalidated the evidence for their argument. All the great things the slides say may still be true, but they are unsupported by these charts.
Slightly off-topic: BurntSushi wrote a Go binding[1] for the Rust regex engine[2] (which he also authored). It shares the same approach (finite automata) but has a more polished, highly optimized implementation, and could be useful for people facing performance issues with Go's regex engine.<p>[1] <a href="https://github.com/BurntSushi/rure-go" rel="nofollow">https://github.com/BurntSushi/rure-go</a>
[2] <a href="https://github.com/rust-lang/regex" rel="nofollow">https://github.com/rust-lang/regex</a>
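If anyone wants to try it, here's a minimal usage sketch, assuming the MustCompile/IsMatch/Find API shown in the rure-go README; treat the exact signatures as assumptions and check the repo before relying on them (the binding also needs the Rust regex C library installed):<p><pre><code> package main

import (
	"fmt"

	"github.com/BurntSushi/rure-go" // package name "rure" (assumed from the README)
)

func main() {
	// Compile once; matching is done by Rust's finite-automata engine.
	re := rure.MustCompile(`\w+@\w+\.\w+`)
	haystack := "contact: someone@example.com"

	fmt.Println(re.IsMatch(haystack)) // true

	// Find is assumed here to return byte offsets of the leftmost match.
	if start, end, ok := re.Find(haystack); ok {
		fmt.Println(haystack[start:end]) // someone@example.com
	}
}
</code></pre>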
<i>"[A regex is] a style of describing character strings. If a string successfully describes a regex, then it is called a match"</i><p>Shouldn't that be "if a regex successfully describes a string"? I'm confused.
I think the Thompson NFA matcher could be pushed further if someone bothered to do it. For example, since it finds all solutions to a given regular expression, it could be used within a larger backtracking matcher to support back-references.<p>Consider:<p><pre><code> (a+)(a+)=\1
</code></pre>
With an input like:<p><pre><code> aaaaa=aa
</code></pre>
The matcher will find these strings by the time it gets to the '=':<p><pre><code> (a)(aaaa)=
(aa)(aaa)=
(aaa)(aa)=
(aaaa)(a)=
</code></pre>
Each match will end up as a separate still-live thread in the Thompson matcher. Instead of pruning them as early as possible, keep them all and recursively match the rest of the string with the back-reference bound in turn to each candidate until a match is found.
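Here's a minimal sketch of that idea (my own simplification, not a real Thompson VM): hand-enumerate the candidate group bindings for (a+)(a+)=\1, one per surviving thread, and check the \1 back-reference against each in turn:<p><pre><code> package main

import (
	"fmt"
	"strings"
)

// matchBackref hand-simulates (a+)(a+)=\1: the "threads" are the ways
// the run of 'a's before '=' can split into two non-empty groups; the
// back-reference check then tries each binding of \1 in turn.
func matchBackref(input string) (g1, g2 string, ok bool) {
	eq := strings.IndexByte(input, '=')
	if eq < 0 || strings.Trim(input[:eq], "a") != "" {
		return "", "", false
	}
	prefix, rest := input[:eq], input[eq+1:]
	for i := 1; i < len(prefix); i++ { // one iteration per surviving thread
		g1, g2 = prefix[:i], prefix[i:]
		if rest == g1 { // the \1 check against this thread's binding
			return g1, g2, true
		}
	}
	return "", "", false
}

func main() {
	fmt.Println(matchBackref("aaaaa=aa")) // aa aaa true
}
</code></pre>
A real implementation would get the candidate bindings from the live threads of the NFA simulation rather than enumerating splits by hand, but the control flow is the same: linear-time enumeration of candidates, backtracking only over the back-reference.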
TL;DR: we use a subset of today's common regexps, which allowed us to go lightning fast (as long as you don't need lookahead). Oh, and we'll neglect to be clear about that.<p>Nothing wrong with making the tradeoff, but I didn't see it acknowledged in the text.
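Concretely, Go's RE2-style engine rejects lookahead at compile time rather than silently falling back to backtracking; a quick check (a small sketch of my own):<p><pre><code> package main

import (
	"fmt"
	"regexp"
)

func main() {
	// RE2 syntax has no lookahead: compiling one fails outright.
	_, err := regexp.Compile(`foo(?=bar)`)
	fmt.Println(err)
	// prints something like:
	// error parsing regexp: invalid or unsupported Perl syntax: `(?=`
}
</code></pre>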
I do not get why there is still so much hype for regular expressions. They are hard to read and maintain, their syntax is relatively large, and edge cases are easy to get wrong.
Seriously, can we stop fetishising old approaches and papers? Yes, there are some tricks we've missed along the way, but there's a thread of alchemical thinking in our community that is just plain unproductive.