> If the string to be matched against contains 20,000 space characters in a row, but not at the end, then the Regex engine will start at the first space, check that it belongs to the \s character class, move to the second space, make the same check, etc. After the 20,000th space, there is a different character, but the Regex engine expected a space or the end of the string. Realizing it cannot match like this it backtracks, and tries matching \s+$ starting from the second space, checking 19,999 characters. The match fails again, and it backtracks to start at the third space, etc.<p>That's not how backtracking works. A regex engine will only backtrack to try and make the rest of the regex match, i.e. it will take characters of the RHS of the string, not try and start "from the second character off the start of the string". I mean, if the engine tried matching from the second space, what would be matching the first space? Something has to.<p>Which meant, that even if the regex engine was incredibly stupid and could not figure out that a greedy block of \s was never going to contain a non-\s, it would only have to check 20,001 times, not 199000 (or whatever it was).<p>I can't reproduce this "bug" in either Perl or Python. The time taken to match a 30,000 block of space either followed by $ or XX$ was basically identical for \s+$.<p>There does appear to be normal backtracking going on, roughly doubling the search time for large strings terminating in non-\s. This is expected, as it has to check 20,000 during the first gobble, then 20,000 as it backtracks <i>from the right</i> 20,000 times.<p><pre><code> $ time perl -e '(" " x 100000000 . "X") =~ /\s+$/ && print "MATCH"'
real 0m0.604s
user 0m0.509s
sys 0m0.094s
$ time perl -e '(" " x 100000000) =~ /\s+$/ && print "MATCH"'
MATCH
real 0m0.286s
user 0m0.197s
sys 0m0.089s</code></pre>