It really feels like they buried the lede here, as almost all discussions about catastrophic backtracking seem to do. Here's the second-to-last paragraph:<p>> Alternatively you could switch to a regular expression engine that doesn’t exhibit this kind of behaviour. RE2 is an excellent library that uses a finite automata approach to avoid this type of catastrophic backtracking. Using this library on the same 32 character input reduces the running time from 12 minutes to basically instant. The downside being that RE2 doesn’t support all the functionality of the builtin re package.<p>Perl Compatible Regular Expressions (the flavor used by most engines, including Python's) _require_ this exponential worst-case behavior (using an implementation called backtracking). But the theory behind regex does not require this, and if you eliminate just a couple (infrequently used) features of PCRE, you can have a regex engine that runs in linear time (with respect to the size of the input string). See [2], this is by the authors of RE2. The features which are incompatible with this sort of implementation are:<p>1. Lookahead assertions.
2. Backreferences to matched groups.<p>If you don't know what these are, or you rarely ever use them, then you really have no business using this kind of regex engine. (And if you are using them, then I'd argue you're shoving a bit too much logic into a regex, but that's beside the point). Nonetheless, every popular programming language (except Go[1]) has included an exponential backtracking regex implementation in their standard library, and exposed the entire industry to this stupidity, all for a few backreferences and lookahead assertions.<p>What's especially crazy is this: it's easy for a regex engine to <i>detect</i> the use of these constructs, because it has to parse the regex anyway! So it's feasible for the engine to optimistically try to use an efficient implementation, and only fall back to using a backtracking implementation when you're actually using these features. This is what the Python re2 module[3] does, it uses RE2 by default and supports falling back to the backtracking implementation if necessary.<p>Instead, we're stuck reading the same postmortem every few years describing how "catastrophic backtracking" ruined another company/person's day, when the problem has been solved for decades, and language/library creators have just failed to include that solution.<p>[1]: Rob Pike invented, or at least popularized, the algorithm used by RE2. He was also involved in the creation of Go, as was Russ Cox[2].
[2]: <a href="https://swtch.com/~rsc/regexp/" rel="nofollow">https://swtch.com/~rsc/regexp/</a>
[3]: <a href="https://pypi.org/project/re2/" rel="nofollow">https://pypi.org/project/re2/</a>