Note that when you use grep, awk, sed, libc regexec(), RE2 [1], or rust/regex, this isn't how your regexes are executed.<p>They use automata-based engines whereas this is a backtracking engine, which can blow up on small inputs. (Although I will say the C code here is cool, because it gives you a fork() bomb, so it might exhaust your operating system and not just your CPU!)<p>Canonical reference: <i>Regular Expression Matching Can Be Simple And Fast
(but is slow in Java, Perl, PHP, Python, Ruby, ...)</i><p><a href="https://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">https://swtch.com/~rsc/regexp/regexp1.html</a><p>Archive since it's down at the moment due to some App Engine stuff (!): <a href="https://web.archive.org/web/20200624195819/https://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">https://web.archive.org/web/20200624195819/https://swtch.com...</a><p>My blog post and sample code that verifies that GNU grep, awk, sed, and libc don't backtrack:<p><i>Comments on Eggex and Regular Languages</i> <a href="http://www.oilshell.org/blog/2020/07/eggex-theory.html" rel="nofollow">http://www.oilshell.org/blog/2020/07/eggex-theory.html</a><p><a href="https://github.com/oilshell/blog-code/tree/master/regular-languages" rel="nofollow">https://github.com/oilshell/blog-code/tree/master/regular-la...</a><p>[1] <a href="https://github.com/google/re2" rel="nofollow">https://github.com/google/re2</a><p>----<p>Also, the production quality way of compiling regular expressions to C code looks like this [2], which is completely different than what's shown in the article:<p><a href="https://www.oilshell.org/release/0.8.pre8/source-code.wwz/_devbuild/gen/osh-lex.h" rel="nofollow">https://www.oilshell.org/release/0.8.pre8/source-code.wwz/_d...</a><p>The regex is converted to an NFA, then a DFA, then a bunch of switch/goto statements in C that implement the DFA.<p>This example is huge, but the switch/goto and lack of any notion of "threading" is the important point.<p>Essentially, the instruction pointer is used as the DFA state. It's a very natural use of "goto" (where you don't write or debug them yourself)<p>[2] via <a href="http://re2c.org/" rel="nofollow">http://re2c.org/</a> (not related to RE2)