I would recommend that the author read up on NFAs and DFAs -- they are a formalism better suited to lexers than tries.<p>At a high level, if compiling a lexer for a run-of-the-mill language like JavaScript takes 5 minutes and 2.5GB of RAM, you are most likely doing it wrong. By "doing it wrong," I mean that there is almost certainly a better approach that is far superior in every measurable aspect (CPU, memory, code size, etc.).<p>I don't fully understand what kind of algorithm the author was using, so I can't comment on it in detail, but in general lexers are better thought of as finite automata (NFAs and DFAs) than as tries. The two are related, but unlike tries, NFAs and DFAs can have cycles, which are essential for representing a lexing state machine efficiently -- a cycle lets a single state recognize a repeated character class (say, the body of an identifier or a number) without the automaton growing with the input.<p>Another observation: it's not terribly surprising that you could beat preg_match_all() when the latter is given a huge regular expression. Most regex engines use backtracking (RE2 being a notable counterexample), which means a regular expression with high-cardinality alternation (i.e. A|B|C|D|E|F|G|...) is one of their worst cases; it isn't what they are designed for. A backtracking regex engine will basically try each alternative in sequence until that alternative no longer matches, then back up and try the next one. NFA/DFA-based lexers will be much faster for this case, because they examine each input character once regardless of how many alternatives there are.<p>The right tool for this job, as another commenter mentioned, is Ragel. It's basically designed for exactly this. It doesn't appear to support PHP, though...
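<p>To make the cycle point concrete, here's a minimal sketch (in Python, and not the author's code -- the DFA is hand-rolled as while-loops standing in for a transition table) of a lexer for identifiers and integers. The self-loop on each token's "continue" state is exactly the cycle a trie cannot express:

```python
# Illustrative sketch of a DFA-style lexer. Each while-loop below is a
# self-loop in the automaton: the state transitions back to itself as long
# as the input stays in the same character class. A trie, being acyclic,
# would need a separate node per character of input to do the same job.

def lex(text):
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                      # skip whitespace between tokens
        elif ch.isalpha():              # identifier state: [a-z][a-z0-9]*
            start = i
            while i < len(text) and text[i].isalnum():
                i += 1                  # self-loop -> a cycle in the DFA
            tokens.append(("IDENT", text[start:i]))
        elif ch.isdigit():              # number state: [0-9]+
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1                  # another self-loop
            tokens.append(("NUMBER", text[start:i]))
        else:
            tokens.append(("PUNCT", ch))
            i += 1
    return tokens

print(lex("var x1 = 42;"))
# [('IDENT', 'var'), ('IDENT', 'x1'), ('PUNCT', '='), ('NUMBER', '42'), ('PUNCT', ';')]
```

Note that every character is inspected exactly once and the lexer never backs up -- which is the whole contrast with a backtracking engine grinding through a giant alternation.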