There was a YouTube video a few years back that showed a guy livecoding... I think an ECMAScript interpreter, or maybe a WASM interpreter, from scratch using some very easy to follow code. I can't remember what it was or the video, but it was gobsmacking. So instead here's another story.<p>I once worked with an old Perl greybeard who had written some kind of code that could parse an almost impossibly large set of string formats produced by a large collection of automated systems, all of which had some sort of non-standard/non-guaranteed formatting or semantics. Little bits and pieces (some semi-atomic bits) of them were semi-reliable, but the aggregate strings were just a huge mess: things came in all kinds of orders, had unreliable separators (if any), some things came in backwards, some forwards, and so on.<p>You might imagine them to be things like IP addresses that sometimes had system names attached, sometimes locations, sometimes no address at all; sometimes the address was separated by dots or commas, sometimes pieces of equipment were listed. All the legacy of decades of people slowly accruing bits and pieces of the larger system and naming things for local convenience rather than for global understanding. As the systems rolled up, these strings would <i>sometimes</i> get concatenated in different ways and in different orders.<p>His code looked like total gibberish: a library of code that sort of looked like Perl but also sort of looked like a custom rule language. It had something like 1200 rules in it, based on the kinds of "atomics" he'd observed. Each rule was sort of a piece of a regex paired with a string with functions embedded in it.<p>And then there was some bizarre code elsewhere that I also didn't get at first.<p>It turned out the code was a clever way of writing a metaprogram that would assemble new code on the fly, capable of processing every possible string format -- potentially billions of them. At the core of the code was a simple lexer that atomized the strings based on a few heuristics (nonstandard of course), and then applied the rules to each lexical piece.<p>It turned out the rules were regexes that emitted code snippets on match. The snippets were often templated to take pieces of the atomic bits: the regex would parse the atom and put the pieces into the snippet.<p>There was some simple logic to assemble all of the emitted code into a "program" that, when eval()'d, would generate an output object that had all of the components of the original string, but also decoded, enriched, normalized, and ready for downstream work.<p>In the rare case an "atom" was encountered that the system didn't have a "rule" for, it would throw an error and dump the string out to a file where he'd take a look at it, and in about 30 seconds slap in a new "rule" to handle the case forever more.<p>The entire codebase was maybe the rules plus a couple hundred more lines of code -- maybe 1400-1500 lines. Written in a few weeks, processing tens of millions of strings per day, silent and forgotten in the bowels of some giant enterprise automation system.<p>It, running on a Pentium II or III of some kind, had replaced a rack of Sun pizza boxes and an old codebase set up by a consulting company that went down or garbled the processing of the strings at least once a week and required two full-time people to keep running.
When the company had a downturn they laid off that team, and in a fit of desperation the greybeard had come up with this little monster -- the beauty of which was under 5 minutes of maintenance per week to check logged errors and write a new rule if needed.<p><i>edit</i> ahh, here's the video <a href="https://youtu.be/r-A78RgMhZU" rel="nofollow">https://youtu.be/r-A78RgMhZU</a>
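<p>For anyone curious what that looks like mechanically, here's a minimal sketch of the rules-emit-code idea -- not his code, just an illustration I put together. The lexer heuristics, the rule set, and the field names are all invented:<p><pre><code>#!/usr/bin/perl
# Sketch of the "regex rules emit code snippets, assemble, eval" pattern.
# Everything here (rules, separators, field names) is made up for illustration.
use strict;
use warnings;

# Each rule pairs a regex with a code template. Captures from the regex are
# spliced into the template before it becomes part of the generated program.
my @rules = (
    { re   => qr/^(\d{1,3}(?:\.\d{1,3}){3})$/,
      code => q{ $rec{ip} = '%1'; } },
    { re   => qr/^host[-_]?(\w+)$/i,
      code => q{ $rec{host} = lc('%1'); } },
    { re   => qr/^(?:loc|site)[:=](\w+)$/i,
      code => q{ $rec{site} = uc('%1'); } },
);

# Crude lexer: split the raw string into "atoms" on a handful of separators.
sub atomize {
    my ($raw) = @_;
    return grep { length } split /[,;\s|]+/, $raw;
}

# Turn one raw string into a generated program, then eval it into a record.
sub parse {
    my ($raw) = @_;
    my @snippets;

  ATOM:
    for my $atom ( atomize($raw) ) {
        for my $rule (@rules) {
            if ( my @caps = $atom =~ $rule->{re} ) {
                my $snippet = $rule->{code};
                $snippet =~ s/%(\d+)/$caps[$1 - 1]/g;   # fill in captures
                push @snippets, $snippet;
                next ATOM;
            }
        }
        # No rule matched: complain and dump the atom so a human can add a rule.
        warn "no rule for atom '$atom'\n";
        open my $log, '>>', 'unmatched_atoms.log' or die $!;
        print {$log} "$atom\n";
        close $log;
    }

    # Assemble and run the generated program; it builds the output record.
    my $program = 'my %rec; ' . join( ' ', @snippets ) . ' \%rec;';
    my $rec = eval $program;
    die "generated program failed: $@" if $@;
    return $rec;
}

my $rec = parse('host_db7 | 10.1.2.3 ; loc:nyc');
printf "host=%s ip=%s site=%s\n", $rec->{host}, $rec->{ip}, $rec->{site};
# prints: host=db7 ip=10.1.2.3 site=NYC</code></pre><p>The real thing had ~1200 rules and a much smarter lexer, but the shape is the same: match each atom, emit a templated snippet, glue the snippets into a program, eval it into a normalized record, and log anything no rule claims so a new rule can be written.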