Very nice work and cool project.<p>However, the example in "3. Baremetal Cognition" is explained in an overly convoluted way, with many choices that IMO detract from the point that (I think) you're trying to make. There are typos that make it even harder to understand.<p>1. Use something like an underscore instead of spaces, and maybe even another character like a period instead of newline. You can explain after the section that you could have used space and newline instead of _ and .<p>2. Immediately after showing<p><pre><code> ldfgldftgldfdtgl
df_
_
dfiff1_crank_f
</code></pre>
you can parse it out for the reader, as something like<p><pre><code> "l" 'set-non-delim eval
"gl" 'set-non-delim eval
"tgl" 'set-non-delim eval
"dtgl"
"\n" 'set-non-delim eval
"_\n"
"_\n" 'set-non-delim eval
'set-ignore eval
eval
</code></pre>
and so on. Or maybe even<p><pre><code> (set-non-delim "l")
(set-non-delim "gl")
(set-non-delim "tgl")
(set-non-delim "dtgl")
(set-non-delim "_\n")
(set-ignore "_\n")
(dtgl)
</code></pre>
and only <i>then</i> you'd go through the source, character by character. Just because the source is hard for humans to read doesn't mean we need to stick to it in an explanatory example.<p>3.
> Delimiters have an interesting rule, and that is that the delimiter character is excluded from the tokenized word unless we have not ignored a character in the tokenization loop, in which case we collect the character as a part of the current token and keep going.<p>There are four "negations" in this sentence ("excluded", "unless", "not", "ignored"), and it takes two turns to explain something ostensibly simple: when to end tokens added to the stack (or container). This, together with whitelist, blacklist, delim, and singlet, needs much cleaner naming and description.<p>Also, "set non-delimiter" is an extra negation.<p>4. There's an error right after "Now, for the rest of the code: ". The third line contains two spaces instead of a single one. (Using suggestion 1 would also have helped you avoid this.)<p># Comment about the actual content<p>5. I can kind of see the rationale for this (which is also explained in the beginning). However, I don't see exactly where we'd set clear boundaries, since we can always stuff semantics into the initial parser. For example, instead of having `f` bound to eval, we could have set `f` to execute the entire bootstrapping sequence and then rebind `f` to eval. So the entire example would be reduced to just `f`.<p>I guess we'd have to argue that the initial set of functions we are allowed is somehow primitive enough. But even `d` (set-non-delim), while it only toggles some values in an array (or list), piggybacks on the parser's inherent ability to skip characters in its semantics, and `i` (set-ignore) needs inversion implemented in the parser.<p>6. Here we assume that one byte per character is the default starting state of the world, but Unicode and other encodings don't have this, so you'd need some parser to get started anyway. And in that case, is an initial parser using space and end of line as separators really unusual?<p>7. I don't see why (not) reading ahead would be such an important property for modifiable syntax. 
You just need to not read ahead <i>too much</i>, like the entire rest of the file or stream.<p>8. Regardless, I think this is worth exploring, but also keep some of these questions in mind while doing that.
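<p>To make point 5 concrete, here is a minimal Python sketch of what I mean by stuffing semantics into the initial bindings. This is a toy model, not Cognition: `run`, `eval_word` and `bootstrap_then_rebind` are names I made up, and the "bootstrap" step is just a placeholder for the whole sequence in your example.

```python
# Toy model only: `run`, `eval_word`, and `bootstrap_then_rebind` are
# hypothetical names, not Cognition's actual primitives.

def run(source):
    """Interpret `source` one character at a time and return a trace of actions."""
    trace = []
    bindings = {}

    def eval_word():
        # Stand-in for the real eval primitive.
        trace.append("eval")

    def bootstrap_then_rebind():
        # Stand-in for the entire bootstrapping sequence, hidden inside
        # what looks like a single initial "primitive".
        trace.append("bootstrap")
        # From now on, `f` behaves like plain eval.
        bindings["f"] = eval_word

    bindings["f"] = bootstrap_then_rebind

    for ch in source:
        action = bindings.get(ch)
        if action is not None:  # characters without a binding are skipped
            action()
    return trace


print(run("f"))    # ['bootstrap']
print(run("fff"))  # ['bootstrap', 'eval', 'eval']
```

The first `f` does all the bootstrapping and every later `f` is an ordinary eval, so nothing in the source text tells you how much work the initial binding was allowed to do. That is the boundary I think needs to be argued for.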