I am on vacation and bored, so I decided to try to actually understand this. I do not know much APL/K/J, but for a start, here is a version with more traditional line breaks:<p><a href="https://gist.github.com/anonymous/f72e5c4a432492abce59" rel="nofollow">https://gist.github.com/anonymous/f72e5c4a432492abce59</a><p>Some basic observations:<p>- He uses a lot of obscure C features, like the ability to leave out the type of an argument or return value and let the compiler default it to int, and also the old-school (pre-ANSI) style of declaring functions:<p><pre><code> foo(x,y) int x, y; { return 0; }
</code></pre>
Instead of:<p><pre><code> int foo(int x, int y) { return 0; }
</code></pre>
Since int is the default whenever the type is left out, and since he uses R for return, this example eventually becomes:<p><pre><code> foo(x,y){R(0);}
</code></pre>
- Variables, or really registers, are only accessible as single letters from a to z, and the "st" array stores the values of all registers. Numbers entered in the REPL have to be single digits between 0 and 9, which lets him avoid the dirty work of writing a proper lexer. It's also super easy to trigger a segfault, since error handling is non-existent.<p>- DO(n,x) is a C macro that evaluates the given expression "x" once for each number from 0 to n-1; the counter is visible to "x" as "i".<p>- V1 is a C macro that defines unary operators for the interpreted language, and V2 defines binary operators. In V1 definitions the operand is called "w"; in V2 definitions the operands are called "a" and "w".<p>- For example ",", which calls the cat function, is a binary operator that creates vectors:<p><pre><code> 1,2,3,4
4
1 2 3 4
</code></pre>
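The DO macro described above can be sketched in modern C roughly as follows. This is a minimal sketch, not the original text: the name n_ for the cached bound and the cat_longs helper are hypothetical, and cat_longs works on plain long arrays only to show how "," could join two vectors with two DO loops.

```c
#include <stdlib.h>

/* DO(n, x): run statement x with i = 0, 1, ..., n-1 (n itself is excluded).
   n_ caches the bound so that n is evaluated only once. */
#define DO(n, x) { long i = 0, n_ = (n); for (; i < n_; ++i) { x; } }

/* Hypothetical helper (not from the original): concatenates two long arrays
   the way the "," operator joins vectors. The caller frees the result. */
long *cat_longs(const long *a, long an, const long *w, long wn) {
    long *z = malloc((size_t)(an + wn) * sizeof *z);
    if (!z) return NULL;
    DO(an, z[i] = a[i]);        /* copy the left operand */
    DO(wn, z[an + i] = w[i]);   /* append the right operand */
    return z;
}
```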
- The vt, vd and vm arrays map ASCII symbols to the functions defined with V1 and V2: vt lists the operator symbols, vm holds the unary functions and vd the binary ones, at matching positions. { is the second symbol in vt, so when used as a unary operator it calls "size" (second non-null element of vm):<p><pre><code> {5,6,7,8
4
</code></pre>
and when used as a binary operator it calls "from" (second non-null element of vd).<p>- wd is a parser (really more of a tokenizer) that goes from the original input string to a weird intermediate form that is an array of longs. Each input character gets mapped one-to-one to an item in this intermediate form.<p><pre><code> If the input character is a digit between 0 and 9:
     A value instance gets allocated
     Intermediate form for this input character consists of the address of the allocated instance
 If the character is a letter between "a" and "z":
     Intermediate form consists simply of this character
 If the character represents an operator:
     Intermediate form consists of the 1-based index of the operator in the vt array
</code></pre>
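The mapping that wd performs can be sketched like this. It is a minimal sketch under stated assumptions: the symbol order in vt is assumed (the only thing taken from the description above is that { comes second and that "," is an operator), noun here allocates a bare long rather than a real array value, and intptr_t stands in for the original's long so that a pointer fits portably.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

const char vt[] = "+{~<#,";   /* operator symbols; this exact order is an assumption */

/* 1-based position of c in vt, or 0 if c is not an operator. */
int verb(char c) {
    const char *p = (c != '\0') ? strchr(vt, c) : NULL;
    return p ? (int)(p - vt) + 1 : 0;
}

/* Hypothetical stand-in for a value: a heap-allocated long holding one digit. */
long *noun(char c) {
    if (c < '0' || c > '9') return NULL;
    long *z = malloc(sizeof *z);
    if (z) *z = c - '0';
    return z;
}

/* Map each input character to exactly one item; the result is 0-terminated.
   Items are a pointer (digit), a small index (operator) or the raw character. */
intptr_t *wd(const char *s) {
    size_t n = strlen(s);
    intptr_t *e = malloc((n + 1) * sizeof *e);
    if (!e) return NULL;
    for (size_t i = 0; i < n; ++i) {
        long *v = noun(s[i]);
        int op = verb(s[i]);
        e[i] = v ? (intptr_t)v : op ? (intptr_t)op : (intptr_t)s[i];
    }
    e[n] = 0;
    return e;
}
```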
In other words, the intermediate form is an array where some elements are ASCII characters, others are memory addresses, and yet others are indices into an array. This part is really something.<p>- The ex function executes the intermediate form. Since every item in the input has a fixed length and there is no syntax checking, it just indexes into the intermediate form assuming everything is well formed. The parser never checked that, so nothing guarantees it, which is again a source of easy segfaults. Execution scans from left to right: it looks at the first position in the intermediate form and then makes recursive calls if necessary (let X be the current item in the intermediate form):<p><pre><code> If X is a letter:
     Look ahead one item
     If it is a '=' char:
         Assign the result of executing everything after the '=' to the register indicated by X, and return it
     Otherwise, replace X with the value of the register named by X
 If X is not a letter and is a small integer:
     We are applying a unary operator
     X is the index into the "vm" array
     Fetch the function from "vm", apply it to the result of executing the rest of the intermediate form
 Otherwise:
     If there is any more input remaining other than the current item, we are applying a binary operator
     Look up the function in "vd", apply it to X and to the result of executing everything to the right of the operator
     If nothing remains, X itself is the result
</code></pre>
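The control flow of ex can be sketched in a runnable form as follows. This is a deliberately simplified sketch, not the original: it works on the raw input string instead of the intermediate form, values are plain longs rather than arrays, and the operators (unary and binary "+" and "-") and the helper names are hypothetical. What it keeps is the shape of the recursion: a binary operator is applied to the current item and to the result of executing everything to its right, so 1-2-3 is evaluated as 1-(2-3).

```c
#include <stddef.h>

long st[26];                                  /* registers a..z */

int is_reg(char c)   { return c >= 'a' && c <= 'z'; }
int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Hypothetical scalar operators, for illustration only. */
long ident(long w)       { return w; }
long neg(long w)         { return -w; }
long add(long a, long w) { return a + w; }
long sub(long a, long w) { return a - w; }

typedef long (*unary_fn)(long);
typedef long (*binary_fn)(long, long);

/* Stand-ins for the vm (unary) and vd (binary) tables. */
unary_fn  vm_lookup(char c) { return c == '+' ? ident : c == '-' ? neg : NULL; }
binary_fn vd_lookup(char c) { return c == '+' ? add : c == '-' ? sub : NULL; }

long ex(const char *e) {
    char c = *e;
    long a;
    if (is_reg(c)) {
        if (e[1] == '=')                      /* assignment: x=<rest> */
            return st[c - 'a'] = ex(e + 2);
        a = st[c - 'a'];                      /* otherwise read the register */
    } else if (is_digit(c)) {
        a = c - '0';
    } else {                                  /* leading operator: unary use */
        unary_fn f = vm_lookup(c);
        return f ? f(ex(e + 1)) : 0;
    }
    if (e[1] == '\0') return a;               /* nothing follows: plain value */
    binary_fn g = vd_lookup(e[1]);            /* operator between a and the rest */
    return g ? g(a, ex(e + 2)) : a;
}
```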
- The part I have the most trouble understanding is the "a" struct, which represents all values in the interpreted language (they are all arrays). ga is clearly the basic allocation function for it, and "plus" obviously adds two arrays, so it's clear that the "p" field holds the actual contents, but beyond that things get very murky.
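<p>For orientation, here is a minimal sketch of one common way to lay out such array values. It is not a reading of the original struct. Only the names ga, plus and the "p" field come from the observations above; the rank field r, the dimension field d, the limit of three dimensions and the count helper are all assumptions made for illustration, and plus assumes both operands have the same shape.

```c
#include <stdlib.h>

/* Hypothetical layout: rank, up to 3 dimension lengths, then the payload. */
typedef struct array {
    long r;          /* rank: number of dimensions (assumed at most 3) */
    long d[3];       /* length along each dimension */
    long p[];        /* flattened contents (C99 flexible array member) */
} Array;

/* Number of elements implied by a shape: the product of its dimensions. */
long count(long r, const long *d) {
    long z = 1;
    for (long i = 0; i < r; ++i) z *= d[i];
    return z;
}

/* Allocate an array of the given rank and shape; contents are uninitialised. */
Array *ga(long r, const long *d) {
    long n = count(r, d);
    Array *z = malloc(sizeof *z + (size_t)n * sizeof z->p[0]);
    if (!z) return NULL;
    z->r = r;
    for (long i = 0; i < 3; ++i) z->d[i] = i < r ? d[i] : 0;
    return z;
}

/* Elementwise sum; the result takes the shape of w. */
Array *plus(const Array *a, const Array *w) {
    long n = count(w->r, w->d);
    Array *z = ga(w->r, w->d);
    if (!z) return NULL;
    for (long i = 0; i < n; ++i) z->p[i] = a->p[i] + w->p[i];
    return z;
}
```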