I saw the repeating 'A' at the end of the base64 text and thought "it's not even 512 bytes; it's smaller!"<p>That said, the title is just a little clickbaity --- it's a C-subset compiler, and more accurately a JIT interpreter. There also appears to be no attempt at operator precedence. Nonetheless, it's still an impressive technical achievement and shows the value of questioning common assumptions.<p>Finally, I feel tempted to offer a small size optimisation:<p><pre><code> sub ax,2
</code></pre>
is 3 bytes whereas<p><pre><code> dec ax
dec ax
</code></pre>
is 2 bytes.<p>You may be able to use single-byte xchg's with ax instead of movs, and the other thing which helps code density a lot in 16-bit code is to take advantage of the addressing modes and LEA to do 3-operand add immediates where possible.
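For anyone who wants to check the byte counts above, here is a rough sketch that tabulates hand-assembled 16-bit encodings. The instruction choices and registers (the `bx`/`si` operands, the `+4` displacement) are only illustrative assumptions, not taken from SectorC's source, and the helper name `size` is made up for this example.

```python
# Hand-assembled encodings for 16-bit real mode (no prefixes needed).
# These are standard x86 encodings; the specific operands are illustrative.
ENCODINGS = {
    "sub ax, 2":         bytes([0x83, 0xE8, 0x02]),  # 83 /5 ib
    "dec ax":            bytes([0x48]),              # 48+r
    "mov ax, bx":        bytes([0x89, 0xD8]),        # 89 /r
    "xchg ax, bx":       bytes([0x93]),              # 90+r, single byte with ax
    "add ax, si":        bytes([0x01, 0xF0]),        # 01 /r
    "add ax, 4":         bytes([0x83, 0xC0, 0x04]),  # 83 /0 ib
    "lea ax, [bx+si+4]": bytes([0x8D, 0x40, 0x04]),  # 8D /r with disp8
}


def size(*instructions: str) -> int:
    """Total encoded size in bytes of a sequence of instructions."""
    return sum(len(ENCODINGS[i]) for i in instructions)


print(size("sub ax, 2"), "vs", size("dec ax", "dec ax"))
print(size("mov ax, bx"), "vs", size("xchg ax, bx"))
print(size("mov ax, bx", "add ax, si", "add ax, 4"), "vs", size("lea ax, [bx+si+4]"))
```

None of these are drop-in swaps: `dec` leaves the carry flag alone where `sub` sets it, `xchg` also overwrites the source register, and `lea` does not touch flags at all, so each substitution only works where the surrounding code doesn't care.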
This reminded me of the idea of compiler bootstrapping (<a href="https://news.ycombinator.com/item?id=35714194" rel="nofollow">https://news.ycombinator.com/item?id=35714194</a>). That is, you can now write, in SectorC, a compiler for a slightly more advanced version of C that is capable of compiling TCC (<a href="https://bellard.org/tcc/" rel="nofollow">https://bellard.org/tcc/</a>), and then with TCC you can go forward to GCC and so on.
Now they just need to port something like oneKpaq [0] to 16 bit, or maybe something from the <i>extremely tiny decompressor</i> thread [1]. Just to test the compression level and get an idea: kpaq on its quickest setting (taking minutes instead of what could be days on its highest) reduced SectorC to 82.81% of its size; of course, adding the 128 bit stub brought it to 677 bytes. It would be interesting to try it on the slowest, takes-days-to-brute-force setting, but I'm not going to attempt that.<p>Some of the compressors in that forum thread, since they are 32 bytes and such, might find it easier to get net gains.<p>[0] <a href="https://github.com/temisu/oneKpaq">https://github.com/temisu/oneKpaq</a><p>[1] <a href="https://encode.su/threads/3387-(Extremely)-tiny-decompressors" rel="nofollow">https://encode.su/threads/3387-(Extremely)-tiny-decompressor...</a>
This is fascinating, I really did not think it was possible to implement even a tiny subset of C in just 512 bytes of x86 code. Using atoi() as a generic hash function is a brilliantly awful hack!
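To make the atoi() trick concrete, here is a minimal sketch of what "atoi as a hash" could look like. It assumes the tokenizer runs the usual `n = 10*n + (c - '0')` update over every character, digit or not, and lets the result wrap at 16 bits. The function name `atoi_hash` and the exact wraparound are assumptions for illustration; the real SectorC code may differ in detail.

```python
def atoi_hash(token: str) -> int:
    """atoi-style fold applied to any token, wrapping at 16 bits.

    Assumption: no digit check is done, so letters feed into the same
    accumulator that digits would.
    """
    n = 0
    for ch in token:
        n = (n * 10 + (ord(ch) - ord("0"))) & 0xFFFF
    return n


print(atoi_hash("42"))   # a real number parses as itself: 42
print(atoi_hash("if"))   # a keyword becomes a constant: 624
print(atoi_hash("int"))  # 6388
```

The appeal is that one routine does double duty: numeric literals come out as their value, and keywords and identifiers come out as a 16-bit constant to compare against. The cost is that nothing guards against two different names hashing to the same value.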
wow, this is impressive.<p>I wrote a similar x86-16 assembler in < 512 B of x86-16 assembly, and this seems much more difficult <<a href="https://github.com/kvakil/0asm/">https://github.com/kvakil/0asm/</a>>. I did find a lot of similar tricks were helpful: using gadgets and hashes. One trick I don't see in sectorc which shaved quite a bit off of 0asm was self-modifying code, which 0asm uses to "change" to the second pass of the assembler. (I wrote up some other techniques here: <<a href="https://kvakil.me/posts/asmkoan.html" rel="nofollow">https://kvakil.me/posts/asmkoan.html</a>>.)<p>bootOS (<<a href="https://github.com/nanochess/bootOS">https://github.com/nanochess/bootOS</a>>) and other tools by the author are also amazing works of assembly golf.
Pretty nifty, nice work!<p>I'll point out to any passerby that this C doesn't support structs, so it's unlikely you'd actually want to build anything in it.
Amazing!<p>I think this, from the conclusion, is the real takeaway:<p>> Things that seem impossible often aren’t and we should Just Do It anyway<p>I certainly would never have tried to get a C compiler (even a subset) so small, since my instinct would have been that it was not possible.
See the bootstrap project: <a href="https://bootstrappable.org/projects.html" rel="nofollow">https://bootstrappable.org/projects.html</a>
I'm wondering if you can build an actual "Linux from scratch" with this as the lowest level, without the need to use a host system at all.
I started reading the source... and digging for the part that allocates space for variables.... only to realize variable declarations are ignored and unnecessary... wow... what a <i>breathlessly reckless</i> hack! I love it!<p>It's like using an M18A1 Claymore mine and hoping it actually is aimed (and stays aimed) in the right direction.
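A rough sketch of why declarations can be skipped, assuming (and this is an assumption, not something confirmed here) that a variable's storage slot is simply picked by hashing its name. The class `HashedVars` and the helper `atoi_hash` are hypothetical names for illustration only.

```python
def atoi_hash(name: str) -> int:
    """Assumed atoi-style hash over every character, wrapping at 16 bits."""
    n = 0
    for ch in name:
        n = (n * 10 + (ord(ch) - ord("0"))) & 0xFFFF
    return n


class HashedVars:
    """Variable storage with no symbol table: the name's hash is the slot."""

    def __init__(self) -> None:
        # One zero-initialised slot per possible 16-bit hash value.
        self.slots = [0] * 0x10000

    def get(self, name: str) -> int:
        return self.slots[atoi_hash(name)]

    def set(self, name: str, value: int) -> None:
        self.slots[atoi_hash(name)] = value & 0xFFFF


v = HashedVars()
v.set("x", 5)
print(v.get("x"))   # 5
print(v.get("y"))   # 0, never declared or assigned, still readable
v.set("ak", 7)
print(v.get("ba"))  # 7, because "ak" and "ba" hash to the same slot
```

That last line is the Claymore part: under this scheme two unrelated names can silently share storage, and nothing warns you.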
> I will call it the Barely C Programming Language<p>Or BCPL, for short.<p>> The C programming language was devised in the early 1970s as a system implementation language for the nascent Unix operating system. Derived from the typeless language BCPL, it evolved a type structure; created on a tiny machine as a tool to improve a meager programming environment, it has become one of the dominant languages of today. This paper studies its evolution. [1]<p>[1] <a href="https://www.bell-labs.com/usr/dmr/www/chist.html" rel="nofollow">https://www.bell-labs.com/usr/dmr/www/chist.html</a>
Reminds me of the META II Metacompiler <a href="http://hcs64.com/files/pd1-3-schorre.pdf" rel="nofollow">http://hcs64.com/files/pd1-3-schorre.pdf</a>
That is insane, congrats.<p>I would have liked some explanation of where function calls like vga_init and vga_set_pixel come from; I'm not a graybeard yet.
something like this could be interesting for deep-space applications where you only have a bare metal environment with a hardened processor and limited memory & of course a ping time of days (to earth).<p>or alternatively for embedding a C compiler inside an LLM to use the LLM as a form of virtual machine.
really interesting write-up. thanks for sharing!<p>do you think there are any lessons that can be applied to a "normal" interpreter/compiler written in standard C? i'm always interested in learning how to reduce the size of my interpreter binaries