Writing a C compiler in 500 lines of Python

510 points by vgel over 1 year ago

26 comments

brundolf over 1 year ago

> Instead, we'll be single-pass: code generation happens during parsing

IIRC, C was specifically designed to allow single-pass compilation, right? I.e. in many languages you don't know what needs to be output without parsing the full AST, but in C, syntax directly implies semantics. I think I remember hearing this was because early computers couldn't necessarily fit the AST for an entire code file in memory at once.
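
To make "code generation happens during parsing" concrete, here is a minimal sketch of a single-pass expression compiler. It is not the article's code: the class name `SinglePass`, the helper `compile_expr`, and the tiny grammar are assumptions made for illustration. Each parsing function appends WebAssembly-style stack instructions the moment it recognizes a construct, so no tree is ever built.

```python
import re


def tokenize(src):
    # Split into integer literals, identifiers, and single-character operators.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", src)


class SinglePass:
    """Recursive-descent parser that emits stack-machine code as it parses."""

    def __init__(self, src):
        self.tokens = tokenize(src)
        self.pos = 0
        self.out = []  # emitted instructions, in emission order

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            self.term()
            # Both operands are already on the stack, so emit the operator now.
            self.out.append("i32.add" if op == "+" else "i32.sub")

    def term(self):
        # term := atom (('*' | '/') atom)*
        self.atom()
        while self.peek() in ("*", "/"):
            op = self.next()
            self.atom()
            self.out.append("i32.mul" if op == "*" else "i32.div_s")

    def atom(self):
        tok = self.next()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            self.out.append(f"i32.const {tok}")
        elif tok.isidentifier():
            self.out.append(f"local.get ${tok}")
        else:
            raise SyntaxError(f"unexpected token {tok!r}")


def compile_expr(src):
    parser = SinglePass(src)
    parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return parser.out


print("\n".join(compile_expr("j * 2 + 1")))
```

The only state kept is the token position and the output list, which is the property that made this style attractive on memory-constrained machines.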

mati365 over 1 year ago

I made a similar project in TypeScript [1]. It's basically a multipass compiler that generates x86 assembly, compiles it to binary and runs it. The worst parts were the register allocator, designing the IR code, and the assembler.

[1] https://github.com/Mati365/ts-c-compiler

Joker_vD over 1 year ago

I am pretty certain the following is a valid "for"-loop translation:

<pre><code>block
  ;; code for "i = 0"
  loop
    ;; code for "i < 5"
    i32.eqz
    br_if 1
    i32.const 1
    loop
      if
        ;; code for "i = i + 1"
        br 2
      else
      end
      ;; code for "j = j * 2 + 1"
      i32.const 0
    end
  end
end
</code></pre>

It doesn't require cloning the lexer so probably would still fit in 500 lines? But yeah, in normal assembly it's way easier, even in one-pass:

<pre><code>    ;; code for "i = 0"
.loop_test:
    ;; code for "i < 5"
    jz .loop_end
    jmp .loop_body
.loop_incr:
    ;; code for "i = i + 1"
    jmp .loop_test
.loop_body:
    ;; code for "j = j * 2 + 1"
    jmp .loop_incr
.loop_end:
</code></pre>

Of course, normally you'd want to re-arrange things like so:

<pre><code>    ;; code for "i = 0"
    jmp .loop_test
.loop_body:
    ;; code for "j = j * 2 + 1"
.loop_incr:
    ;; code for "i = i + 1"
.loop_test:
    ;; code for "i < 5"
    jnz .loop_body
.loop_end:
</code></pre>

I propose the better loop syntax for languages with one-pass implementations, then: "for (i = 0) { j = j * 2 + 1; } (i = i + 1; i < 5);" :)
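
The first assembly layout above can be checked with a small Python sketch. Everything here is hypothetical scaffolding rather than anything from the article: instructions are plain tuples, `emit_for_one_pass` lays the four pieces out in source order (init, condition, increment, body) with the same jumps as the listing, and `run` is a toy interpreter used only to confirm the body still executes before the increment.

```python
def emit_for_one_pass(init, cond, incr, body):
    """Emit a for loop in source order, mirroring the one-pass layout above.

    Each argument is a list of instruction tuples (an assumed toy format).
    """
    return (
        init
        + [("label", "loop_test")] + cond
        + [("jz", "loop_end"), ("jmp", "loop_body")]
        + [("label", "loop_incr")] + incr + [("jmp", "loop_test")]
        + [("label", "loop_body")] + body + [("jmp", "loop_incr")]
        + [("label", "loop_end")]
    )


def run(program, env):
    """Toy interpreter.

    ("set", name, fn) stores fn(env) in env[name].
    ("test", fn)      sets the flag to fn(env).
    ("jz", label)     jumps when the flag is zero.
    ("jmp", label)    jumps unconditionally.
    ("label", name)   does nothing.
    """
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc, flag = 0, 0
    while pc < len(program):
        ins = program[pc]
        pc += 1
        if ins[0] == "set":
            env[ins[1]] = ins[2](env)
        elif ins[0] == "test":
            flag = int(ins[1](env))
        elif ins[0] == "jz" and flag == 0:
            pc = labels[ins[1]]
        elif ins[0] == "jmp":
            pc = labels[ins[1]]
    return env


# for (i = 0; i < 5; i = i + 1) { j = j * 2 + 1; }
program = emit_for_one_pass(
    init=[("set", "i", lambda e: 0)],
    cond=[("test", lambda e: e["i"] < 5)],
    incr=[("set", "i", lambda e: e["i"] + 1)],
    body=[("set", "j", lambda e: e["j"] * 2 + 1)],
)
result = run(program, {"j": 0})
print(result)
```

The cost of emitting in source order is two extra unconditional jumps per iteration, which is what the rearranged layout avoids.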

tptacek over 1 year ago

A time-honored approach!

https://www.blackhat.com/presentations/win-usa-04/bh-win-04-aitel.pdf

(minus directly emitting opcodes, and fitting into 500 lines, of course.)

ak_111 over 1 year ago

Somewhat unrelated question, but I think one of the most difficult things about learning C for coders who are used to scripting languages is getting your head around how the various scalar data types like short, int, long, ... (and the unsigned/hex version of each) are represented, how they relate to each other, and how they relate to the platform.

I am wondering if this complexity exists due to historical reasons. In other words, if you were to invent C today, would you just define int as always being 32 bits and long as 64, and provide much more sane and well-defined rules on how the various datatypes relate to each other, without losing anything of what makes C a popular low-level language?
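
One way to see the platform dependence from a scripting language is to ask the local C ABI directly. The sketch below uses Python's standard `ctypes` module; the helper name `integer_sizes` is made up for this example, and the printed numbers will differ between platforms (for instance, `long` is typically 8 bytes on 64-bit Linux and 4 bytes on 64-bit Windows). The fixed-width types that C99 added in `<stdint.h>` correspond to `ctypes.c_int32` and `ctypes.c_int64`.

```python
import ctypes

# The classic C integer types, as exposed by ctypes for the current platform.
C_INTEGER_TYPES = [
    ("char", ctypes.c_char),
    ("short", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long", ctypes.c_long),
    ("long long", ctypes.c_longlong),
]


def integer_sizes():
    """Return {type name: size in bytes} for the platform Python runs on."""
    return {name: ctypes.sizeof(ctype) for name, ctype in C_INTEGER_TYPES}


sizes = integer_sizes()
for name, size in sizes.items():
    print(f"{name:>9}: {size} byte(s)")

# Fixed-width types have the same size everywhere.
print(f"  int32_t: {ctypes.sizeof(ctypes.c_int32)} byte(s)")
print(f"  int64_t: {ctypes.sizeof(ctypes.c_int64)} byte(s)")
```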

kaycebasques over 1 year ago

Is there a C compiler written in Python that aims for maximum readability rather than trying to get as much done under X lines of code?

WalterBright over 1 year ago

This looks a lot like the Tiny Pascal compiler that BYTE published a listing of back in 1978.

http://www.trs-80.org/tiny-pascal/

I figured out the basics of how a compiler works by going through it line by line.

marcodiego over 1 year ago

It is interesting to think that 500 lines of code is something one can write in one or two days. But writing a C compiler in 500 lines of comprehensible code (even in Python) is a challenge in itself that may take months after a few years of solid learning.

I wonder if this is a good path to becoming an extremely productive developer. If someone spends time developing projects like this, but for different areas... A kernel, a compressor, a renderer, a multimedia/network stack, AI/ML... Will that turn a good dev into a 0.1 Bellard?

jll29 over 1 year ago

Writing your own compiler

- demystifies compilers, interpreters, linkers/loaders and related systems software, which you now understand. This understanding will no doubt one day help in your debugging efforts;

- elevates you to become a higher level developer: you are now a tool smith who can make their own language if needed (e.g. to create domain specific languages embedded in larger systems you architect).

So congratulations, on top of other forms of abstraction, you have mastered meta-linguistic abstraction (see the latter part of Structure and Interpretation of Computer Programs, preferably the 1st or 2nd ed.).

mananaysiempre over 1 year ago

> [Building parse trees] is really great, good engineering, best practices, recommended by experts, etc. But... it takes too much code, so we can't do it.

It takes too much code in Python. (Not a phrase one gets to say often, but it’s generally true for tree processing code.) In, say, SML this sort of thing is wonderfully concise.

meitham over 1 year ago

Actually with SLY (https://sly.readthedocs.io) now dead, what is the recommended lexer/parser library in Python?

nn3 over 1 year ago

Shocka1 over 1 year ago

These kinds of posts are one of the things that keep me coming back to HN. Right when I start thinking I'm a professional badass for implementing several features with great, well-tested code in record time, I stumble upon posts like this that put me in my place.

rcarmo over 1 year ago

I have to wonder if there's a Scheme to WASM compiler out there someplace right now I haven't found yet.

aldousd666 over 1 year ago

This is crazy cool! Esolangs have been a hobby of mine (more just an interest lately, since I haven't built any in a while), so this is like a fun code golf game for compilation. Nice work, and even better, nice explanation article!

varispeed over 1 year ago

I wrote a C compiler back in the day as a learning exercise. It was the most fun and rewarding project.

jokoon over 1 year ago

I don't see him using match/case... even though this is clearly a good use case for it.

MrYellowP over 1 year ago

I am really confused by what people call compilers nowadays. This is now a compiler that takes input text and generates output text, which then gets read by a compiler that takes input text and generates JIT code for execution.

This is more of a transpiler than an actual compiler.

Am I missing something?

teddyh over 1 year ago

For some value of “C”:

> Notably, it doesn't support:
> structs :-( would be possible with more code, the fundamentals were there, I just couldn't squeeze it in
> enums / unions
> preprocessor directives (this would probably be 500 lines by itself...)
> floating point. would also be possible, the wasm_type stuff is in, again just couldn't squeeze it in
> 8 byte types (long/long long or double)
> some other small things like pre/post cremements, in-place initialization, etc., which just didn't quite fit
> any sort of standard library or i/o that isn't returning an integer from main()
> casting expressions

fan_of_yoinked over 1 year ago

I love the graphic - would go see the world's largest Chomsky

moomin over 1 year ago

Inevitably we have to ask: and how many lines of C in library functions?

hamilyon2 over 1 year ago

So, given that Python is an interpreter and very well understood, can we say that we are sure this compiler does not include a Thompson virus?

rhabarba over 1 year ago

Finally, one can have inefficient C.

ForOldHack over 1 year ago

The *point* of a compiler is to compile itself.

Jake_K over 1 year ago

Interesting stuff

golemarms over 1 year ago

Cool. Now try writing a Python compiler in 500 lines of C.
