TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Writing a C compiler in 500 lines of Python

510 点作者 vgel超过 1 年前

26 条评论

brundolf超过 1 年前
&gt; Instead, we&#x27;ll be single-pass: code generation happens during parsing<p>IIRC, C was specifically designed to allow single-pass compilation, right? I.e. in many languages you don&#x27;t know what needs to be output without parsing the full AST, but in C, syntax directly implies semantics. I think I remember hearing this was because early computers couldn&#x27;t necessarily fit the AST for an entire code file in memory at once
评论 #37384340 未加载
评论 #37385238 未加载
评论 #37385150 未加载
评论 #37439520 未加载
评论 #37384561 未加载
评论 #37394117 未加载
评论 #37384445 未加载
mati365超过 1 年前
I made similar project in TypeScript[1]. Basically multipass compiler that generates x86 assembly, compiles it to binary and runs it. The worst thing were register allocator, designing IR code and assembler.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;Mati365&#x2F;ts-c-compiler">https:&#x2F;&#x2F;github.com&#x2F;Mati365&#x2F;ts-c-compiler</a>
评论 #37385157 未加载
评论 #37385501 未加载
Joker_vD超过 1 年前
I am pretty certain the following is a valid &quot;for&quot;-loop translation:<p><pre><code> block ;; code for &quot;i = 0&quot; loop ;; code for &quot;i &lt; 5&quot; i32.eqz br_if 1 i32.const 1 loop if ;; code for &quot;i = i + 1&quot; br 2 else end ;; code for &quot;j = j * 2 + 1&quot; i32.const 0 end end end </code></pre> It doesn&#x27;t require cloning the lexer so probably would still fit in 500 lines? But yeah, in normal assembly it&#x27;s way easier, even in one-pass:<p><pre><code> ;; code for &quot;i = 0&quot; .loop_test: ;; code for &quot;i &lt; 5&quot; jz .loop_end jmp .loop_body .loop_incr: ;; code for &quot;i = i + 1&quot; jmp .loop_test .loop_body: ;; code for &quot;j = j * 2 + 1&quot; jmp .loop_incr .loop_end: </code></pre> Of course, normally you&#x27;d want to re-arrange things like so:<p><pre><code> ;; code for &quot;i = 0&quot; jmp .loop_test .loop_body: ;; code for &quot;j = j * 2 + 1&quot; .loop_incr: ;; code for &quot;i = i + 1&quot; .loop_test: ;; code for &quot;i &lt; 5&quot; jnz .loop_body .loop_end: </code></pre> I propose the better loop syntax for languages with one-pass implementations, then: &quot;for (i = 0) { j = j * 2 + 1; } (i = i + 1; i &lt; 5);&quot; :)
评论 #37395366 未加载
tptacek超过 1 年前
A time-honored approach!<p><a href="https:&#x2F;&#x2F;www.blackhat.com&#x2F;presentations&#x2F;win-usa-04&#x2F;bh-win-04-aitel.pdf" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.blackhat.com&#x2F;presentations&#x2F;win-usa-04&#x2F;bh-win-04-...</a><p>(minus directly emitting opcodes, and fitting into 500 lines, of course.)
ak_111超过 1 年前
Somewhat unrelated question, but I think one of the second most difficult things of learning C for coders who are used to scripting languages is to get your head around how the various scaler data types like short, int, long,... (and the unsigned&#x2F;hex version of each) are represented and how they relate to each other and how they relate to the platform.<p>I am wondering if this complexity exists due to historical reasons, in other words if you were to invent C today you would just define int as always being 32, long as 64 and provide much more sane and well-defined rules on how the various datatypes relate to each other, without losing anything of what makes C a popular low-level language?
评论 #37389409 未加载
评论 #37390258 未加载
评论 #37394061 未加载
评论 #37389359 未加载
评论 #37389323 未加载
kaycebasques超过 1 年前
Is there a C compiler written in Python that aims for maximum readability rather than trying to get as much done under X lines of code?
评论 #37385202 未加载
评论 #37385093 未加载
WalterBright超过 1 年前
This looks a lot like the Tiny Pascal compiler that BYTE published a listing of back in 1978.<p><a href="http:&#x2F;&#x2F;www.trs-80.org&#x2F;tiny-pascal&#x2F;" rel="nofollow noreferrer">http:&#x2F;&#x2F;www.trs-80.org&#x2F;tiny-pascal&#x2F;</a><p>I figured out the basics of how a compiler works by going through it line by line.
评论 #37385328 未加载
评论 #37385334 未加载
评论 #37396654 未加载
评论 #37389190 未加载
marcodiego超过 1 年前
It is interesting to think that 500 lines of code is something one can write in one or two days. But, writing a C compiler in 500 of comprehensible code (even in python) is challenge in itself that may take months after a few years of solid learning.<p>I wonder if is this a good path to becoming an extremely productive developer. If some one spends time developing projects like this, but for different areas... A kernel, a compressor, renderer, multimedia&#x2F;network stack, IA&#x2F;ML... Will that turn a good dev into a 0.1 Bellard?
评论 #37384515 未加载
评论 #37384428 未加载
评论 #37384885 未加载
评论 #37384503 未加载
评论 #37386160 未加载
评论 #37388093 未加载
评论 #37384838 未加载
评论 #37386036 未加载
jll29超过 1 年前
Writing your own compiler<p>- demystifies compilers, interpreters, linkers&#x2F;loaders and related systems software, which you now understand. This understanding will no doubt one day help in your debugging efforts;<p>- elevates you to become a higher level developer: you are now a tool smith who can make their own language if needed (e.g. to create domain specific languages embedded in larger systems you architect).<p>So congratulations, on top of other forms of abstraction, you have mastered meta-linguistic abstraction (see the latter part of Structure and Interpretation of Computer Programs, preferably the 1st or 2nd ed.).
评论 #37386984 未加载
评论 #37386641 未加载
评论 #37386841 未加载
评论 #37389187 未加载
mananaysiempre超过 1 年前
&gt; [Building parse trees] is really great, good engineering, best practices, recommended by experts, etc. But... it takes too much code, so we can&#x27;t do it.<p>It takes too much code <i>in Python</i>. (Not a phrase one gets to say often, but it’s generally true for tree processing code.) In, say, SML this sort of thing is wonderfully concise.
meitham超过 1 年前
Actually with SLY (<a href="https:&#x2F;&#x2F;sly.readthedocs.io" rel="nofollow noreferrer">https:&#x2F;&#x2F;sly.readthedocs.io</a>) now dead, what is the recommended Lexer&#x2F;Parser library in Python?
评论 #37390864 未加载
nn3超过 1 年前
Just for comparison the LOCs for some other small C or C like compilers. It&#x27;s not that far away from Ritchie&#x27;s<p>C4x86 | 0.6K (very close)<p>small C (x86) | 3.1K<p>Ritchie&#x27;s earliest struct compiler | 2.3K<p>v7 Unix C compiler | 10.2K<p>chibicc | 8.4K<p>Biederman&#x27;s romcc | 25.0K
评论 #37385475 未加载
评论 #37385127 未加载
评论 #37385154 未加载
Shocka1超过 1 年前
These kinds of posts are one of the things that keeps me coming back to HN. Right when I start thinking I&#x27;m a professional badass for implementing several features with great well tested code in record time, I stumble along posts like this that set me in my place.
rcarmo超过 1 年前
I have to wonder if there&#x27;s a Scheme to WASM compiler out there someplace right now I haven&#x27;t found yet.
评论 #37385543 未加载
评论 #37390006 未加载
aldousd666超过 1 年前
This is crazy cool! Esolangs have been a hobby of mine, (more just an interest lately, since I haven&#x27;t built any in a while,) so this is like a fun code golf game for compilation. Nice work, and even better, nice explanation article!
varispeed超过 1 年前
I wrote a C compiler back in the day as a learning exercise. It was the most fun and rewarding project.
jokoon超过 1 年前
I don&#x27;t see he use match case... while it&#x27;s clearly a good use case.
MrYellowP超过 1 年前
I am really confused by what people call compilers nowadays. This is now a compiler that takes input text and generates output text, which then gets read by a compiler that takes input text and generates JIT code for execution.<p>This is more of a transpiler, than an actual compiler.<p>Am I missing something?
评论 #37389713 未加载
teddyh超过 1 年前
For some value of “C”:<p>&gt; <i>Notably, it doesn&#x27;t support:</i><p>&gt; <i>structs :-( would be possible with more code, the fundamentals were there, I just couldn&#x27;t squeeze it in</i><p>&gt; <i>enums &#x2F; unions</i><p>&gt; <i>preprocessor directives (this would probably be 500 lines by itself...)</i><p>&gt; <i>floating point. would also be possible, the wasm_type stuff is in, again just couldn&#x27;t squeeze it in</i><p>&gt; <i>8 byte types (long&#x2F;long long or double)</i><p>&gt; <i>some other small things like pre&#x2F;post cremements, in-place initialization, etc., which just didn&#x27;t quite fit any sort of standard library or i&#x2F;o that isn&#x27;t returning an integer from main()</i><p>&gt; <i>casting expressions</i>
评论 #37385246 未加载
评论 #37385126 未加载
评论 #37388479 未加载
评论 #37384461 未加载
fan_of_yoinked超过 1 年前
I love the graphic - would go see the worlds largest chomsky
评论 #37386043 未加载
moomin超过 1 年前
Inevitably we have to ask: and how many lines of C in library functions?
评论 #37389184 未加载
hamilyon2超过 1 年前
So, given the python is an interpreter and very well understood, can we say that we are sure this compiler does not include Thompson virus?
评论 #37388638 未加载
评论 #37389197 未加载
rhabarba超过 1 年前
Finally, one can have inefficient C.
评论 #37384190 未加载
评论 #37384528 未加载
评论 #37385385 未加载
评论 #37385159 未加载
ForOldHack超过 1 年前
The *point* of a compiler is to compile itself.
评论 #37403752 未加载
评论 #37389183 未加载
Jake_K超过 1 年前
Interesting stuff
golemarms超过 1 年前
Cool. Now try writing a Python compiler in 500 lines of C.
评论 #37391331 未加载