>I'm calling it Tilde (or TB for tilde backend) and the reasons are pretty simple, i believe it's far too slow at compiling and far too big to be fixed from the inside. It's been 20 years and cruft has built up, time for a "redo".<p>That put a smile on my face, because I remember that is how LLVM was born out of frustration with GCC.<p>I don't know how modern GCC and LLVM compare. I remember LLVM was fast but the resulting binaries were not as optimised; once those optimisations were added, it became slower. Meanwhile LLVM was a wake-up call to modernise GCC and make it faster. In the end competition made <i>both</i> a lot better.<p>I believe some industries (gaming) used to swear by Visual Studio / the MS compiler / the Intel compiler, or by languages that depend on / prefer the Borland compiler (whatever they are called now). It's been a very long time since I last looked, so I wonder whether those are still in use or whether we have mostly all converged on LLVM / GCC?
Chris Lattner seems to have also created an alternative to LLVM - <a href="https://mlir.llvm.org/" rel="nofollow">https://mlir.llvm.org/</a><p>Because of how the architecture works, LLVM is just one of the possible backends; it doesn't have to be used at all. Very interesting project: you could do a lot more IR processing before descending to LLVM (if you use it), and that way you could give LLVM a lot less to do.<p>Chris has said LLVM is fast at what it is designed to do - lower IR to machine code. However, because of how convoluted the process can get, and the difficulty of carrying information from some language-specific MIR down to LLVM, languages are forced to generate tons upon tons of IR so as to capture every possible detail. Then LLVM is asked to clean up and optimize this IR.<p>One thing to look out for is the problem of either losing language-specific information when moving from MIR to low-level IR (be it Tilde or LLVM) or generating too much information, most of it useless.
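To make the "tons of IR" problem concrete, here is a rough sketch written as equivalent C (purely illustrative; not any real front-end's output). A bounds-checked source language already proved the index is in range for the whole loop, but that invariant is lost in the lowering, so every access repeats the check and the backend has to rediscover and delete it.<p><pre><code>  #include &lt;stdlib.h&gt;

  /* What a bounds-checked language might be forced to emit after lowering,
     shown as equivalent C. The front-end's MIR knew 0 &lt;= i &lt; len for the
     whole loop, but the low-level IR cannot express that, so each access
     carries a check that the optimizer is later asked to remove. */
  long sum(const long *a, long len) {
      long s = 0;
      for (long i = 0; i &lt; len; i++) {
          if (i &lt; 0 || i &gt;= len)  /* redundant: guaranteed by the loop bounds */
              abort();
          s += a[i];
      }
      return s;
  }
</code></pre>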
I saw Yasser present this at Handmade Seattle in 2023.[0] He explained that when he started working on Tilde, he didn't have any special knowledge or interest in compilers. But he was reading discussions in the Handmade forums, and one of the most popular requests was for an alternative to LLVM, so he thought, "Sure, I'll do that."<p>[0] <a href="https://handmadecities.com/media/seattle-2023/tb/" rel="nofollow">https://handmadecities.com/media/seattle-2023/tb/</a>
Cool. The author has set himself a huge task if he wants to build something like LLVM. An alternative would be to participate in a project with similar goals that is already well along, such as QBE or Eigen (<a href="https://github.com/EigenCompilerSuite/">https://github.com/EigenCompilerSuite/</a>); both so far lack optimizers. I consider Eigen very attractive because it supports many more targets and includes assemblers and linkers for all of them. I see the advantage in having a C implementation; Eigen is unfortunately developed in C++17, but I managed to backport the parts I'm using to a moderate C++11 subset (<a href="https://github.com/rochus-keller/Eigen">https://github.com/rochus-keller/Eigen</a>). There are different front-ends available, two C compilers among them. And - as mentioned - an optimizer would be great.<p>EDIT: just found this podcast where the author gives more information about the project goals and history (at least the beginning of the podcast is interesting): <a href="https://www.youtube.com/watch?v=f2khyLEc-Hw" rel="nofollow">https://www.youtube.com/watch?v=f2khyLEc-Hw</a>
Looking at the commit history inspires some real confidence!<p><a href="https://github.com/RealNeGate/Cuik/commits/master/">https://github.com/RealNeGate/Cuik/commits/master/</a>
I thought the sea-of-nodes choice was interesting.<p>V8 has been moving away from sea-of-nodes. Here's a video where Ben Titzer talks about V8's reasons for moving away from it: <a href="https://www.youtube.com/watch?v=Vu372dnk2Ak&t=184s" rel="nofollow">https://www.youtube.com/watch?v=Vu372dnk2Ak&t=184s</a>. Yasser, the author of Tilde, is also in the video.
> a decent linear scan allocator which will eventually be replaced with graph coloring for optimized builds.<p>Before setting out to implement 1980s-style graph coloring, I would suggest considering SSA-based register allocation instead: <a href="https://compilers.cs.uni-saarland.de/projects/ssara/" rel="nofollow">https://compilers.cs.uni-saarland.de/projects/ssara/</a>; I find the slides at <a href="https://compilers.cs.uni-saarland.de/projects/ssara/hack_ssara_ssa09.pdf" rel="nofollow">https://compilers.cs.uni-saarland.de/projects/ssara/hack_ssa...</a> especially useful.<p>Graph coloring is a nice model for the register <i>assignment</i> problem. But that's a relatively easy part of overall register allocation. If your coloring fails, you need to decide what to spill and how. Graph coloring does not help you with this; you will end up having to iterate coloring and spilling until convergence, and you may spill too much as a result.<p>But if your program is in SSA, the special properties of SSA can be used to properly separate these subphases: do a single spilling pass first (still not easy!) and then do a coloring that is guaranteed to succeed.<p>I haven't looked at LLVM in a while, but 10-15 years ago it used to transform out of SSA form just before register allocation. If I had to guess, I would guess it still does so. <i>Not</i> destroying SSA too early could actually be a significant differentiator from LLVM's "cruft".
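For contrast, here is a deliberately simplified sketch of the color-then-spill loop described above (hypothetical data structures, my own code, not full Chaitin/Briggs and not anything from Tilde or LLVM). The interesting part is the failure path: when no color is free, plain graph coloring has to pick a spill candidate and start over, which is the iterate-until-convergence problem that SSA-based allocation sidesteps by spilling once up front and then coloring a chordal interference graph.<p><pre><code>  #include &lt;stdbool.h&gt;
  #include &lt;string.h&gt;

  #define K 4            /* number of physical registers (made-up value) */
  #define MAX_VREGS 64

  /* interferes[a][b] is true when virtual registers a and b are live at once */
  static bool interferes[MAX_VREGS][MAX_VREGS];
  static int color_of[MAX_VREGS];   /* -1 = uncolored */

  /* Returns a vreg that must be spilled, or -1 if every vreg got a color. */
  int greedy_color(int nvregs) {
      memset(color_of, -1, sizeof color_of);
      for (int v = 0; v &lt; nvregs; v++) {
          bool used[K] = {0};
          for (int u = 0; u &lt; nvregs; u++)
              if (interferes[v][u] &amp;&amp; color_of[u] &gt;= 0)
                  used[color_of[u]] = true;
          int c = 0;
          while (c &lt; K &amp;&amp; used[c]) c++;
          if (c == K)
              return v;   /* coloring failed: spill v, rebuild, try again */
          color_of[v] = c;
      }
      return -1;
  }
</code></pre>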
> I believe it's (LLVM) far too slow at compiling and far too big to be fixed from the inside<p>What are you doing to make sure Tilde does not end up like this?
<p><pre><code> > It's been 20 years and cruft has built up, time for a "redo".
</code></pre>
Ah.. is this one of those "I rewrote it and it's better" things, but when people inevitably discover issues that "cruft" was handling the author will blame the user?
Tsoding explored this project on a recent stream: <a href="https://youtu.be/aKk_r9ZwXQw?si=dvZAZkOX3xd7yjTw" rel="nofollow">https://youtu.be/aKk_r9ZwXQw?si=dvZAZkOX3xd7yjTw</a>
If you're going to rewrite LLVM, you should avoid just trying to 'do it again but less bloated', because that'll end up where LLVM is now once you've added enough features and optimisations to be competitive.<p>Rewriting LLVM gives you the opportunity to rethink some of its main problems. Of those I think two big ones are TableGen and peephole optimisations.<p>The backend code for LLVM is awful, and TableGen only partially addresses the problem. Most LLVM code for defining instruction opcodes amounts to multiple huge switch statements that stuff every opcode into them; it's disgusting. This code is begging for a more elegant solution, and I think a functional approach would solve a lot of the problems.<p>The peephole optimisation in the InstCombine pass is a huge collection of handwritten rules that has accumulated over time. You probably don't want to try and redo this yourself, but it will also be a big barrier to achieving competitive optimisation. You could try to solve the problem by using a superoptimisation approach from the beginning. Look into the Souper paper, which automatically generates peepholes for LLVM: (<a href="https://github.com/google/souper">https://github.com/google/souper</a>, <a href="https://arxiv.org/pdf/1711.04422.pdf" rel="nofollow">https://arxiv.org/pdf/1711.04422.pdf</a>).<p>Lastly, as I hate C++, I have to throw in an obligatory suggestion to rewrite it in Rust :p
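For anyone unfamiliar with what one of those handwritten InstCombine-style rules looks like, here is a tiny sketch on a made-up IR node type (not LLVM's or Tilde's representation). Passes like InstCombine are thousands of such pattern matches accumulated by hand; Souper's pitch is to synthesize them automatically instead.<p><pre><code>  #include &lt;stdint.h&gt;
  #include &lt;stdbool.h&gt;

  typedef enum { OP_CONST, OP_MUL, OP_SHL } Op;

  typedef struct Node {
      Op op;
      int64_t imm;            /* value when op == OP_CONST */
      struct Node *lhs, *rhs;
  } Node;

  static bool is_pow2(int64_t v) { return v &gt; 0 &amp;&amp; (v &amp; (v - 1)) == 0; }
  static int  log2_i64(int64_t v) { int n = 0; while (v &gt;&gt;= 1) n++; return n; }

  /* Peephole: (mul x, C) -&gt; (shl x, log2(C)) when C is a power of two.
     Assumes constants were already canonicalized to the right-hand operand. */
  void combine_mul(Node *n) {
      if (n-&gt;op != OP_MUL) return;
      Node *c = n-&gt;rhs;
      if (c-&gt;op == OP_CONST &amp;&amp; is_pow2(c-&gt;imm)) {
          n-&gt;op = OP_SHL;
          c-&gt;imm = log2_i64(c-&gt;imm);
      }
  }
</code></pre>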
I'm not familiar with a lot of the acronyms and catch-phrases already in the first part of the article... let me try to make a bit of sense of this:<p><pre><code> IR = Intermediate Representation
SSA = Static Single Assignment
CFG = Control-Flow Graph (not Context-Free Grammar)
</code></pre>
And "sea of nodes" is this: <a href="https://en.wikipedia.org/wiki/Sea_of_nodes" rel="nofollow">https://en.wikipedia.org/wiki/Sea_of_nodes</a> ... IIANM, that means that instead of assuming a global sequence of all program (SSA) instructions, which respects the dependecies - you only have a graph with the partial order defined by the dependencies, i.e. individual instructions are nodes that "float" in the sea.
I appreciate several things about this compiler already:<p>MIT license (the king of licenses, IMHO. That's not an objective statement though)<p>Written in easy-to-understand C.<p>No Python required to build it. (GCC requires Perl, and I think Perl is way easier to bootstrap than the Python that LLVM needs.)<p>No Apple. I don't know if you all have seen some of the Apple developers talking with people, but some of them are extremely condescending and demeaning towards people from non-formal CS backgrounds. I get it: in a world where you are one of the supreme developers it's easy to be that way, but it's also just as bad as having people like Theo or historical Torvalds.
I’m definitely happy to see this happening. But I would like to point out two ingredients of LLVM’s success beyond its academic merits: license and modularity. I’m not a lawyer so I can’t say much about the first one; all I can say is that I believe licensing is one of the main reasons Apple switched to LLVM decades ago. Modularity, on the other hand, is one of the most crucial features of LLVM and something GCC struggles to catch up on even nowadays. I really hope Tilde adopts the same modularity philosophy and provides building blocks rather than just tools.
Shameless plug for another similar system: <a href="http://cwerg.org" rel="nofollow">http://cwerg.org</a>
Less ambitious with a focus on (measurable) simplicity.
This looks pretty cool. I've been looking at all the "small" backends recently. It's so much nicer to work with one of them than trying to wrangle LLVM.<p>QBE, MIR, & IR (php's) are all worth a look too.<p>Personally I've settled on IR for now because it seemed to match my needs the most closely. It's actively developed, has aarch64 in addition to x64 (looks like TB has just started that?), does x64 Windows ABI, and seems to generate decent code quickly.
Again, somebody who has come to the realization that something is seriously wrong with ultra-complex languages in the SDK (C++ and similar).<p>In other words, since this LLVM alternative is coded in plain and simple C, it is shielded against those who still do not see that computer languages with an ultra-complex syntax are not the right way to go if you want sane software.<p>You also have QBE, which with cproc will give you ~70% of the speed of the latest GCC (in my benchmarks on AMD Zen 2 x86_64).
I dunno if "twice as fast as Clang" is very impressive. How fast is it compared to Clang 1.0?<p>Also starting a new project like this in C is an interesting choice.
I don't understand much about IR or compiler backends, but I know of another LLVM alternative, QBE: <a href="https://c9x.me/compile/" rel="nofollow">https://c9x.me/compile/</a>
I'm confused, is this some kind of re-post? I saw this exact same post with the exact same comments on it a while ago. Could have been more than a couple of weeks. Very strange.
Good stuff. Hope they succeed. LLVM support for JITed and GCed languages is pretty weak, so a competitor that addresses those and other shortcomings would be welcome.
The maintainer said that LLVM has 10M lines of code, making it too hard to improve, so he's building his own.
That sounds weird to me, but good luck I guess?
You want to create an LLVM alternative and you write it in C?<p>I'm saying this as someone who uses LLVM daily and wishes it were written in anything other than C/C++;
those languages bring so many cons that it is unreal.
Slow compilation, mediocre tooling (CMake), terrible error messages, etc, etc.
What's the point of starting with tech debt?