This article provides an excellent overview of the latest in <i>speed of optimizer</i> vs <i>quality of optimization</i>.<p>In particular, copy-and-patch compilation is still the fastest approach because it uses pre-compiled code, though it leaves little room for optimization.<p>Cranelift uses e-graphs to represent equivalences in its IR, which allows for more optimizations than the copy-and-patch approach.<p>Of course, the most optimized output is going to come from a more traditional compiler toolchain like LLVM or GCC. But for users who want "fast enough" output as quickly as possible, newer compiler techniques provide a promising alternative.
Slightly off-topic, but if you fancy writing compilers in your free time, Cranelift has a great Rust library[0] for doing code generation - it’s a pleasure to use!<p>[0]: <a href="https://docs.rs/cranelift-frontend/0.105.3/cranelift_frontend/index.html" rel="nofollow">https://docs.rs/cranelift-frontend/0.105.3/cranelift_fronten...</a>
I see that there are many comments on full debug builds, but for me the most important difference is incremental build time when making minor changes. In my opinion this is what speeds up development iterations.<p>Here are my build times when making a trivial change to a print statement in a root function, comparing nightly dev vs adding cranelift + mold for rust-analyzer[0] (347_290 LoC) and gleam[1] (76_335 LoC):<p><pre><code> $ time cargo build
Compiling rust-analyzer v0.0.0 (/home/user/repos/rust-analyzer/crates/rust-analyzer)
# nightly
Finished `dev` profile [unoptimized] target(s) in 6.60s
cargo build 4.18s user 2.51s system 100% cpu 6.650 total
# cranelift+mold
Finished `dev` profile [unoptimized] target(s) in 2.25s
cargo build 1.77s user 0.36s system 92% cpu 2.305 total
Compiling gleam v1.0.0 (/home/user/repos/gleam/compiler-cli)
# nightly
Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.69s
cargo build --bin gleam 3.02s user 1.74s system 100% cpu 4.743 total
# cranelift+mold
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.99s
cargo build --bin gleam 0.71s user 0.20s system 88% cpu 1.033 total
</code></pre>
For me this is the most important metric and it shows a huge improvement. If I compare it to Go building Terraform[2] (371_594 LoC) it is looking promising. The comparison is a bit unfair since Go's is effectively a release build, which is really nice in CI/CD. I love Go's compilation times and thought it would be nice to compare with another language to show the huge improvements that Rust has made.<p><pre><code> $ time go build
go build 3.62s user 0.76s system 171% cpu 2.545 total
</code></pre>
I was looking forward to parallel front-end[3], but I have not seen any improvement for these small changes.<p>[0] <a href="https://github.com/rust-lang/rust-analyzer">https://github.com/rust-lang/rust-analyzer</a><p>[1] <a href="https://github.com/gleam-lang/gleam">https://github.com/gleam-lang/gleam</a><p>[2] <a href="https://github.com/hashicorp/terraform">https://github.com/hashicorp/terraform</a><p>[3] <a href="https://blog.rust-lang.org/2023/11/09/parallel-rustc.html" rel="nofollow">https://blog.rust-lang.org/2023/11/09/parallel-rustc.html</a><p>*edit: code-comments & links + making it easier to see the differences
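For anyone wanting to reproduce the cranelift + mold setup above, here is a sketch of the config I'd expect to need. The `codegen-backend` keys come from nightly cargo's unstable feature of the same name; the target triple and using clang as the linker driver are assumptions for an x86_64 Linux box, so adjust for your system:

```toml
# .cargo/config.toml -- sketch, not a drop-in config
[unstable]
codegen-backend = true                 # opt in to the unstable cargo feature

[profile.dev]
codegen-backend = "cranelift"          # use Cranelift for unoptimized dev builds

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]   # link with mold via clang
```

You also need the backend itself on nightly, e.g. `rustup component add rustc-codegen-cranelift-preview --toolchain nightly`.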
Tried out the instructions from the article on a tiny Bevy project, and compared it to a "normal" build:<p>> cargo build --release 23.93s user 22.85s system 66% cpu 1:09.88 total<p>> cargo +nightly build -Zcodegen-backend 23.52s user 21.98s system 68% cpu 1:06.86 total<p>Seems just marginally faster than a normal release build. Wonder if there is something particular with Bevy that makes this so? The author of the article mentions 40% difference in build speed, but I'm not seeing anything near that.<p>Edit: just realized I'm caching my release builds with sccache and a local NAS, hence the release builds being as fast as Cranelift+debug builds. Trying it again with just debug builds and without any caching:<p>> cargo +nightly build 1997.35s user 200.38s system 1878% cpu 1:57.02 total<p>> cargo +nightly build -Zcodegen-backend 280.96s user 73.06s system 657% cpu 53.850 total<p>Definitely an improvement once I realized what I did wrong, about half the time spent compiling now :) Neat!
You can use different backends and optimization for different crates. It often makes sense to use optimized LLVM builds for dependencies, and debug LLVM or even Cranelift for your own code.<p>See <a href="https://www.reddit.com/r/rust/comments/1bhpfeb/vastly_improved_recompile_times_in_rust_with/" rel="nofollow">https://www.reddit.com/r/rust/comments/1bhpfeb/vastly_improv...</a>
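To make the per-crate split concrete, the usual trick is a profile override in Cargo.toml; the `opt-level` values here are just the common choice, tune to taste:

```toml
# Cargo.toml: optimize all dependencies, keep your own crate quick to rebuild
[profile.dev.package."*"]
opt-level = 3        # dependencies rarely change, so this cost is paid once

[profile.dev]
opt-level = 0        # your own code stays fast to recompile
```

Dependencies are only rebuilt when they change, so the extra optimization time is mostly a one-off cost on the first build.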
The Equality Graphs link [0] led me to discover ESC/Java [1] [2]. Has anyone actually tried or had any success with ESC/Java? It's piqued my curiosity to compare it with SpotBugs (formerly known as FindBugs).<p>[0] <a href="https://en.wikipedia.org/wiki/E-graph" rel="nofollow">https://en.wikipedia.org/wiki/E-graph</a><p>[1] <a href="https://en.wikipedia.org/wiki/ESC/Java" rel="nofollow">https://en.wikipedia.org/wiki/ESC/Java</a><p>[2] <a href="https://www.kindsoftware.com/products/opensource/escjava2/" rel="nofollow">https://www.kindsoftware.com/products/opensource/escjava2/</a>
Very excited for Cranelift debug builds to speed up development iteration, in particular for WASM/frontend Rust, where iteration speed competes with the new era of Rust tooling for JS that sometimes lands sub-1-second builds (iteration speed in frontend work is crucial).<p>Sadly, it does not yet support ARM macOS, so those of us on M1-M3 will have to wait a bit :/
Does anyone by chance have benchmarks of runtime (so not the compile time) when using Cranelift? I'm seeing a mention of "twice as slow" in the article, but that's based on data from 2020. Wondering if it has substantially improved since then.
> JIT compilers often use techniques, such as speculative optimizations, that make it difficult to reuse the compiler outside its original context, since they encode so many assumptions about the specific language for which they were designed.<p>> The developers of Cranelift chose to use a more generic architecture, which means that Cranelift is usable outside of the confines of WebAssembly.<p>One would think this has more to do with Wasm being the source language, as it's fairly generic (compared to JS or Python), so there are no specific assumptions to encode.<p>Great article though. It's quite interesting to see E-matching used in compilers, took me down a memory lane (and found myself cited on Wikipedia page for e-graphs).
Is there no native support for M1-M3 Macs currently, and no Windows support either?<p>Unclear what the roadmap is there, as this update from the most active contributor is inconclusive:<p>> Windows support has been omitted for now. And macOS currently only supports x86_64, as Apple invented their own calling convention for arm64, for which variadic functions can’t easily be implemented as a hack. If you are using an M1 processor, you could try installing the x86_64 version of rustc and then using Rosetta 2. Rosetta 2 will hurt performance though, so you will need to try if it is faster than the LLVM backend with arm64 rustc.<p>Source is from Oct 2023 so this could easily be outdated, but I found nothing in the original article: <a href="https://bjorn3.github.io/2023/10/31/progress-report-oct-2023.html" rel="nofollow">https://bjorn3.github.io/2023/10/31/progress-report-oct-2023...</a>
FTA: <i>“Because optimizations run on an E-graph only add information in the form of new annotations, the order of the optimizations does not change the result. As long as the compiler continues running optimizations until they no longer have any new matches (a process known as equality saturation), the E-graph will contain the representation that would have been produced by the optimal ordering of an equivalent sequence of traditional optimization passes […] In practice, Cranelift sets a limit on how many operations are performed on the graph to prevent it from becoming too large.”</i><p>So, in practice, the order of optimizations <i>can</i> change the result? How easy is it to hit that limit?
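To make the budget question concrete, here is a toy saturation loop (very much not Cranelift's real implementation, which shares subterms in an e-graph rather than storing whole expression strings). Rewrite rules grow a set of equivalent forms until nothing new appears, or until a step budget runs out; if the budget is hit first, which forms you end up with can indeed depend on rule order:

```rust
use std::collections::BTreeSet;

// Toy "equality saturation" over whole expression strings.
// Illustrates only the fixpoint-or-budget control flow.
fn saturate(start: &str, rules: &[(&str, &str)], budget: usize) -> BTreeSet<String> {
    let mut forms = BTreeSet::from([start.to_string()]);
    for _ in 0..budget {
        let mut added = false;
        for expr in forms.clone() {
            for &(from, to) in rules {
                if expr.contains(from) {
                    // insert() returns true only if the form is new
                    added |= forms.insert(expr.replace(from, to));
                }
            }
        }
        if !added {
            return forms; // saturated: rule order no longer matters
        }
    }
    forms // budget exhausted: the result may depend on rule order
}

fn main() {
    // x*2 == x<<1 == x+x
    let rules = [("x*2", "x<<1"), ("x<<1", "x+x")];
    for f in saturate("x*2", &rules, 10) {
        println!("{f}");
    }
}
```

With a budget of 10 this saturates to all three forms; with a budget of 1 only the rules that fire on the starting form get applied, which is the order-dependence the parent comment asks about.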
Very interesting article. I had not heard of equality graphs before. Here's some pretty good background reading on the subject: <a href="https://inst.eecs.berkeley.edu/~cs294-260/sp24/2024-03-04-eqsat-paper" rel="nofollow">https://inst.eecs.berkeley.edu/~cs294-260/sp24/2024-03-04-eq...</a>
It sucks that there is no way to use Cranelift from outside of Rust to create your own toy language. I would have loved to use Cranelift in a toy compiler, but I am not ready to pay Rust's price of complexity.
IMO Rust debug builds are fast enough, but it's nice to see things are going to get even faster! Hopefully this will eventually make `rust-analyzer` faster and more efficient.
I feel like I'm reading an advertising blurb in that article.<p>I wish them every success, but I hope for a more balanced overview of pros and cons rather than gushing praise at every step...