Haha, I was flabbergasted to see the results of the subprocess approach, incredible. I'm guessing the memory usage being lower for that approach (versus later ones) is because a lot of the heavy lifting is done in the subprocess, which gets entirely freed once the request is over. Neat.

I have a couple of things I'm wondering about, though:

- Node.js is pretty good at IO-bound workloads, but I wonder if this holds up as well when comparing against e.g. Go or PHP. I have run into embarrassing situations where my RiiR adventure ended with worse performance than even PHP, which makes some sense: PHP has tons of relatively fast C modules for doing the heavy lifting, like image processing, so it's not quite so clear-cut.

- The "caveman" approach is a nice one just to show that it still works, but it obviously has a lot of overhead because of all the forking and whatnot. You can do a lot better by not spawning a new process each time. Even a rudimentary approach like spawning N long-lived workers and streaming requests and responses to them synchronously would probably work pretty well (see the sketch below). For computationally expensive stuff, this might be a worthwhile approach because it is so simple compared to approaches that reach for native code bindings.
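A minimal sketch of what such a worker could look like on the Rust side, assuming a line-delimited stdin/stdout protocol (the `handle` function is a made-up stand-in for the real work):

```rust
use std::io::{self, BufRead, Write};

fn main() {
    let stdin = io::stdin();
    let mut stdout = io::stdout().lock();

    // Stay alive across requests: one line in, one line out.
    // The parent keeps N of these processes around instead of
    // forking a fresh one per request.
    for line in stdin.lock().lines() {
        let request = line.expect("failed to read request");
        let response = handle(&request);
        writeln!(stdout, "{response}").expect("failed to write response");
        stdout.flush().expect("failed to flush stdout");
    }
}

// Hypothetical stand-in for the heavy lifting (e.g. QR generation).
fn handle(request: &str) -> String {
    request.to_uppercase()
}
```

The process startup cost gets paid once per worker instead of once per request, which is most of the caveman approach's overhead.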
Encore.ts is doing something similar for TypeScript backend frameworks, by moving most of the request/response lifecycle into async Rust: https://encore.dev/blog/event-loops

Disclaimer: I'm one of the maintainers.
This is a really cool comparison, thank you for sharing!

Beyond performance, Rust also brings a high level of portability, and these examples show just how versatile a piece of code can be. Even beyond the server, running this on iOS or Android is also straightforward.

Rust is definitely a happy path.
In my opinion, the significant drop in memory footprint is truly underrated (13 MB vs 1300 MB). If everybody cared about optimizing for efficiency and performance, the cost of computing wouldn’t be so burdensome.

Even self-hosting on a Raspberry Pi becomes viable.
Pretty sure Tier 4 should be faster than that. I wonder if the CPU was fully utilized in this benchmark. I did some performance work with Axum a while back and was bitten by Nagle's algorithm: setting TCP_NODELAY pushed the benchmark from 90,000 req/s to 700,000 req/s in a VM on my laptop.
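For reference, a minimal sketch of where that flag goes, assuming a tokio accept loop (the hyper/axum wiring is elided):

```rust
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:3000").await?;
    loop {
        let (stream, _addr) = listener.accept().await?;
        // Disable Nagle's algorithm so small responses go out
        // immediately instead of waiting in the kernel's buffer.
        stream.set_nodelay(true)?;
        // ...hand the stream off to hyper/axum from here...
    }
}
```

Nagle's algorithm batches small writes, which is exactly the wrong trade-off for tiny request/response payloads in a benchmark.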
While I agree the enhancement is significant, the title of this post makes it seem more like an advertisement for Rust than an optimization article. If you rewrite js code into a native language, be it Rust or C, of course it's gonna be faster and use less resources.
Rust is simply amazing to do web backend development in. It's the biggest secret in the world right now. It's why people are writing so many different web frameworks and utilities for it: it's popular, practical, and growing fast.

Writing Rust for the web (Actix, Axum) is no different from writing Go, Jetty, Flask, etc. in terms of developer productivity. It's super easy to write server code in Rust.

And unlike Python HTTP backends, the Rust code ends up far closer to defect-free.

I've absorbed 10,000+ qps on a couple of cheap tiny VPS instances. My server bill is practically non-existent and I'm serving up crazy volumes without effort.
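For a sense of scale, here's a complete Axum server (assuming axum 0.7 and tokio with the full feature set):

```rust
use axum::{routing::get, Router};

#[tokio::main]
async fn main() {
    // One route, no boilerplate: this is the whole server.
    let app = Router::new().route("/", get(|| async { "Hello, world!" }));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
        .await
        .expect("failed to bind");
    axum::serve(listener, app).await.expect("server error");
}
```

Handlers and extractors are checked by the type system at compile time, which is where a lot of that defect-free feeling comes from.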
Beware the risks of using NIFs with Elixir. They run in the same memory space as the BEAM, so a crash can take down not just the calling process but the entire VM. Granted, well-written safe Rust lowers the chances of this happening, but you need to consider the risk.
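Rustler is the usual route for Rust NIFs, and it helps here: it catches Rust panics at the NIF boundary and raises an Elixir exception instead of aborting the VM. A minimal sketch (the module name `Elixir.MyApp.Native` is made up, and depending on the Rustler version the function list in `init!` may be optional):

```rust
// A panic inside this function is caught by Rustler and surfaced
// as an Elixir exception rather than crashing the BEAM. Undefined
// behavior in `unsafe` code can still take the whole VM down.
#[rustler::nif]
fn add(a: i64, b: i64) -> i64 {
    a + b
}

rustler::init!("Elixir.MyApp.Native", [add]);
```

That said, safe Rust only protects you from memory errors and panics; a NIF that blocks for a long time can still starve the BEAM schedulers.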
Wow, that's an incredible writeup.

Super surprised that shelling out was nearly as good as any other method.

Why is the average bytes smaller? Shouldn't it be the same size file? And if not, it's a different algorithm, so not necessarily better?
Not trying to be snarky, but for this example, if we can compile to wasm, why not have the client compute this locally?

That would entail zero network hops, probably 100,000+ QRs per second.

If it is 100,000+ QRs per second, isn't most of what we're measuring here dominated by network calls?
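For the curious, the Rust side of that is small with wasm-bindgen (the function name is illustrative, and the QR generation itself is stubbed out):

```rust
use wasm_bindgen::prelude::*;

// Build with `wasm-pack build --target web` and the generated JS
// glue exposes `generate_qr` directly to the browser.
#[wasm_bindgen]
pub fn generate_qr(text: &str) -> String {
    // Placeholder: a real version would call into a QR crate
    // (e.g. `qrcode`) and return the rendered SVG.
    format!("<svg><!-- QR for {text} --></svg>")
}
```

Then the only server cost is handing out the .wasm file once, after which every code is generated client-side.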
I'm curious how many cores the server the tests ran on had, and what the performance would be of handling the requests in native Node with worker threads[1]. I suspect being tied to a single main thread explains at least part of the difference between tier 0 and 1.

1: https://nodejs.org/api/worker_threads.html
If you have a Java library, take a look at Chicory: https://github.com/dylibso/chicory

It runs on any JVM and has a couple of flavors of "ahead-of-time" bytecode compilation.
Shelling out to a CLI is quite an interesting path, because that same functionality can often be handed out as a standalone utility to power users, or used for non-automation tasks. Rust makes cross-platform distribution easy.
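A sketch of the dual-use idea, assuming clap with the derive feature (the names and the QR stub are hypothetical):

```rust
use clap::Parser;

/// Generate a QR code from text. The server shells out to this
/// same binary; power users get it as a standalone tool.
#[derive(Parser)]
#[command(name = "qr")]
struct Args {
    /// Text to encode
    text: String,
    /// Write SVG to this path instead of stdout
    #[arg(short, long)]
    output: Option<std::path::PathBuf>,
}

fn main() {
    let args = Args::parse();
    let svg = generate_qr(&args.text);
    match args.output {
        Some(path) => std::fs::write(path, svg).expect("failed to write file"),
        None => println!("{svg}"),
    }
}

// Hypothetical stand-in for the shared QR core.
fn generate_qr(text: &str) -> String {
    format!("<svg><!-- QR for {text} --></svg>")
}
```

One codebase, one release artifact, two audiences.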