Note that nnethercote is probably the world's foremost expert on Rust performance (in addition to being the maintainer of Valgrind). See also his long-running blog series where he discusses his efforts to improve the performance of the Rust compiler itself: <a href="https://nnethercote.github.io/2023/03/24/how-to-speed-up-the-rust-compiler-in-march-2023.html" rel="nofollow">https://nnethercote.github.io/2023/03/24/how-to-speed-up-the...</a>
One thing I've been curious about -- but not enough to actually run through the disassembly yet -- is the performance of Rust matching on enums vs. a C/C++ switch statement, in particular in e.g. a 'bytecode' interpreter loop.<p>C compilers do a pretty good job of optimizing these into something efficient, and I imagine it helps that the case values are constant-sized integers, so the compiler can turn the switch into either a jump table lookup or a chain of compares depending on the size and density of the cases.<p>Rust enum variants, in contrast, can carry payloads, and the patterns being matched can be nested a few levels deep. The obvious way to write an opcode execution loop in Rust is a pattern match over an enum of opcodes -- I'm curious whether that can be made to perform as well as in C/C++.<p>(I've been writing a virtual machine interpreter loop, so I have a modest interest in this kind of thing, though I'm not optimizing for performance yet.)<p>I don't see any explicit mention of this in the document.
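For reference, here's a minimal sketch of the pattern in question: a dispatch loop that matches on an enum of opcodes. The opcode names, stack semantics, and program here are purely illustrative, not taken from any real VM. With a simple discriminant like this, rustc/LLVM can typically lower the match to a jump table, much like a C switch, though the only way to be sure for a given enum shape is to inspect the generated assembly.

```rust
/// Illustrative opcode set for a tiny stack machine.
#[derive(Clone, Copy)]
enum Op {
    Push(i64), // variant with a payload, like a tagged union in C
    Add,
    Mul,
    Halt,
}

/// The "obvious" interpreter loop: match on each opcode in turn.
fn run(program: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    loop {
        match program[pc] {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
            Op::Halt => return stack.pop().unwrap_or(0),
        }
        pc += 1;
    }
}

fn main() {
    // Computes (2 + 3) * 4.
    let prog = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt];
    println!("{}", run(&prog)); // prints 20
}
```

Note that the enum itself is fixed-size at runtime (the largest variant plus a tag), so the "variable size" only affects memory layout and cache density, not the dispatch mechanism itself.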
There's a good discussion on minimizing heap allocations: <a href="https://nnethercote.github.io/perf-book/heap-allocations.html" rel="nofollow">https://nnethercote.github.io/perf-book/heap-allocations.htm...</a>