LuaJIT's interpreter loop is similar, except that it's written in assembly; the motivation is that C compilers could not optimize a C version of the loop as well as hand-written assembly.<p>Somewhat related: this post [1] describes some ideas for breaking the loop down into smaller functions without losing performance, by chaining them with tail calls.<p>[1] <a href="https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html" rel="nofollow">https://blog.reverberate.org/2021/04/21/musttail-efficient-i...</a>
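<p>To make the tail-call idea concrete, here is a minimal sketch in C of a toy stack machine where every opcode is its own small function and each handler ends by tail-calling the next one. This is not LuaJIT's code or the code from the linked post: the opcodes, <code>run</code>, <code>dispatch_table</code>, and the <code>MUSTTAIL</code>/<code>DISPATCH</code> macros are all made up for illustration. It assumes a compiler that provides <code>__attribute__((musttail))</code> (Clang does, as do recent GCC releases); on other compilers the macro expands to nothing, so the tail call is only an optimization the compiler may or may not perform.

```c
#include <stdint.h>

/* Use a guaranteed tail call when the compiler offers one.
   Fallback (assumption): plain `return`, which the optimizer may
   still turn into a jump, but without any guarantee. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Every handler has the same signature; musttail requires that the
   caller and callee signatures match. The interpreter state (pc, sp)
   travels in arguments, so it can stay in registers. */
typedef int64_t (*op_fn)(const uint8_t *pc, int64_t *sp);

static int64_t op_push(const uint8_t *pc, int64_t *sp);
static int64_t op_add(const uint8_t *pc, int64_t *sp);
static int64_t op_mul(const uint8_t *pc, int64_t *sp);
static int64_t op_halt(const uint8_t *pc, int64_t *sp);

static const op_fn dispatch_table[OP_COUNT] = {
    [OP_PUSH] = op_push,
    [OP_ADD]  = op_add,
    [OP_MUL]  = op_mul,
    [OP_HALT] = op_halt,
};

/* Jump to the handler for the opcode at pc instead of returning
   to a central loop. */
#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc, sp)

static int64_t op_push(const uint8_t *pc, int64_t *sp) {
    *sp++ = (int8_t)pc[1];   /* operand: one signed byte after the opcode */
    pc += 2;
    DISPATCH();
}

static int64_t op_add(const uint8_t *pc, int64_t *sp) {
    sp[-2] += sp[-1];        /* pop two values, push their sum */
    sp--;
    pc += 1;
    DISPATCH();
}

static int64_t op_mul(const uint8_t *pc, int64_t *sp) {
    sp[-2] *= sp[-1];        /* pop two values, push their product */
    sp--;
    pc += 1;
    DISPATCH();
}

static int64_t op_halt(const uint8_t *pc, int64_t *sp) {
    (void)pc;
    return sp[-1];           /* result is the top of the stack */
}

/* Entry point: an ordinary call into the first handler. No bounds
   checking on the stack or the bytecode; this is a sketch only. */
int64_t run(const uint8_t *code) {
    int64_t stack[64];
    return dispatch_table[code[0]](code, stack);
}
```

<p>For example, the bytecode <code>OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_PUSH 4, OP_MUL, OP_HALT</code> computes (2 + 3) * 4, so <code>run</code> returns 20. The point of the structure is that each handler is a small function the compiler can optimize on its own, while the guaranteed tail call keeps dispatch down to a jump rather than a call that grows the stack.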