Can someone ELI5 what exactly happens when the external method call is "JIT"ed in a language like LuaJIT -- what does that mean?<p>I understand calling dlopen() and dlsym()<p>And I understand this idea of a PLT and its indirection<p>But this idea of something external to the JIT'ed program being JIT'ed I do not understand.<p>Does it mean it inlined the instructions of the external function into the JIT'ed code?
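For context, here is roughly the dlopen()/dlsym() path I have in mind -- a minimal C sketch, assuming libm and its cos symbol, with error handling omitted. My question is whether the JIT merely replaces the indirect call through fn with a direct call to the resolved address, or actually copies the callee's instructions into the compiled trace:
<pre><code> /* build with: cc demo.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* resolve the symbol once, up front */
    void *lib = dlopen("libm.so.6", RTLD_NOW);
    double (*fn)(double) = (double (*)(double))dlsym(lib, "cos");

    /* every call then goes through the pointer: an indirect call */
    printf("%f\n", fn(0.0));

    dlclose(lib);
    return 0;
}</code></pre>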
Note that this article is four years old. I was trying to figure out why the language I look after (Dart) looked so bad, but then I realized that the benchmark is using a completely obsolete (now removed) approach to FFI and running all the code in interpreted mode, rather than compiling it.
Is it possible to use Lua/LuaJIT in the opposite FFI direction (i.e., instead of invoking FFI functions, providing them)?<p>That is, when the DLL is loaded (on Win32 they call it DllMain, I guess; I've seen ctor/dtor used on *nix), spawn the runtime and expose functions for the host to call?<p><a href="http://www.drewtech.com/support/passthru.html" rel="nofollow">http://www.drewtech.com/support/passthru.html</a><p>This spec leans heavily on the library exposing functions. I always find this to be an edge case when trying to play with certain technologies (like the one in the article).
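To make it concrete, here's the kind of thing I mean -- a hypothetical C sketch (PassThruOpen and impl.lua are just example names in the style of that spec, it uses the embedded Lua C API rather than LuaJIT's ffi, and error handling is omitted): the library spins up a Lua state when it's loaded, and its exported C entry points forward into Lua.
<pre><code> #include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

static lua_State *L;

/* On *nix this runs when the library is dlopen()'d; on Win32 the same
   setup would live in DllMain on DLL_PROCESS_ATTACH. */
__attribute__((constructor))
static void lib_init(void) {
    L = luaL_newstate();
    luaL_openlibs(L);
    luaL_dofile(L, "impl.lua");    /* Lua script defining a global PassThruOpen */
}

/* Exported C entry point the host application calls; it forwards into Lua. */
long PassThruOpen(const char *name, unsigned long *device_id) {
    lua_getglobal(L, "PassThruOpen");
    lua_pushstring(L, name);
    lua_call(L, 1, 2);                     /* Lua returns status, device id */
    *device_id = (unsigned long)lua_tointeger(L, -1);
    long status = (long)lua_tointeger(L, -2);
    lua_pop(L, 2);
    return status;
}</code></pre>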
Previously: <a href="https://news.ycombinator.com/item?id=17171252" rel="nofollow">https://news.ycombinator.com/item?id=17171252</a><p>(Linked from the article)
It’s been a while since I’ve worked with the PE format, but isn’t this the purpose of a fixup table? The executable loader can patch the machine code to make calls direct. A step further, you have LTCG, which may effectively copy and paste the actual code and recompile it inline.<p>So this is a Linux or *nix specific quirk, rather than a C quirk. Apologies if my memory isn’t accurate.
For similar reasons, PyPy's Python implementation can outperform C.<p><a href="https://www.pypy.org/posts/2011/02/pypy-faster-than-c-on-carefully-crafted-5614784244310486765.html" rel="nofollow">https://www.pypy.org/posts/2011/02/pypy-faster-than-c-on-car...</a> - JIT'ing across compilation units<p><a href="https://www.pypy.org/posts/2011/08/pypy-is-faster-than-c-again-string-6756589731691762127.html" rel="nofollow">https://www.pypy.org/posts/2011/08/pypy-is-faster-than-c-aga...</a> - JIT'ing % interpolation.<p>(Wow, those are 11 years old. I remember when PyPy was a new project.)
Are direct calls really all that much faster than indirect calls on current x86 archs? I was under the impression that it’s more or less the same on the current generation of CPUs. Those CPUs do a decent job of branch predicting indirect calls, especially in a micro benchmark loop. The BTB generally works well.
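The kind of toy loop I have in mind, for what it's worth -- a rough C sketch (add1 and the iteration count are made up, and you'd want to build with something like -O2 -fno-inline so the direct call isn't simply inlined away):
<pre><code> #include <stdio.h>
#include <time.h>

static long add1(long x) { return x + 1; }

static double elapsed(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void) {
    /* volatile pointer so the compiler can't turn the indirect call into a direct one */
    long (*volatile fp)(long) = add1;
    long acc = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < 100000000; i++)
        acc = add1(acc);                  /* direct call */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("direct:   %.3fs (acc=%ld)\n", elapsed(t0, t1), acc);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < 100000000; i++)
        acc = fp(acc);                    /* indirect call through a pointer */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("indirect: %.3fs (acc=%ld)\n", elapsed(t0, t1), acc);
    return 0;
}</code></pre>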
> If the JIT code needed to call two different dynamic functions separated by more than 2GB, then it’s not possible for both to be direct.<p>Well, you can do<p><pre><code> MOV rax, 0x1122334455667788   ; load the full 64-bit target address
PUSH rax                      ; push it as a fake return address
RET                           ; "return" straight to the target</code></pre>
in this case. Still direct, just a bit slower. Wonder if modern CPUs speculate past this construction.
Considering this is more an ELF/Linux thing than a C thing, it means there is room to improve the performance of FFI/.so-heavy processes on Linux. I wonder why nobody has cared to improve it.