I had to deal with a lot of FFI to enable a Java Constraint Solver (Timefold) to call functions defined in CPython. In my experience, most of the performance problems from FFI come from using proxies to communicate between the host and foreign language.<p>A direct FFI call using JNI or the new foreign interface is fast, and has roughly the same speed as calling a Java method directly. Alas, the CPython and Java garbage collectors do not play nice, and require black magic in order to keep them in sync.<p>On the other hand, using proxies (such as in JPype or GraalPy) cause a significant performance overhead, since the parameters and return values need to be converted, and might cause additional FFI calls (in the other direction). The fun thing is if you pass a CPython object to Java, Java has a proxy to the CPython object. And if you pass that proxy back to CPython, a proxy to that proxy is created instead of unwrapping it. The result: JPype proxies are 1402% slower than calling CPython directly using FFI, and GraalPy proxies are 453% slower than calling CPython directly using FFI.<p>What I ultimately end up doing is translating CPython bytecode into Java bytecode, and generating Java data structures corresponding to the CPython classes used. As a result, I got a 100x speedup compared to using proxies. (Side note: if you are thinking about translating/reading CPython bytecode, don't; it is highly unstable, poorly documented, and its VM has several quirks that make it hard to map directly to other bytecodes).<p>For more details, you can see my blog post on the subject: <a href="https://timefold.ai/blog/java-vs-python-speed" rel="nofollow">https://timefold.ai/blog/java-vs-python-speed</a>