Very cool article. I like learning about these kinds of hacks.<p>Btw, it seems the author is also the author of this cool project:<p><a href="https://github.com/hansihe/Rustler" rel="nofollow">https://github.com/hansihe/Rustler</a><p>It's a library that helps you write Erlang NIFs in Rust; I'm following it.<p>I see Erlang/Elixir and Rust as two of the best platforms today that focus on practical safety and fault tolerance, and combining the two is a great idea in a large system. Erlang's VM is solid and battle-tested, at the core of many vital projects and systems. Rust probably brought the most innovative language feature of recent years: compile-time lifetime and safety checking.
To add to one of the "better solutions": if you've got a native library that does heavy lifting (for example, a game physics engine) and you want to "integrate" it with Erlang, please don't use NIFs. Use ports. Write a small C program that wraps your native library, and talk to it over streams, sockets, or shared memory. Let the OS's pre-emptive scheduler do what it's for.<p>Externalizing your bulky native code into its own port-programs greatly increases your system's robustness. Very few native libraries implement a Rust-like set of safe abstractions and then rely solely on them; most just do stupid native things with pointers <i>et al.</i> So most native libraries likely <i>will</i> abort() at some point or another. When the one you're relying on does, it will of course take down "its own OS process." You don't want that OS process to be <i>your Erlang node</i>. You want the crash isolated so that the only state destroyed is the corrupted state of the subsystem that caused it. Then you want a supervisor running in your Erlang node to see the dead port and restart it, so the system can keep chugging along.<p>Of course, if you want this to be a <i>low-overhead</i> solution, you should <i>cache</i> inside the C process, as much as possible, anything you'd otherwise have to send it repeatedly. Treat your externalized native-library "processor" server like an SQL database server: insert complex state into it, get handles back, then manipulate the state <i>without retrieving it</i> by using high-level queries and commands on those handles. (I say cache, but I don't mean <i>persist</i>. The state your Erlang node hands a port-program should all be <i>derived</i> state whose canonical copy lives in the Erlang node, so that you can feed it into the port-program again <i>when</i> it crashes. 
Treat port-program state like state in memcached.)<p>---<p>ETA: I've never seen this design myself in the wild, but I've heard it suggested a couple of times, and it's kind of a cool idea:<p>Instead of a native C port-program, you can get the same advantages by making your library a set of "dirty" NIFs running in <i>its own isolated Erlang node</i>. The Erlang runtime itself doesn't have much overhead, so it's fairly cheap to use Erlang to "network-enable" each library as its own server process. Your "business logic" Erlang node can then talk to your native-library-wrapper Erlang node over the Erlang distribution protocol (which is very cheap when both nodes are on the same machine). This saves you a lot of the hassle of beating C into a form that handles IPC protocols well.
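A minimal sketch of the C side of such a port-program, assuming the Erlang node starts it with something like `open_port({spawn, "./wrapper"}, [{packet, 2}, binary])` (the `./wrapper` path is hypothetical), so every message in each direction is prefixed with a 2-byte big-endian length. `read_packet`, `write_packet`, `serve`, and the `port_handler` callback are illustrative names, not from any library; the handler is where you'd dispatch commands against state cached in the process and hand back handles.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read exactly n bytes; returns 0 on success, -1 on EOF or error. */
static int read_exact(int fd, uint8_t *buf, size_t n) {
    assert(buf != NULL || n == 0);
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r < 0 && errno == EINTR) continue;   /* interrupted: retry */
        if (r <= 0) return -1;                   /* EOF (port closed) or error */
        got += (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes; returns 0 on success, -1 on error. */
static int write_exact(int fd, const uint8_t *buf, size_t n) {
    size_t put = 0;
    while (put < n) {
        ssize_t w = write(fd, buf + put, n - put);
        if (w < 0 && errno == EINTR) continue;
        if (w <= 0) return -1;
        put += (size_t)w;
    }
    return 0;
}

/* One {packet, 2} frame: 2-byte big-endian length, then the payload.
   Returns the payload length, or -1 on EOF, I/O error, or oversized frame. */
static long read_packet(int fd, uint8_t *buf, size_t cap) {
    uint8_t hdr[2];
    if (read_exact(fd, hdr, sizeof hdr) < 0) return -1;
    size_t len = ((size_t)hdr[0] << 8) | hdr[1];
    if (len > cap || read_exact(fd, buf, len) < 0) return -1;
    return (long)len;
}

static int write_packet(int fd, const uint8_t *buf, size_t len) {
    if (len > 0xFFFF) return -1;   /* would not fit in a 2-byte header */
    uint8_t hdr[2] = { (uint8_t)(len >> 8), (uint8_t)(len & 0xFF) };
    if (write_exact(fd, hdr, sizeof hdr) < 0) return -1;
    return write_exact(fd, buf, len);
}

/* Hypothetical command dispatcher: decodes a request, acts on cached state
   (e.g. by handle), writes a reply, and returns its length or -1. */
typedef long (*port_handler)(const uint8_t *req, size_t req_len,
                             uint8_t *reply, size_t reply_cap);

/* Serves requests until input closes. Returns 0 when the input closes or a
   frame is malformed, 1 if handling or replying fails. Either way the caller
   should just exit and let the Erlang supervisor restart the program. */
static int serve(int in_fd, int out_fd, port_handler handle) {
    static uint8_t req[0xFFFF], reply[0xFFFF];
    for (;;) {
        long n = read_packet(in_fd, req, sizeof req);
        if (n < 0) return 0;
        long m = handle(req, (size_t)n, reply, sizeof reply);
        if (m < 0 || write_packet(out_fd, reply, (size_t)m) < 0) return 1;
    }
}
```

In a real program, `main` would call `serve(STDIN_FILENO, STDOUT_FILENO, handler)` with your own (hypothetical) handler and then exit. Bailing out instead of trying to resynchronize after a bad frame is deliberate: since all the state is derived, a fresh process is the cheapest way back to a known-good state.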
On Linux you can use makecontext to set up a context that runs a function on a stack you allocate, and swapcontext to jump to it. That API was once part of POSIX, but it was marked obsolescent and then dropped in POSIX.1-2008, with pthreads as the recommended replacement. It's still usable on Linux and several other OSs, with the caveat that you probably don't want to load or enable pthreads in your process. (I think you can mix the two in glibc today, because glibc reaches thread-local data structures through a dedicated register, whereas long ago that data was kept at the base of the current stack and so was incompatible with alternate stacks. But other OSs might have problems, and things might change in glibc in the future, too.)<p>Another solution, which is actually (probably?) POSIX-compatible, is to use sigaltstack to create a new stack, save your current context with setjmp, raise a signal whose handler (installed with SA_ONSTACK) runs on the alternate stack, call setjmp inside that handler to save a context on the alternate stack, then longjmp back to your original position. Now you can jump back and forth at will, everything copacetic. Calling longjmp from a signal handler is perfectly legit, and POSIX is careful to preserve that ability. But for obvious reasons you have to be very careful how you go about it.<p>Now, once you throw an interval timer into the mix, things get tricky. Normally you would need to worry about the signal arriving while the C code is in async-signal-unsafe library routines, but Erlang might be (I don't know this for a fact) one of the few large projects that only ever makes raw syscalls like mmap, read, write, etc. If only the user library ever executes libc code, there shouldn't be a problem.<p>One thing I really wish POSIX (or at least Linux) supported is per-thread signal handlers. With them, you could cleanly bundle this magic into libraries for multi-threaded processes by saving and restoring the existing signal mask, sigaltstack, and signal handler(s). Today you can only do that for the first two, which are per-thread; signal handlers are process-global.
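A minimal sketch of the makecontext/swapcontext approach, assuming Linux/glibc and a single-threaded process. `coro_start`, `coro_resume`, and `coro_body` are illustrative names, not from any library; the coroutine yields by swapping back into the caller's saved context, and the stack size is the caller's choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, coro_ctx;
static int last_value;   /* how the coroutine hands values back */

/* Runs on its own stack; yields 1, 2, 3 back to the caller, then finishes. */
static void coro_body(void) {
    for (int i = 1; i <= 3; i++) {
        last_value = i;
        swapcontext(&coro_ctx, &caller_ctx);   /* yield to the caller */
    }
    last_value = -1;   /* returning jumps to uc_link, i.e. caller_ctx */
}

/* Prepares coro_body to run on a fresh malloc'd stack. Returns the stack
   (for the caller to free once the coroutine is done) or NULL on failure. */
static void *coro_start(size_t stack_size) {
    void *stack = malloc(stack_size);
    if (stack == NULL) return NULL;
    if (getcontext(&coro_ctx) != 0) { free(stack); return NULL; }
    coro_ctx.uc_stack.ss_sp = stack;
    coro_ctx.uc_stack.ss_size = stack_size;
    coro_ctx.uc_link = &caller_ctx;
    makecontext(&coro_ctx, coro_body, 0);
    return stack;
}

/* Switches to the coroutine until it yields or returns. */
static int coro_resume(void) {
    int rc = swapcontext(&caller_ctx, &coro_ctx);
    assert(rc == 0);
    (void)rc;
    return last_value;
}
```

After `coro_start(64 * 1024)`, four calls to `coro_resume()` return 1, 2, 3 and then -1; the last one comes back through `uc_link` when `coro_body` returns. With a single global `coro_ctx` this handles one coroutine at a time; a real library would keep a context per coroutine.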
I was working on a project recently where I caught SIGSEGV on a page fault, longjmp'd back, doubled the relevant memory buffer (or aborted if the faulting address wasn't one I managed), and then restarted the computationally expensive operation. This let me remove all the bounds-checking code and gave a very significant speedup. But because it touches process-global state (the SIGSEGV handler), the approach is messy and unsuitable for packaging as a library, since these days it's best to assume a multi-threaded process.
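A minimal sketch of that SIGSEGV trick, assuming Linux, a single-threaded process, and an operation that writes its output sequentially, so the first out-of-bounds write always lands in a guard page placed right after the buffer. `fill_unchecked` is a hypothetical stand-in for the expensive operation, and the other names are illustrative too. It uses sigsetjmp/siglongjmp rather than plain setjmp/longjmp so that the signal mask is restored on the jump; otherwise SIGSEGV would stay blocked after the first fault.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf retry_point;
static char *volatile managed_lo;   /* the mapping we own: [managed_lo, managed_hi) */
static char *volatile managed_hi;

static void on_segv(int sig, siginfo_t *info, void *uctx) {
    (void)sig; (void)uctx;
    char *addr = (char *)info->si_addr;
    if (addr >= managed_lo && addr < managed_hi)
        siglongjmp(retry_point, 1);   /* our buffer overflowed: grow and retry */
    abort();                          /* not our address: a genuine crash */
}

/* Maps `cap` writable bytes followed by one PROT_NONE guard page. */
static char *map_guarded(size_t cap, size_t page) {
    char *p = mmap(NULL, cap + page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED || mprotect(p + cap, page, PROT_NONE) != 0) abort();
    managed_lo = p;
    managed_hi = p + cap + page;
    return p;
}

/* Hypothetical stand-in for the expensive operation: writes n bytes
   sequentially with no bounds checks, so it faults on the guard page. */
static void fill_unchecked(char *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = (char)('a' + i % 26);
}

/* Runs fill_unchecked, doubling the buffer after each fault.
   Stores the buffer in *result and returns its usable capacity. */
static size_t fill_with_growth(size_t n, char **result) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(page > 0);
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) abort();   /* process-global! */

    volatile size_t cap = page;
    char *volatile buf = NULL;
    if (sigsetjmp(retry_point, 1) != 0) {   /* 1 = also restore the signal mask */
        munmap(buf, cap + page);
        cap *= 2;
    }
    buf = map_guarded(cap, page);
    fill_unchecked(buf, n);   /* restarts from scratch after every fault */

    sigaction(SIGSEGV, &old, NULL);
    *result = buf;
    return cap;
}
```

For example, asking for three pages plus ten bytes faults twice (at one page, then at two) and succeeds with a four-page buffer; free it with `munmap(out, cap + pagesize)`, since the guard page is part of the mapping. Note that `sigaction(SIGSEGV, ...)` swaps the handler for the whole process, which is exactly why this doesn't package well as a library: another thread's segfault, or another library's SIGSEGV handler, collides with it.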