This is cute! It’s worth pointing out that strace ships a similar feature (-e fault) that works for any syscall, even if the binary is statically linked. It is built on ptrace, which intercepts at the syscall boundary, a lower level than LD_PRELOAD. Although -e fault doesn’t support probabilistic failure, it does provide a flexible way to target specific invocations of a syscall. For example, to fail every second fork() call, starting with the first: -e fault=fork:error=ENOMEM:when=1+2.
It seems like it would be a lot easier to just have students call, e.g., `ta_fork()` rather than `fork()`, and then provide an implementing file to be linked with their program. Then `ta_fork()` allows the TA to trigger errors, either probabilistically or deterministically (say, by setting environment variables).<p>This approach would also give students insight into testing strategies like mocking, plus it would work on more operating systems.<p>[Edit: Not to disparage this project. It seems like it would have lots of uses, and it was probably a lot of fun to develop.]
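A minimal sketch of what such a wrapper might look like, assuming a deterministic "fail every Nth call" policy. The environment variable `TA_FORK_FAIL_EVERY` and the helper `ta_should_fail()` are hypothetical names chosen for illustration; nothing here comes from the original project.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of ta_fork() calls made so far in this process. */
static unsigned long ta_fork_calls;

/*
 * Hypothetical policy helper: if the environment variable env_name holds a
 * positive integer N, every Nth call (call_no = N, 2N, ...) should fail.
 * Returns 1 when the call should fail, 0 otherwise (including when the
 * variable is unset or malformed).
 */
int ta_should_fail(const char *env_name, unsigned long call_no)
{
    const char *s = getenv(env_name);
    if (s == NULL)
        return 0;

    char *end;
    unsigned long n = strtoul(s, &end, 10);
    if (end == s || *end != '\0' || n == 0)
        return 0;

    return call_no % n == 0;
}

/* Drop-in replacement for fork() that the TA can force to fail. */
pid_t ta_fork(void)
{
    ta_fork_calls++;
    if (ta_should_fail("TA_FORK_FAIL_EVERY", ta_fork_calls)) {
        errno = ENOMEM; /* pretend the kernel ran out of memory */
        return -1;
    }
    return fork();
}
```

With this linked in, running a student program as `TA_FORK_FAIL_EVERY=2 ./a.out` would make the second, fourth, and so on calls to `ta_fork()` return -1 with `errno` set to `ENOMEM`, while leaving the variable unset passes every call through to the real `fork()`.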
Nice writeup! One limitation of this approach is that the fault injection happens at the layer where the binary dynamically links against libc, meaning that an enterprising student who either statically links their binary or invokes syscalls directly will circumvent the interposed functions. But in a teaching setting I imagine this isn’t a practical concern :-)<p>(I built a similar tool[1] a few years ago, but at the syscall layer, to ensure that statically linked binaries could also have faults injected into them reliably. My colleagues used it to find a handful of bugs in prominent Go codebases.)<p>[1]: <a href="https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-without-really-trying/" rel="nofollow">https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootk...</a>
You can use pthread_once() to simplify the initialization part: <a href="https://man.archlinux.org/man/pthread_once.3.en" rel="nofollow">https://man.archlinux.org/man/pthread_once.3.en</a><p>I don't understand the desire not to link to pthread, it's about as ubiquitous as a library can be.<p>I doubt it's really a problem in this application... but naive userspace spinlocks are absolutely horrendous, see NOTES here: <a href="https://man.archlinux.org/man/pthread_spin_init.3.en" rel="nofollow">https://man.archlinux.org/man/pthread_spin_init.3.en</a><p><pre><code> User-space spin locks [...] are, by definition, prone to priority inversion and unbounded spin times. A programmer using spin locks must be exceptionally careful not only in the code, but also in terms of system configuration, thread placement, and priority assignment.</code></pre>
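As a rough illustration of the `pthread_once()` suggestion, here is a sketch of one-time initialization without a hand-rolled spinlock. The state being initialized (a seed read from a hypothetical `FAULT_SEED` environment variable) and all function names are assumptions made for the example, not the article's actual code.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static unsigned int fault_seed; /* written only inside init_state() */
static int init_runs;           /* counts initializer runs, for demonstration */

/* Runs at most once per process, no matter how many threads race to it. */
static void init_state(void)
{
    const char *s = getenv("FAULT_SEED"); /* hypothetical variable name */
    fault_seed = s ? (unsigned int)strtoul(s, NULL, 10) : 1u;
    init_runs++;
}

/* Every interposed function would call this before touching shared state. */
unsigned int get_fault_seed(void)
{
    pthread_once(&init_once, init_state);
    return fault_seed;
}

/* Exposed only so the example can show the initializer ran exactly once. */
int get_init_runs(void)
{
    return init_runs;
}
```

Callers that lose the race block inside `pthread_once()` until the initializer has finished, so they never observe half-initialized state. On older glibc versions this needs `-pthread` at link time.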
Very cool! I started a similar project around `LD_PRELOAD` a few months ago to profile the time different programs spend in libc calls. Provoking failures was the next step :)<p>Logging nicely was also an issue. I decided to avoid linking to any other symbols and implemented it with inline assembly for x86-64 and aarch64: <a href="https://github.com/ashvardanian/LibSee/blob/fdae92e71c449c9196a7d3b7d547bdbd6417e481/libsee.c#L425-L542">https://github.com/ashvardanian/LibSee/blob/fdae92e71c449c91...</a>
Perl, at build time, gets the system's errno numbers in a similar way[0]: it preprocesses errno.h with `$CC -E` and recursively scans every file named in the # line markers for macro definitions.<p>The configure script even checks for the existence of several system headers this way, so if your C compiler doesn't emit # markers in its -E output, you get missing includes everywhere.<p>[0] <a href="https://github.com/Perl/perl5/blob/blead/ext/Errno/Errno_pm.PL">https://github.com/Perl/perl5/blob/blead/ext/Errno/Errno_pm....</a>
Great article. Although I’ve been a Linux user since the days when a stack of Slackware floppies was the prevailing installation medium, I only recently learned that libc.so is also an executable.
> and parsing it out of the man pages is not something I’d like to imagine doing reliably. So I must satisfy myself by manually writing these facts down. And this turns out to be the bottleneck of the entire operation.<p>You can probably use an LLM for this.
> (cannot dynamically load position-independent executable)<p>...why though? I mean, it's position-independent, just load and relocate it wherever? Or does "PIE" mean something different in Linux from what it does in Windows?