Any blacklist-based syscall filtering solution that aims to run untrusted code is doomed, as the set of available syscalls, and of ways they can be combined to bypass a policy, is enormous.<p>Case in point: the naive approach of 'let's just block read(2) to prevent file access' doesn't work, because there are multiple ways to bypass simple read(2) filtering like this. The easiest that come to mind are:<p><pre><code> - using readv(2)
- using sendfile(2)
- using symlinks/hardlinks to bypass path checks, plus the TOCTOU races inherent in any further naive checks
</code></pre>
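A minimal sketch of the bypasses listed above, assuming Linux and CPython (where the os module calls map more or less directly onto the named syscalls). The function name read_without_read is hypothetical, and pread64(2) and mmap(2) are my additions to the list, included only to show how many paths there are. None of these issue read(2) on the target file:

```python
import mmap
import os
import tempfile


def read_without_read(path):
    """Return the file's bytes fetched four ways, none of which is read(2)."""
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        length = os.fstat(fd).st_size

        # readv(2): scatter read into a caller-supplied buffer.
        buf = bytearray(length)
        n = os.readv(fd, [buf])
        results["readv"] = bytes(buf[:n])

        # pread64(2): positional read, a different syscall number from read(2).
        results["pread"] = os.pread(fd, length, 0)

        # mmap(2): map the file and copy the bytes straight out of memory.
        with mmap.mmap(fd, length, access=mmap.ACCESS_READ) as m:
            results["mmap"] = m[:]

        # sendfile(2): the kernel copies the file into another descriptor.
        # A pipe stands in here for a socket or stdout.
        r, w = os.pipe()
        try:
            sent = os.sendfile(w, fd, 0, length)
            pipe_buf = bytearray(sent)
            got = os.readv(r, [pipe_buf])
            results["sendfile"] = bytes(pipe_buf[:got])
        finally:
            os.close(r)
            os.close(w)
    finally:
        os.close(fd)
    return results


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"top secret")
    for name, data in read_without_read(f.name).items():
        print(name, data)
    os.unlink(f.name)
```

A blacklist would have to name every one of these, and every future addition, to hold.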
The same applies to any other policy you wish to implement: for each one you need to go through the full set of Linux syscalls and filter every relevant one. There are around 300 syscalls in Linux as of writing.<p>Not to mention the typical newbie mistakes that this project makes: not following forks, not checking for 32-bit syscalls, etc.<p>gVisor [1] does this well. Instead of filtering, it reimplements the logic for handling Linux syscalls in userspace (e.g., it is actually responsible for handing out FDs and other handles, presenting the filesystem to the user, etc).<p>[1] - <a href="https://github.com/google/gvisor" rel="nofollow">https://github.com/google/gvisor</a>
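<p>On the 32-bit point: a toy model (not a real filter, and not this project's code) of why matching on the syscall number alone is not enough on x86. The numbers and AUDIT_ARCH_* values are the real Linux ones; the filter functions and the DENIED_NUMBERS policy are hypothetical names for illustration. The same number means a different syscall under each ABI, so a policy written against the x86_64 table misreads a 32-bit call:

```python
# Real Linux syscall numbers for a few calls under each x86 ABI.
X86_64 = {"read": 0, "write": 1, "open": 2, "close": 3}
I386 = {"read": 3, "write": 4, "open": 5, "close": 6}

# Architecture tags as the kernel reports them (linux/audit.h).
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003

# Hypothetical policy, written with only the x86_64 table in mind.
DENIED_NUMBERS = {X86_64["read"]}


def naive_filter(arch, nr):
    """Deny by number only, ignoring which ABI the number belongs to."""
    return "deny" if nr in DENIED_NUMBERS else "allow"


def arch_aware_filter(arch, nr):
    """Reject any ABI the policy was not written for before looking at nr."""
    if arch != AUDIT_ARCH_X86_64:
        return "deny"
    return naive_filter(arch, nr)


if __name__ == "__main__":
    # A 32-bit read(2) is number 3, which the naive policy takes for close(2).
    print(naive_filter(AUDIT_ARCH_I386, I386["read"]))
    print(arch_aware_filter(AUDIT_ARCH_I386, I386["read"]))
```

Checking the architecture first closes this particular hole, but it is still one more thing a filter author has to remember.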