If I understand the mechanism correctly, you compile the compiler with itself. Then you compile the compiler with a trusted compile and then compile the compiler with itself. You then make sure the two versions are bit identical.<p>I think one big limitation with this is the assumption that the compiler applies optimizations in a deterministic manner. That’s possible today through the control of the RNG seed via a command line parameter, but one could imagine alternate attacks that violate the assumption around a deterministic mapping from input to output of the malicious compiler (eg __TIME__ in the compiler mechanism would completely thwart this approach until the compiler is fixed to be deterministic). For example, you could have the exploit only injected when it detects a specially crafted PGO payload that adjusts the heuristics in a way that triggers specific passes to turn on or not. Now the mechanism would still protect against that I think but only if you had access to the exact PGO profile used. Similarly, I suspect there are non determinism sources to exploit around file iteration ordering. Again, not that it defeats the protection but that it subverts the assumption of the protection mechanism. You could also imagine a compiler that uses a search mechanism to refine and apply optimizations (ie E-Graph compilers output is dependent on the amount of search time allowed for).<p>It would be nice though if compilers adopted this now to harden attacks so that a) you have to do more esoteric attacks b) compiler authors think about sources of non determinism and how to control for them (they already do but this would be another layer).<p>The hardest thing though would be coming up with the trusted compiler. The easiest hack is to have gcc be that for clang and clang be that for gcc to raise the cost of the attack required. Not sure what a better mechanism might look like.
Something I've been wondering about: is it possible to use something like mutation testing / a reduction tool to heuristically "prove" that the binary can be attributes to real lines of code in the input.<p>Solves a different problem to this.