Their idea to checksum the global variables is really clever. Many benchmarks and test suites simply rely on verifying program output. By checking that all the intermediate global values used in their random programs hold the same values at the end of the computation, the authors verify a much greater surface area of the compiler.<p>It's not in the slides, but during the Q&A they revealed that far fewer bugs were found in GCC than in LLVM. With compilers, as with most software, battle-tested dinosaurs win the day when it comes to code quality.<p>It's also interesting to note that the greatest number of bugs were found in the InstCombine pass, which has since been completely refactored. In LLVM 2.6 it was one monolithic source file (13000 lines) with a zillion different peephole optimizations. Now it's broken up into 15 files.
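A toy illustration of the global-checksum idea described above, in Python rather than the C the tool actually generates. Everything here is made up for the example: the names `g_1`..`g_3`, the two "runs", and the choice of CRC32 are assumptions, not the authors' implementation. The point is only that two builds can print the same result while leaving different global state behind, and a checksum over the globals catches that.

```python
import struct
import zlib


def checksum_globals(globals_state):
    """Fold every global's final value into one CRC32, in a fixed (sorted-name) order."""
    crc = 0
    for name in sorted(globals_state):
        # Pack each value as a 64-bit little-endian integer so the byte layout is stable.
        crc = zlib.crc32(struct.pack("<q", globals_state[name]), crc)
    return crc


def reference_run():
    """Stand-in for a correctly compiled program: returns (printed result, final globals)."""
    g = {"g_1": 0, "g_2": 0, "g_3": 0}
    g["g_1"] = 7
    g["g_2"] = g["g_1"] * 3
    g["g_3"] = g["g_2"] - 1
    return g["g_3"], g


def miscompiled_run():
    """Stand-in for a miscompiled program: the store to g_2 was (wrongly) dropped."""
    g = {"g_1": 0, "g_2": 0, "g_3": 0}
    g["g_1"] = 7
    tmp = g["g_1"] * 3      # value kept in a "register", never written back to g_2
    g["g_3"] = tmp - 1
    return g["g_3"], g


if __name__ == "__main__":
    for run in (reference_run, miscompiled_run):
        result, state = run()
        print(run.__name__, "result:", result, "checksum:", hex(checksum_globals(state)))
```

Both runs report the same result, so an output-only check would pass, but the checksums differ because `g_2` ends up with a different value.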
I first ran into the tool used in the article when I was working in the compiler team of a company that produced DSPs. I figured I'd fire up a set of 100 randomly generated tests to see how we coped.<p>I was astonished: these few tests yielded something like 5 serious bugs (crashes, bad optimisations) in the development branch. That was only when built with -O (full optimisation, rarely reveals bugs) on a single architecture. If I'd spent a bit more time, I reckon I could've uncovered a few more just by adding or changing the build switches for the same tests.<p>Unfortunately I wasn't able to do this, as I was let go shortly afterwards and couldn't convince anyone that we should include it in regular testing before I left. I wasn't even able to submit bug reports to John Regehr and his team at the University of Utah, who were curious about what kinds of bugs their tool was uncovering, even though I had promised I would.
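For anyone wanting to try the "same tests, different build switches" idea, here is a rough sketch of how it could be scripted. This is not the tool's own harness: the function names, the default compiler and flag lists, and the majority-vote rule are all my assumptions, and it presumes `gcc` and `clang` are on the PATH and that the test program prints its checksum on stdout.

```python
import subprocess
from collections import Counter


def run_config(compiler, flag, source, exe="./a.out"):
    """Compile `source` with one compiler/flag pair and return the program's stdout.

    check=True raises if the compiler itself fails, which is a finding in its own right.
    """
    subprocess.run([compiler, flag, source, "-o", exe], check=True)
    done = subprocess.run([exe], capture_output=True, text=True, timeout=10)
    return done.stdout


def find_outliers(results):
    """Given {config: output}, return the configs whose output differs from the majority."""
    if not results:
        return []
    majority, _ = Counter(results.values()).most_common(1)[0]
    return sorted(cfg for cfg, out in results.items() if out != majority)


def differential_test(source,
                      compilers=("gcc", "clang"),
                      flags=("-O0", "-O1", "-O2", "-O3", "-Os")):
    """Build and run `source` under every compiler/flag pair and report the disagreements."""
    results = {
        f"{cc} {flag}": run_config(cc, flag, source)
        for cc in compilers
        for flag in flags
    }
    return find_outliers(results)
```

Calling `differential_test("test.c")` (with `test.c` being whatever random program you generated) returns the configurations that disagree with the rest. Majority voting is only a heuristic: the majority can be the wrong side, so any disagreement deserves a look.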
<i>Found deep optimization bugs unlikely to be uncovered by other means</i><p>Very cool.<p>The slides don't make it clear whether the various bugs discovered ended up being covered by unit tests in the regular LLVM suite.<p>Does anybody remember the 'crashme' program from the mid-90s that could be used to fuzz test the Linux kernel? I recall looking for it again about 5 years ago and couldn't find any references to it. Did that technique fall out of use?
LLVM is pretty cool. I've started using it in development simply because the compile times are much faster than GCC's and the clang error messages actually make sense.<p>I also wish this testing methodology would be adopted by other projects; it makes QA much easier.
If you prefer the Google viewer (seems more efficient and pleasant to use than scribd):<p><a href="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.llvm.org%2Fdevmtg%2F2010-11%2FYang-HardenLLVM.pdf" rel="nofollow">http://docs.google.com/viewer?url=http%3A%2F%2Fwww.llvm.org%...</a>