Wow, this: "random() was returning values in int range rather than long." is a very nice bug find. Randomness is VERY hard for humans to check. For example, Python's binomial distribution sampler is very bad on some inputs [1], giving wildly wrong values, yet nobody had noticed. I bumped into it when I implemented an algorithm to compute the approximate volume of solutions to a DNF formula, and the results were clearly wrong [2]. Knuth explains the algorithm here, in case you are interested [3].<p>[1] <a href="https://www.cs.toronto.edu/~meel/Slides/meel-distform.pdf" rel="nofollow">https://www.cs.toronto.edu/~meel/Slides/meel-distform.pdf</a>
[2] <a href="https://github.com/meelgroup/pepin">https://github.com/meelgroup/pepin</a>
[3] <a href="https://cs.stanford.edu/~knuth/papers/cvm-note.pdf" rel="nofollow">https://cs.stanford.edu/~knuth/papers/cvm-note.pdf</a>
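Because a biased sampler still "looks random", one cheap automated first check is to compare empirical moments with the theoretical ones (mean np and variance np(1-p) for a binomial). Below is a minimal C sketch under stated assumptions: the xorshift64 generator, the naive Bernoulli-trial `binomial_sample`, and `binomial_moments_ok` are all hypothetical and are not Python's implementation. A moment check like this only catches gross errors; proper distribution testing is far more thorough.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical xorshift64 PRNG; the state must never be 0. */
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *s) {
    return (double)(xorshift64(s) >> 11) * 0x1.0p-53;
}

/* Naive binomial sampler: count successes in n Bernoulli(p) trials. */
static int binomial_sample(int n, double p, uint64_t *s) {
    int k = 0;
    for (int i = 0; i < n; i++)
        if (uniform01(s) < p)
            k++;
    return k;
}

/* Draw `draws` samples and check the empirical mean and variance against
   n*p and n*p*(1-p), each within the absolute tolerance `tol`. */
static bool binomial_moments_ok(int n, double p, int draws, uint64_t seed, double tol) {
    assert(draws > 0);
    uint64_t s = seed ? seed : 1;   /* xorshift must not start at 0 */
    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < draws; i++) {
        double k = binomial_sample(n, p, &s);
        sum += k;
        sumsq += k * k;
    }
    double mean = sum / draws;
    double var = sumsq / draws - mean * mean;
    double dm = mean - n * p;
    double dv = var - n * p * (1.0 - p);
    return dm < tol && dm > -tol && dv < tol && dv > -tol;
}
```

A sampler that is "very bad on some inputs" would be run through a check like this for many (n, p) pairs, since a bug may only show up in a narrow parameter range.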
> String to float conversion had a table missing four values. This caused an array access overflow which resulted in imprecise values in some cases.<p>I once wrote a function to parse a date format from log files that Go doesn't natively support, and forgot to add November. I quit that job in April, so I never saw any issues. However, when the 1st of November came, my ex-colleagues saw no logs for that day, and when they found out the reason they created the hashtag #nolognovember, which you can probably still find somewhere :)
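Both bugs, the float table missing four values and the parser missing November, are incomplete lookup tables that compile and mostly work. If the parser is table-driven, one cheap guard is to let the compiler count the entries. A sketch in C, assuming C11; the table, `N_MONTHS`, and `parse_month` are hypothetical names, not the Go code from the story:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical month table for a log-date parser. The array size is left
   for the compiler to infer, so a missing entry changes the count. */
static const char *const MONTHS[] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
};

#define N_MONTHS (sizeof MONTHS / sizeof MONTHS[0])

/* C11 static_assert (from <assert.h>): drop "Nov" and the build fails
   instead of silently losing a month of logs. */
static_assert(N_MONTHS == 12, "month table must have exactly 12 entries");

/* Returns 1..12 for a three-letter month abbreviation, or 0 if unknown. */
static int parse_month(const char *abbr) {
    for (size_t i = 0; i < N_MONTHS; i++)
        if (strcmp(abbr, MONTHS[i]) == 0)
            return (int)i + 1;
    return 0;
}
```

Note that declaring the array as `MONTHS[12]` would defeat the check: a missing entry would just become a null pointer.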
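> the vast bulk of sanitizer complaints came from invoking undefined or implementation-defined behavior in harmless ways<p>This is patently false. <i>Any</i> undefined behavior is harmful, because the optimizer is allowed to assume it never happens and transform the program accordingly, which can amount to arbitrary code. This is not a purely theoretical concern; it has been demonstrated repeatedly in practice. So UB doesn't have to misbehave visibly where it occurs: if a path containing UB can be reached, the compiler may change the behavior of seemingly unrelated code, even code that runs before it. (Implementation-defined behavior is a different matter, since each implementation must document what it does; the argument here is about UB.)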
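A classic instance of the optimizer exploiting UB is a signed-overflow check written after the arithmetic. In the minimal C sketch below (function names are hypothetical), optimizing compilers such as GCC and Clang commonly fold `x + 1 < x` to false, because signed overflow is assumed never to happen, so the check silently disappears. The fix is to test against the limit before doing the arithmetic.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Broken: when x == INT_MAX, `x + 1` is signed overflow, which is undefined.
   Since the compiler may assume UB never happens, it may conclude that
   `x + 1 < x` is always false and compile this to `return false;`. */
static bool increment_overflows_ub(int x) {
    return x + 1 < x;
}

/* Well-defined: compare against the limit before doing any arithmetic. */
static bool increment_overflows(int x) {
    return x == INT_MAX;
}

/* Saturating increment that relies only on the well-defined check. */
static int saturating_increment(int x) {
    return increment_overflows(x) ? INT_MAX : x + 1;
}
```

Nothing looks wrong when reading `increment_overflows_ub`; the damage only shows up in the code that trusted its result.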
> Passing pointers to the middle of a data structure. For example, free takes a pointer to the start of an allocation. The management structure appears just before that in memory; computing the address of which appears to be undefined behavior to the compiler.<p>To clarify, the undefined behavior here is that the sanitizer sees `free` accessing memory outside the bounds of what `malloc` returned.<p>It's perfectly valid to compute the address of a header struct that sits just before the memory your pointer points to, as long as the result still points into the same object (for example, the same allocation):<p><pre><code> struct header { unsigned long size; }; // hypothetical header layout

 void not_free(void *p) {
     // Step back from p to the header stored immediately before it.
     struct header *h = (struct header *) ((char *)p - sizeof(struct header));
     // ...
 }
</code></pre>
In the case of `free`, that resulting pointer is technically "invalid" from the sanitizer's point of view, because it lies outside the object `malloc` returned, even though `malloc` presumably allocated the header too and handed out a pointer just past it.
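To make the valid case concrete, here is a toy allocator that stores a header in front of each block, the same layout described above for `malloc`. All names (`alloc_header`, `toy_alloc`, `toy_size`, `toy_free`) are hypothetical, and this is a sketch rather than how any real `malloc` lays out memory. Stepping back to the header is well-defined here because both pointers stay inside the one block our own code got from `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocation header. The union with max_align_t keeps the
   user pointer that follows it suitably aligned for any type. */
typedef union alloc_header {
    size_t size;
    max_align_t align;
} alloc_header;

/* Allocate the header plus `size` bytes; return a pointer just past the header. */
static void *toy_alloc(size_t size) {
    if (size > SIZE_MAX - sizeof(alloc_header))
        return NULL;                         /* total size would overflow */
    alloc_header *h = malloc(sizeof(alloc_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Step back from the user pointer to its header. This stays inside the
   single block returned by malloc, so the pointer arithmetic is valid. */
static alloc_header *toy_header(void *p) {
    assert(p != NULL);
    return (alloc_header *)((char *)p - sizeof(alloc_header));
}

static size_t toy_size(void *p) {
    return toy_header(p)->size;
}

/* Free the whole block, starting from the header, not from p. */
static void toy_free(void *p) {
    if (p != NULL)
        free(toy_header(p));
}
```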
> [...] detect places where the program wanders into parts of the C language specification [...]<p>Small nitpick: the UB sanitizer also has some checks specific to C++, such as `-fsanitize=vptr`, which checks the dynamic type of polymorphic objects:
<a href="https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html" rel="nofollow">https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html</a>
And don't forget `-fbounds-safety`, which is available in Apple's Clang/LLVM and possibly other builds: <a href="https://clang.llvm.org/docs/BoundsSafety.html" rel="nofollow">https://clang.llvm.org/docs/BoundsSafety.html</a>
That arithmetic-shift-right implementation is also what I came up with for a fantasy CPU architecture in a video game, which only has a logical shift right (16-bit registers):<p><pre><code> ; asr rd, rs1, rs2 ; rd = signed(rs1) >> rs2
and rt, rs1, 0x8000 ; isolate sign bit
lsr rt, rt, rs2 ; shift sign bit to final position
neg rt, rt ; sign-extended part of final result
lsr rd, rs1, rs2 ; base part of final result
or rd, rd, rt ; combine both parts
</code></pre>
Broken down this way, it may be easier to follow for anyone who didn't understand the article's one-liner.
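For readers who think in C rather than assembly, here is the same five-step sequence on `uint16_t`. `asr16` is a hypothetical name, and the shift count is assumed to be 0 to 15. Only logical (unsigned) shifts are used, so nothing depends on how the compiler shifts negative signed values, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right of a 16-bit value using only logical shifts,
   mirroring the asm above step by step. Assumes 0 <= s <= 15. */
static uint16_t asr16(uint16_t x, unsigned s) {
    assert(s <= 15);
    uint16_t t = (uint16_t)(x & 0x8000u);   /* and: isolate sign bit */
    t = (uint16_t)(t >> s);                 /* lsr: move sign bit to final position */
    t = (uint16_t)(0u - t);                 /* neg: set that bit and every bit above it */
    uint16_t rd = (uint16_t)(x >> s);       /* lsr: base part of the result */
    return (uint16_t)(rd | t);              /* or: combine both parts */
}
```

For example, `asr16(0xFFF0, 2)` gives `0xFFFC`, i.e. -16 >> 2 = -4, while non-negative inputs are unaffected because the mask is 0.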