Upvoted for the signed integer overflow example. I'll admit that I actually don't know the most idiomatic, bulletproof way of testing for signed overflow in C; if you google "how to test signed integer overflow in C", the very first result is essentially equivalent to the buggy example in the blog post ( <a href="https://www.geeksforgeeks.org/check-for-integer-overflow/" rel="nofollow">https://www.geeksforgeeks.org/check-for-integer-overflow/</a> ), and I'm not keen to repeat the legendary case of signed overflow within the PHP interpreter: <a href="https://web.archive.org/web/20120412194929/http://use.perl.org/use.perl.org/_Aristotle/journal/33448.html" rel="nofollow">https://web.archive.org/web/20120412194929/http://use.perl.o...</a>
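For what it's worth, the usual portable answer is to check the operands *before* adding, so that no signed overflow is ever evaluated. Below is a minimal sketch for int addition (the function name add_overflows is hypothetical). GCC and Clang also provide __builtin_add_overflow, and C23 adds ckd_add in &lt;stdckdint.h&gt;, if you can rely on either.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. The sum itself is never
   computed, so the check cannot trigger undefined behavior. */
bool add_overflows(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;  /* INT_MAX - b cannot overflow when b > 0 */
    else
        return a < INT_MIN - b;  /* INT_MIN - b cannot overflow when b <= 0 */
}
```

The buggy pattern, roughly "add first, then see if the result wrapped", fails because the overflow has already happened by the time you inspect the result, and the optimizer may assume it never did.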
In Zig we embrace undefined behavior. It's what allows tools to catch mistakes, and it's what arms the optimizer with the assumptions it needs to be effective.<p>For example, not only is it undefined behavior to overflow on signed integer addition, in Zig it's also undefined behavior to overflow on unsigned integer addition. If you want wrapping integer addition, you have to use the wrapping integer addition operator, which is defined to wrap around on overflow.<p>Here's the trick though - Zig catches most kinds of undefined behavior before they have a chance to cause problems in release builds. Some undefined behavior is caught at compile time, and otherwise most undefined behavior is caught at runtime, in debug builds. And finally, if you are not confident in the level of testing your software has undergone, you can make a "release-safe" build, which has optimizations on, but includes undefined behavior checks and will crash (or invoke a user-defined panic function) rather than invoke undefined behavior.<p>You can see some examples of this here:
<a href="https://ziglang.org/documentation/master/#Undefined-Behavior" rel="nofollow">https://ziglang.org/documentation/master/#Undefined-Behavior</a>
I agree with everything in the article - the example of non-intuitive effects of the strict aliasing rule, a tricky integer overflow example, and the unpopular plea to switch away from C/C++.<p>When I write C and C++ code, I try to make my logic portable and standards-compliant so that it will work on all platforms. So instead of assuming int is 32 bits, I am only allowed to assume that int is at least 16 bits wide. I assume that sizeof(char) could equal sizeof(int) and both could be 64 bits. I avoid bitwise manipulation on negative numbers, because they're not guaranteed to be two's complement. Keeping all of these pessimistic assumptions in mind while I code is a mental burden that I don't experience in other languages.<p>Regarding integer promotions, here is one tricky situation I reasoned about and asked in <a href="https://stackoverflow.com/questions/39964651/is-masking-before-unsigned-left-shift-in-c-c-too-paranoid" rel="nofollow">https://stackoverflow.com/questions/39964651/is-masking-befo...</a> . Suppose you want to compute:<p><pre><code> uint32_t a = UINT32_C(0xFFFFFFFF);
uint32_t b = a << 31;
  // b should be 0x80000000
</code></pre>
Looks innocent, eh? Left-shifting an unsigned integer will discard the top bits and never cause undefined behavior. Except, this reasoning can be wrong on some platforms. Suppose a (hypothetical) platform where unsigned short is 32 bits and int is 48 bits:<p><pre><code> typedef unsigned short uint32_t;
typedef int int48_t;
</code></pre>
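Now (uint32_t)a → (unsigned short)a → (int)a → (int48_t)a, due to typedefs and integer promotion: a 48-bit int can represent every value of a 32-bit unsigned short, so a is promoted to a signed int before the shift. And shifting 1 bits into or past the sign bit of a signed integer is undefined behavior. Kaboom.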
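One commonly suggested workaround is to force the operand to at least unsigned int before shifting, e.g. by multiplying by 1u: under the usual arithmetic conversions, a promoted int combined with an unsigned int becomes unsigned int, so the shift is always unsigned and well-defined. A minimal sketch, assuming the shift count is below 32 (the function name shl32 is hypothetical):

```c
#include <stdint.h>

/* Left shift that stays well-defined even if uint32_t would otherwise be
   promoted to a wider signed int. Assumes 0 <= n < 32. */
uint32_t shl32(uint32_t a, unsigned n) {
    /* 1u * a has type unsigned int (or wider unsigned) on every platform,
       so the shift discards high bits instead of overflowing a signed int.
       The cast truncates back to 32 bits if unsigned int is wider. */
    return (uint32_t)(1u * a << n);
}
```

On ordinary platforms where uint32_t is unsigned int, the multiplication is a no-op and compilers should emit a plain shift.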
Undefined behavior gets a bad rap, but it's not <i>always</i> evil. Compiled programs would be a lot slower if the compiler had to account for these cases.<p>If you're writing serious C, you should be using tools like Valgrind on debug-mode executables to make sure you aren't relying on undefined behavior. The tools are there. It's just not something a lot of people do.
Just a reminder that there are ZERO published studies showing that these UB "optimizations" have significant value for any real programs. They impose a bizarre notion of C semantics that is not compatible with the language design. A good critique of, for example, the UB aliasing behavior can be found in Brian Kernighan's article on Pascal ( <a href="http://www.cs.virginia.edu/~evans/cs655-S00/readings/bwk-on-pascal.html" rel="nofollow">http://www.cs.virginia.edu/~evans/cs655-S00/readings/bwk-on-...</a> ). The standards authors have made the exact same error, but in an ad hoc, hacked-up manner.<p>Just use the flags that Linux forced the compiler developers to provide so that C could be used at all, or else give up on writing correct code. <a href="http://www.yodaiken.com/2018/11/17/standard-c-is-more-fun-than-ordinary-c/" rel="nofollow">http://www.yodaiken.com/2018/11/17/standard-c-is-more-fun-th...</a>
Is there a compiler flag that can be set to print a warning about all undefined behaviour? I'm not a C dev, but this UB business seems like a cat-and-mouse game between the developer and the compiler, which looks for "tricks" to avoid doing work. That seems backwards.
The penultimate example tripped me up. I was under the impression that arithmetic conversions for binary operators only happened if the types of the operands were different. But reading the standard, and then actually experimenting with clang and __auto_type, does confirm that operands of types narrower than int are promoted to int or unsigned int before the operation (to int if int can represent all the values of the original type). That's really kind of nasty given the lack of wrapping overflow on signed integers.<p>This actually makes me wonder: if I <i>do</i> want wrapping overflow on signed integers in C, how do I request it? Is there some compiler builtin or stdlib function to say "please add/multiply/whatever these signed integers with wrapping overflow"?
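A sketch of both points, assuming a 32-bit int and GCC or Clang. Two uint16_t operands are promoted to int, so 65535 * 65535 overflows a signed int; multiplying by 1u first keeps the arithmetic unsigned. For wrapping signed addition there is no standard function, but __builtin_add_overflow (a GCC/Clang extension) stores the wrapped result, and the -fwrapv flag makes all signed overflow wrap. The function names below are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* x * x with plain uint16_t operands would be computed in int and could
   overflow; 1u * x converts to unsigned int, so the product wraps safely. */
unsigned int square_u16(uint16_t x) {
    return 1u * x * x;
}

/* Wrapping signed addition. __builtin_add_overflow computes the sum as if
   with infinite precision, stores the result wrapped to int, and returns
   whether it overflowed (ignored here). GCC/Clang only. */
int wrapping_add(int a, int b) {
    int result;
    (void)__builtin_add_overflow(a, b, &result);
    return result;
}
```

With C23 you could use ckd_add from &lt;stdckdint.h&gt; in the same way instead of the compiler builtin.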
More of this: <a href="https://www.youtube.com/watch?v=yG1OZ69H_-o" rel="nofollow">https://www.youtube.com/watch?v=yG1OZ69H_-o</a>
> The C standard specifies that values “cannot” be accessed through pointers that do not match the effective type of the value<p>I think this is what "type punning" through unions is used to work around. Here is a related comment by Linus Torvalds on the kernel list:<p><a href="https://lkml.org/lkml/2018/6/5/769" rel="nofollow">https://lkml.org/lkml/2018/6/5/769</a>
> The C standard specifies that values “cannot” be accessed through pointers that do not match the effective type of the value<p>Yet they can be, and often are, by using the union trick.<p>Deciding whether to use the union trick requires a discussion of a program's desired portability, which, while usually desirable, is a separate issue from undefined behavior.
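For reference, a minimal sketch of the union trick next to the memcpy alternative, assuming a 32-bit IEEE 754 float (the function names are hypothetical). Reading a union member other than the one last stored reinterprets the bytes in C (since C99 TC3), but C++ does not allow this; memcpy is well-defined in both, and compilers typically turn it into a single move.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Union type punning: store through one member, read through another. */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Byte copy: avoids accessing a float object through a uint32_t lvalue. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

The version that breaks the effective-type rule is the pointer cast, *(uint32_t *)&f, which is exactly what the strict aliasing optimizations assume you never do.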