/* I can't help but remember a joke on the topic. One guy says: "I can operate on big numbers with insane speed!" The other says: "Really? Compute me 97408743 times 738423". The first guy, immediately: "987958749583333". The second guy takes out a calculator, checks the answer, and says: "But it's incorrect!". The first guy objects: "Despite that, it was very fast!" */
I love the last sentence: “…, if you set yourself the goal of crossing an 8-lane freeway blindfolded, it does make sense to focus on doing it as fast as you possibly can.”
Discussed at the time:

C and C++ prioritize performance over correctness - https://news.ycombinator.com/item?id=37178009 - Aug 2023 (543 comments)
As a pure C programmer [1], let me post my full agreement: https://gavinhoward.com/2023/08/the-scourge-of-00ub/

[1]: https://gavinhoward.com/2023/02/why-i-use-c-when-i-believe-in-memory-safety/
This headline is badly misunderstanding things. C/C++ date from an era where "correctness" in the sense the author means wasn't a feasible feature. There weren't enough cycles at build time to do all the checking we demand from modern environments (e.g. building medium-scale Rust apps on a Sparcstation would be literally *weeks* of build time).

And more: the problem faced by the ANSI committee wasn't that they were tempted to "cheat" by defining undefined behavior at all. It's that *there was live C code in the world that did this stuff*, for real and valid reasons. And they knew if they published a language that wasn't compatible no one would use it. But there were also variant platforms and toolchains that didn't do things the same way. So instead of trying to enumerate them all individually (which probably wasn't possible anyway), they identified the areas where they knew they could define firm semantics and allowed the stuff outside that boundary to be "undefined", so existing environments could continue to implement them compatibly.

Is that a good idea for a new language? No. But ANSI wasn't writing a new language. They were adding features to the language in which Unix was already written.
I was disappointed that Russ didn't mention the strongest argument for making arithmetic overflow UB. It's a subtle thing that has to do with sign extension and loops. The best explanation is given by ryg here [1].

As a summary: The most common way given in C textbooks to iterate over an array is "for (int i = 0; i < n; i++) { ... array[i] ... }". The problem comes from these three facts: (1) i is a signed integer; (2) i is 32-bit; (3) pointers nowadays are usually 64-bit. That means that a compiler that can't prove that the increment on "i" won't overflow (perhaps because "n" was passed in as a function parameter) has to do a sign extend on every loop iteration, which adds extra instructions in what could be a hot loop, especially since you can't fold a sign-extending index into an addressing mode on x86. Since this pattern is so common, compiler developers are loath to change the semantics here--even a 0.1% fleet-wide slowdown has a cost to FAANG measured in the millions.

Note that the problem goes away if you use pointer-width indices for arrays, which many other languages do. It also goes away if you use C++ iterators. Sadly, the C-like pattern persists.

[1]: https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7
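To make the pattern concrete, here is a minimal sketch (mine, not taken from the linked gist) of the two indexing styles on a 64-bit target:

    #include <stddef.h>

    /* Signed 32-bit index: because overflow of "i" is UB, the compiler may
       assume the loop simply counts 0..n-1 and can widen "i" to 64 bits
       once, before the loop. If overflow were defined to wrap, the 32-bit
       wrap would have to be modeled, which generally forces a sign
       extension of "i" on every iteration before it can be used as a
       64-bit offset. */
    float sum_int_index(const float *array, int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; i++)
            sum += array[i];
        return sum;
    }

    /* Pointer-width index: no widening is ever needed, so the overflow
       semantics of "int" never enter the picture. */
    float sum_sizet_index(const float *array, size_t n) {
        float sum = 0.0f;
        for (size_t i = 0; i < n; i++)
            sum += array[i];
        return sum;
    }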
There's a way I like to phrase this:

In C and C++, it's easy to write incorrect code, and difficult to write correct code.

In Rust, it's also difficult to write correct code, but near-impossible to write incorrect code.

The new crop of languages that include useful correctness-assuring features such as iterators, fat-pointer collections, and GC/RC (Go, D, Nim, Crystal, etc.) make incorrect code hard, but correct code easy. And with a minimal performance penalty! In the best-case scenarios (for example, Nim with its RC and no manual heap allocations, which is very easy to achieve since it defaults to hidden unique pointers), we're talking about only paying a 20% penalty for bounds-checking compared to raw C performance. For the ease of development, maintenance, and readability, that's easy to pay.
You can implement C in completely different ways. For example, I like that signed overflow is UB because it is trivial to catch, while unsigned wraparound - while defined - leads to extremely hard-to-find bugs.
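A small illustration of why the UB makes it catchable: sanitizers and trapping modes are free to flag the signed case precisely because no behavior is guaranteed, while the defined unsigned wrap is something a conforming program may rely on. A sketch, assuming GCC or Clang:

    /* Build with: cc -fsanitize=signed-integer-overflow overflow.c
       (or -ftrapv to abort on signed overflow). */
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int s = INT_MAX;
        unsigned u = UINT_MAX;

        s = s + 1;   /* UB: UBSan reports it at runtime, -ftrapv aborts */
        u = u + 1;   /* defined: wraps to 0; a program may depend on this */

        printf("%d %u\n", s, u);
        return 0;
    }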
The infinite loops example doesn't make sense. If count and count2 are volatile, I don't see how the compiler could legally merge the loops. If they aren't volatile, it can merge the loops because the program can't tell the difference (it doesn't even have to update count or count2 in memory during the loops). Only code executing after the loops could even see the values in those variables.
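For context, a hypothetical stand-in for the kind of code under discussion (not the article's exact example, and using finite loops to make the as-if reasoning in this comment explicit):

    /* Plain int counters: the loops' only effect is the final values of
       count and count2, which nothing can observe until both loops are
       done. Fusing the loops - or collapsing them to count += n;
       count2 += n; - is therefore a legal as-if transformation. */
    int count, count2;

    void tick(int n) {
        for (int i = 0; i < n; i++) count++;
        for (int i = 0; i < n; i++) count2++;
    }

    /* volatile counters: every individual load and store is an observable
       side effect that must happen in source order, so the same merge
       would not be legal. */
    volatile int vcount, vcount2;

    void vtick(int n) {
        for (int i = 0; i < n; i++) vcount++;
        for (int i = 0; i < n; i++) vcount2++;
    }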
After perusing the article, I'm thinking that maybe Ferraris should be more like Volvos, because crashing at high speed can be dangerous.

But if one doesn't find that exciting, at least they'd better blaze through the critical sections as fast as possible. And double check that O2 is enabled (/LTCG too if on Windows).
If you don't write a specification then any program would suffice.

We're at C23 now and I don't think that section has changed? Anyone know why the committee won't revisit it?

Is it purely "pragmatism", or dogma? (Are they even distinguishable in our circles...)
It's not really that they prioritize performance over correctness (your code becomes no more correct if an out-of-bounds write were well-defined to reboot the machine...); it's that they give unnecessary latitude to UB instead of constraining the valid behaviors to the minimal set that is plausibly useful for maximizing performance. E.g. it is just complete insanity to allow signed integer overflow to format your drive. Simply reducing it to "produces an undefined result" would seem plenty adequate for performance.
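To make the distinction concrete, here is a hypothetical example of the kind of optimization full UB enables today: because the compiler may assume signed overflow never happens, GCC and Clang will typically fold this guard to false and delete it. The argument above is that a bounded rule ("the addition yields an unspecified value") would keep much of that value-level optimization room without letting a single overflow make the entire execution undefined.

    #include <limits.h>

    int add_one_clamped(int x) {
        /* Intended overflow guard. Under full UB, "x + 1 < x" is assumed
           impossible, so this branch is commonly optimized away; the
           overflow that then occurs may, in principle, have arbitrary
           consequences. */
        if (x + 1 < x)
            return INT_MAX;
        return x + 1;
    }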
Already all fixed in C++.

And I don't know why everything now has to be beginner-friendly. Then just use a high-level language. C++ is never advertised as a high-level language; it's a mid-level language.

Even so, C++ has never shut down or bricked my computer even once.

All these young people are just too spoiled.
They let the programmer be the ultimate definer of correctness.

They don't prioritize performance over correctness; they prioritize programmer control over compiler/runtime control.