Undefined behavior in C is a reading error (2021)

25 点作者 nequo9 个月前

12 条评论

layer88 个月前

TFA is misunderstanding. As he cites from the C standard, “Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose.” Since it’s difficult (and even, in the general case of runtime conditions, impossible at compile time) to diagnose, the implementor (compiler writer) has two choices: (a) assume that the undefined behavior doesn’t occur, and implement optimizations under that assumption, or (b) nevertheless implement a defined behavior for it, which in many cases amounts to a pessimization. Given that competition between compilers is driven by benchmarks, guess which option compiler writers are choosing.The discussions in comp.lang.c (a Usenet newsgroup, not a mailing list) were educating C programmers that they can’t rely on (b) in portable C, and moreover, can’t make any assumptions about undefined behavior in portable C, because the C specification (the standard) explicitly refrains from imposing any requirements whatsoever on the C implementation in that case.The additional thing to understand is that compiler writers are not malevolently detecting undefined behavior and then inserting optimizations in that case, but instead that applying optimizations is a process of logical deduction within the compiler, and that it is the lack of assumptions related to undefined behavior being put into the compiler implementation, that is leading to surprising consequences if undefined behavior actually occurs. This is also the reason why undefined behavior can affect code executing prior to the occurrence of the undefined condition, because logical deduction as performed by the compiler is not restricted to the forward direction of control flow (and also because compilers reorder code as a consequence of their analysis).

评论 #41452635 未加载

评论 #41452737 未加载

评论 #41452684 未加载

Ericson23148 个月前

I really don't like reading these dimwitted screeds. We did not get here because layering over a standard or a document --- this is not the US supreme court or similar. We got here because- There are legit issues trying to define everything without loosing portability. This affects C and anything like it.- Compiler writes do want to write optimizations regardless of whether this is C or anything else --- witness that GCC / LLVM will use the same optimizations regardless of the input language / compiler frontend- Almost nobody in this space, neither the cranky programmers against, or the normy compiler writers for, has a good grasp of modern logic and proof theory, which is needed to make this stuff precise.

评论 #41457694 未加载

评论 #41453716 未加载

nickelpro8 个月前

If you want slow code that behaves exactly the way you expect, turn off optimizations. Congrats, you have the loose assembly wrapper language you always wanted C to be.For the rest of us, we're going to keep getting every last drop out of performance that we can wring out of the compiler. I do not want my compiler to produce an "obvious" or "reasonable" interpretation of my code, I want it to produce the fastest possible "as if" behavior for what I described within the bounds of the standard.If I went outside the bounds of the standard, that's my problem, not the compiler's.

评论 #41452281 未加载

评论 #41452487 未加载

brudgers8 个月前

A consensus standard happens by multiple stakeholders sitting down and agreeing on what everyone will do the same way. And agreeing one what they won't all do the same way. The things they agree to doing differently don't become part of the standard.With compilers, different companies usually do things differently. That was the case with C87. The things they talked about but could not or would not agree to do the same way are listed as undefined behaviors. The things everyone agreed to do the same way are the standard.The consensus process reflects stakeholder interests. Stakeholders can afford to rewrite some parts of their compilers to comply with the standards and cannot afford to rewrite other parts to comply with the standards because their customers rely on the existing implementation and/or because of core design decisions.

评论 #41452366 未加载

评论 #41452287 未加载

评论 #41452247 未加载

saghm8 个月前

The crux of this argument seems to be that the author interprets the "range of permissible behavior" they cite as specifications on undefined behavior as not allowing the sort of optimizations that potentially render anything else in the program moot. A large part of this argument depends arguing that the earlier section defining the term undefined behavior has an "obvious" interpretation that's been ignored in favor of a differing one. I don't think their interpretation of the definition of undefined behavior is necessarily the strongest argument against the case they're making though; to me, the second section they quote is if anything even more nebulous.To be overly pedantic (which seems to be the point of this exercise), the section cites a "range" of permissible behavior, not an exhaustive list; it doesn't sound to me like it requires that only those three behaviors are allowed. The potential first behavior it includes is "ignoring the situation completely with unpredictable results", followed by "behaving during translation or program execution in a documented manner characteristic of the environment". I'd argue that the behavior this article complains about is somewhere between "willfully ignoring the situation completely with unpredictable results" to "recognizing the situation with unpredictable results", and it's hard for me read this as being obviously outside the range of permissible behavior. Otherwise, it essentially would mean that it's still totally allowed by the standard to have the exact behavior that the author complains about, but only if it's due to the compiler author being ignorant rather than willful. I think it would be a lot weirder if the intent of the standard was that deviant behavior due to bugs is somehow totally okay but purposely writing the same buggy code is a violation.

评论 #41452557 未加载

mst8 个月前

The practical reality appears to be that compilers use the loose interpretation of UB and that every compiler that works hard to optimise things as much as possible takes advantage of that as much as it can.I am very much sympathetic to the people who really wish that wasn't the case, and I appreciate the logic of arguments like this one that in theory it shouldn't be the case, but in practice, it is the case, and has been for some years now.So it goes.

ajross8 个月前

I think the problem is sort of a permutation of this argument: way way too much attention is being paid to warning about the dangers and inadequacies of the standard's UB corners, and basically none to a good faith effort to clean up the problem.I mean, it wouldn't be that hard in a technical sense to bless a C dialect that did things like guarantee 8-bit bytes, signed char, NULL with a value of numerical zero, etc... The overwhelming majority of these areas are just spots where hardware historically varied (plus a few things that were simple mistakes), and modern hardware doesn't have that kind of diversity.Instead, we're writing, running and trying to understand tools like UBSan, which is IMHO a much, much harder problem.

评论 #41455715 未加载

评论 #41459328 未加载

评论 #41452871 未加载

Animats8 个月前

Well, where have we had trouble in C in the past? Usually, with de-referencing null pointers. The classic is<pre><code> char* p = 0; char c = *p; if (p) { ... } </code></pre> Some compilers will observe that de-referencing p implies that P is non-null. Therefore, the test for (p) is unnecessary and can optimized out. The if-clause is then executed unconditionally, leading to trouble.The program is wrong. On some hardware, you can't de-reference address 0 and the program will abort at "*p". But many machines (i.e. x86) let you de-reference 0 without a trap. This one has caught the Linux kernel devs at least once.From a compiler point of view, inferring that some pointers are valid is useful as an optimization. C lacks a notation for non-null pointers. In theory, C++ references should never be null, but there are some people who think they're cool and force a null into a reference.Rust, of course, has<pre><code> Option<&Foo> </code></pre> with unambiguous semantics. This is often implemented with a zero pointer indicating None, but the user doesn't see that.So, what else? Use after free? In C++, the compiler knows that "delete" should make the memory go away. But that doesn't kill the variable in that scope. It's still possible to reference a gone object. This is common in some old C code, where something is accessed after "free". This is Common Security Weakness #414.[1]Not a problem in Rust, or any GC language.Over-optimization in benchmarks can be amusing.<pre><code> for (i=0; i<100000000; i++) {} </code></pre> will be removed by many compilers today. If the loop body is identical every time, it might only be done once. This is usually not a cause of bad program behavior. The program isn't wrong, just pointless.What else is a legit problem?[1] <a href="https://cwe.mitre.org/data/definitions/416.html" rel="nofollow">https://cwe.mitre.org/data/definitions/416.html</a>

评论 #41461294 未加载

评论 #41454684 未加载

jcranmer8 个月前

So a while back, I did some spelunking into the history of C99 to actually try to put the one-word-change-theory to bed, but I've never gotten around to writing anything that's public on the internet yet. I guess it's time for me to rectify it.Tracking down the history of the changes at that time is a bit difficult, because there's clearly multiple drafts that didn't make it into the WG14 document log (this is the days when the document log was literal physical copies being mailed to people), and the drafts in question are also of a status that makes them not publicly available. Nevertheless, by reading N827 (the editors report for one of the drafts), we do find this quote about the changes made:> Definitions are only allowed to contain actual definitions in the normative text; anything else must be a note or an example. Things that were obviously requirements have been moved elsewhere (generally to Conformance, see above), the examples that used to be at the end of the clause have been distributed to the appropriate definitions, anything else has been made into a note. (Some of the notes appear to be requirements, but I haven't figured out a good place to put them yet.)In other words, the change seems to be have made purely editorially. The original wording was not intended to be read as imposing requirements, and the change therefore made it a note instead of moving it to conformance. This is probably why "permissible" became "possible": the former is awkward word choice for non-normative text.Second, the committee had, before this change, discussed the distinctions between implementation-defined, unspecified, and undefined behavior in a way that makes it clear that the anything-goes interpretation is intentional. Specifically, in N732, one committee introduces unspecified behavior as consisting of four properties: 1) multiple possible behaviors; 2) the choice need not be consistent; 3) the choice need not be documented; and 4) the choice needs to not have long-range impacts. Drop the third option, and you get implementation-defined behavior; drop the fourth option, and you get undefined behavior[1]. This results in a change to the definition of unspecified and implementation-defined behavior, while undefined behavior retains the same. Notice how, given a chance to very explicitly repudiate the notion that undefined behavior has spooky-action-at-a-distance, the committee declined to, and it declined to before the supposed critical change in the standard.Finally, the C committee even by C99 was explicitly endorsing optimizations permitted only by undefined behavior. In N802, a C rationale draft for C99 (that again predates the supposed critical change, which was part of the new draft in N828), there is this quote:> The bitwise logical operators can be arbitrarily regrouped [converting `(a op b) op c` to `a op (b op c)`], since any regrouping gives the same result as if the expression had not been regrouped. This is also true of integer addition and multiplication in implementations with twos-complement arithmetic and silent wraparound on overflow. Indeed, in any implementation, regroupings which do not introduce overflows behave as if no regrouping had occurred. (Results may also differ in such an implementation if the expression as written results in overflows: in such a case the behavior is undefined, so any regrouping couldn’t be any worse.)This is the C committee, in 1998, endorsing an optimization relying on the undefined nature of signed integer overflow. If the C committee is doing that way back then, then there is really no grounds one can stand on to claim that it was somehow an unintended interpretation of the standard.[1] What happens if you want to drop both the third and fourth option is the point of the paper, with the consensus seeming to be "you don't want to do both at the same time."

krackers8 个月前

See also <a href="https://news.ycombinator.com/item?id=41146860">https://news.ycombinator.com/item?id=41146860</a> and <a href="https://gavinhoward.com/2023/08/the-scourge-of-00ub/" rel="nofollow">https://gavinhoward.com/2023/08/the-scourge-of-00ub/</a>

kazinator8 个月前

This drivel is posted in a private blog precisely in order to evade expert arguments.

mianos8 个月前

It's quite ironic that a blog named 'keeping simple' is summarised by GPT as:"In summary, the essay employs a hyperbolic tone to argue that the prevailing interpretation of undefined behavior has severely compromised the utility and stability of C. While it raises valid points about the implications of undefined behavior, the dramatic language and sweeping claims might make the situation appear more catastrophic than is universally agreed upon."I know it's bad form to quote GPT, but I could not say this better.As someone who writes C and C++ every day of the week I feel I just wasted 30 minutes of my life reading it and the arguments.