UndefinedBehaviorSanitizer's Unexpected Behavior

48 points by ekimekim, 7 months ago

10 comments

rom1v, 7 months ago

> This construct works perfectly fine in C

Intuitively, I would say that this is actually undefined behavior (though it would probably be difficult to expose wrong behavior in practice).

In the C spec, I found 6.5.2.2, paragraph 9:

> If the function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined.

We might discuss whether
<pre><code>void (*)(char *)</code></pre>
is "compatible" with
<pre><code>void (*)(void *)</code></pre>
but I think it isn't, since:
<pre><code>void target(void *ptr) {}
void (*name)(char *ptr) = target;</code></pre>
fails to compile with the error message:
<pre><code>initialization of ‘void (*)(void *)’ from incompatible pointer type ‘void (*)(char *)’</code></pre>
The compiler explicitly says "incompatible pointer type".

Same for:
<pre><code>void target(char *ptr) {}
void (*name)(void *ptr) = target;</code></pre>

quelsolaar, 7 months ago

This is one of the very rare cases in C where something is technically Undefined Behaviour, but in practice works and is recommended.

The typedef struct trick is a very common idiom that creates _more_ safety, not less. All reasonable compilers should (and do) support this. It is sad that the ISO standard is not in line with reality at all times. (I say that as a member of WG14 and the UB study group.)

I recommend Daniel keep his typedef struct definition, and then have an ifdef to revert to the void definition for when Clang runs its UB sanitizer. While checking for prototype discrepancies is a very good thing to automate, Clang should add an exception for this.

simonask, 7 months ago

Awesome writeup. Always interesting to read what Daniel has to say.

I think the fact that it turned out that he was wrong (and UBSan was right, as usual) is a great testament to the shortcomings of C.

Lots of people - both inexperienced and very experienced - celebrate it for being "simple" and "close to the hardware", but the truth of the matter is that it is precisely not close enough to the hardware for people who _know_ what the hardware is doing to be able to do what they expect, and it's too close to the hardware to be able to ignore it.

Lots of experienced C programmers (and - guilt by association - C++ programmers as well) run into UB because they have clear expectations of the compiler. I.e., they know what the compiler should generate, more or less, and C is just a convenient notation. But compilers don't live up to those expectations, because they don't actually compile your code for the hardware. They compile it to the virtual machine abstraction defined by the standard, which very often works differently from any real architecture, and then translate that into machine code, even though there is basically a single set of semantics that every single "relevant" (mainstream) architecture implements. This is a holdover from when C had to target architectures that are 100% irrelevant today.

Everybody's favorite example is signed integer overflow. In both x86-64 and ARM64, that just works - two's complement is the only relevant implementation, so there's no issue. But `int` in C and C++ is not that.

Almost every single common UB pitfall has reasonable behavior at the assembler level for every mainstream architecture, and almost every single niche architecture.

C gives you the illusion of being close to the hardware, but in actual reality the hardware is several steps removed, so if you want to leverage your knowledge of the hardware, calling conventions, assembly, or other low-level details, you have to go out of your way to work around the C standard.

(Aside: We need new languages to tackle this, and I coincidentally happen to like Rust. Lots of people coming from C or C++ are irritated and frustrated by Rust, but 99% of the time it's because Rust gives you a compile error where C would give you UB. This is one example of that out of thousands.)

Someone, 7 months ago

I would think the proper way to do this would be
<pre><code>#if defined(BUILDING_LIBCURL)
struct Curl_easy { … };
#else
struct Curl_easy;
#endif
typedef struct Curl_easy CURL;</code></pre>
If BUILDING_LIBCURL isn’t defined, that tells code “CURL is identical to a struct named Curl_easy”. If it is defined, that also tells code what fields it has.

gpderetta, 7 months ago

Interesting problem. The typical solution in C++ to deal with type-erasing function pointer types is to go through a trampoline function:
<pre><code>struct X {};
void use_x(X*);

using F = void(void*);
void bar(F* fn, void* y) { fn(y); }

template<class T, auto fn>
void trampoline(void* arg) { return fn(reinterpret_cast<T*>(arg)); }

X x;
bar(&trampoline<X, use_x>, &x);</code></pre>
In plain C there is no way to generate the trampoline at the point of use in the same way template instantiation works, but it can be generated by a macro at global scope.

badmintonbaseba, 7 months ago

Yes, it's very much undefined behavior. As I recall, GTK's GLib does this all over the place for signals.

Edit: I'm not advocating that this being UB is fine. I don't expect compilers to exploit this for optimization, because so many projects rely on this working. There might be room to extend compatible function types to make this defined.

account42, 7 months ago

> In 2016 I wanted to change the type universally to just typedef struct Curl_easy CURL; … as I thought we could do that without breaking neither API nor ABI.

This seems to be the obvious solution and how most libraries define their opaque handles. It doesn't break a guaranteed API and it doesn't break the ABI any more than only using it when building the library - and you can check that it doesn't break the ABI on any platform where you want to guarantee ABI stability.

olliej, 7 months ago

OK, this is UB: calling a function pointer through a different type than its definition is a pretty clear example of UB. The problem here is that there's a confusion between "void * and char * are implicitly convertible in C" and "void * and char * are the same". The latter is true for many platforms (especially older ones) but not all (I think there were platforms where functionally they had `typedef char void`). There's a side note of conflating "this has defined behavior on my platform that is stable and works for me" and "it's not UB if it's stable and works on a platform", just like integer overflow is UB despite being entirely defined behavior on every platform under the sun.

Anyway, if folk are curious, there are many platforms where not only can the representation of `void(*)(void*)` and `void(*)(char*)` be different - even if pointing to the same function - but the representation of even just the data pointers void* and char* may not be the same, again while pointing to the same memory.

For example, on platforms with pointer authentication, function pointers are generally (I would say "always" but in principle it can be avoided) signed, and in some configurations the type of the function is incorporated into the signature. Calling the function pointer requires authenticating the pointer, and authenticating the pointer requires that the call site agrees on the type of the pointer, because otherwise the signature check fails.

Absent actual pointer auth hardware, you could imagine someone implementing this as some kind of monstrosity like this (very hypothetical, unpleasant, and footgun-heavy) horror:
<pre><code>#define SIGN_FPTR(fptr) (typeof(fptr))(((uintptr_t)fptr)|(MAGIC_HASH(stringify(typeof(fptr))) << some_number_of_bits))
#define AUTH_FPTR(fptr) (typeof(fptr))(((uintptr_t)fptr)^(MAGIC_HASH(stringify(typeof(fptr))) << some_number_of_bits))</code></pre>
and you can immediately see that if you had code that used these but disagreed on the type of the function, you'd have a bad time. With compiler+hardware pointer auth this is just handled transparently.

In principle a pointer auth environment could apply this type discrimination logic to data pointers as well, but I'm unaware of any that do so implicitly. But if a platform did do so, then the incorrect type of the parameter would mean you would fail inside the function, if you were able to call it (say if you weren't using a function pointer, but had mistyped the prototype).

Similarly, in other environments, the pointer may be directly aware of the type being referenced, and I believe that CHERI supports this, in which case even if you could call the function pointer, I believe it would fail when attempting to read the pointer.

Having got here, you might be saying "but hang on, C says void* and char* are the same", and we go all the way back to my first sentence where I said "are implicitly convertible" :D

On plenty of systems, casting from one pointer type to another is an entirely source-level feature and once lowered is completely invisible. But in the environments we're discussing, a cast such as
<pre><code>(Type1*)pointerToType2</code></pre>
does real work. Under pointer auth it requires re-signing the pointer (you have to auth the original value to verify it, and then sign the result according to the new schema), and under CHERI I believe there are instructions for controlling how a pointer is tagged.

But the important thing is that C does not say they are the same thing, just that the conversion is automatic, just like numbers and bools, or numbers and bools in JS, or numbers and strings in JS, or objects and strings in JS, or nothing and strings in JS, or .... :D

pistoleer, 7 months ago

Man rants about not expecting weird type system abuse that works on his machine to be undefined behavior

baq, 7 months ago

They say Rust is much harder than C for... disallowing these kinds of things?
