I wrote this just for fun after I saw an article about SSO (small string optimization) in Rust[1]. My string can store up to 23 8-bit chars (excluding the null terminator) without calling the allocator.<p>I may be mistaken here, but..
Curious fact: both libstdc++[2] and libc++[3] access a union member without checking whether it is the currently active one.
AFAIK, this is UB in C++, but I assume they simply rely on their own compilers' behavior. I tried to avoid this by using `std::byte[]`.
Still, I'm sure there are several instances of UB in my code :)<p>[1] <a href="https://tunglevo.com/note/an-optimization-thats-impossible-in-rust/" rel="nofollow">https://tunglevo.com/note/an-optimization-thats-impossible-i...</a><p>[2] <a href="https://github.com/gcc-mirror/gcc/blob/d09131eea083e80ccad60cc2686c09e9fdae0188/libstdc%2B%2B-v3/include/bits/basic_string.h#L269">https://github.com/gcc-mirror/gcc/blob/d09131eea083e80ccad60...</a><p>[3] <a href="https://github.com/llvm/llvm-project/blob/4468d58080d0502a050b71a33413d5206ad5e8fd/libcxx/include/string#L1880">https://github.com/llvm/llvm-project/blob/4468d58080d0502a05...</a>
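For readers who have not seen the trick, here is a minimal C++17 sketch of how 23 chars can fit inline in a 24-byte object backed by a `std::byte[]` buffer. This is not the code from the post: `SmallString`, `kHeapTag`, and the layout (the last byte stores "remaining inline capacity", so a full 23-char string gets its null terminator for free) are assumptions chosen for illustration. Copying is disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical sketch: all names here are illustrative, not the poster's.
class SmallString {
    static constexpr std::size_t kSize = 24;              // total object size
    static constexpr std::size_t kInlineCap = kSize - 1;  // 23 chars inline
    static constexpr unsigned char kHeapTag = 0xFF;       // never a valid inline tag

    struct Heap { char* ptr; std::size_t len; };
    static_assert(sizeof(Heap) < kSize, "heap fields must not overlap the tag byte");

    // Raw storage; the heap fields are moved in and out with memcpy, so no
    // inactive union member is ever read.
    alignas(Heap) std::byte buf_[kSize];

    // Last byte: (23 - length) when inline, kHeapTag when heap-allocated.
    unsigned char tag() const { return static_cast<unsigned char>(buf_[kSize - 1]); }

    Heap heap() const {
        assert(is_heap());
        Heap h;
        std::memcpy(&h, buf_, sizeof h);
        return h;
    }

public:
    explicit SmallString(std::string_view s) {
        if (s.size() <= kInlineCap) {
            if (!s.empty()) std::memcpy(buf_, s.data(), s.size());
            buf_[s.size()] = std::byte{0};  // terminator (overwritten below if size == 23)
            // For a 23-char string this writes 0, which doubles as the terminator.
            buf_[kSize - 1] = static_cast<std::byte>(kInlineCap - s.size());
        } else {
            Heap h{new char[s.size() + 1], s.size()};
            std::memcpy(h.ptr, s.data(), s.size());
            h.ptr[s.size()] = '\0';
            std::memcpy(buf_, &h, sizeof h);
            buf_[kSize - 1] = static_cast<std::byte>(kHeapTag);
        }
    }
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;
    ~SmallString() { if (is_heap()) delete[] heap().ptr; }

    bool is_heap() const { return tag() == kHeapTag; }
    std::size_t size() const { return is_heap() ? heap().len : kInlineCap - tag(); }
    const char* c_str() const {
        return is_heap() ? heap().ptr : reinterpret_cast<const char*>(buf_);
    }
};
```

With this layout, `SmallString("hello")` stays inline, a 23-char string still stays inline, and a 24-char string falls back to the heap.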
> Curious fact: both - libstdc++ and libc++ - do access to union member without any check that it is active now.<p>Accessing a data member that's within the common initial sequence[1] of both union alternatives is perfectly well-defined.[2]<p>However, it's true that in this case (I'm looking at libc++) the member isn't quite the same in both alternatives: In one case it's a `char:1` and in the other case a `size_t:1`. Also, in both cases it's nested inside an anonymous `struct __attribute__((packed))`, which means we're dealing with two different compiler extensions already. (Standard C++ supports anonymous unions,[3] but not anonymous structs.) So yes, pedantically speaking, they're relying on the compiler's behavior.<p>> I tried to avoid this using `std::byte[]`<p>I don't know about Rust, but in C++ you probably wouldn't be able to type-pun `std::byte[]` in all the ways you'd need to during constant evaluation (i.e., at constexpr time). C++20 and later require `std::string` to be constexpr-friendly, so that's probably relevant to the library vendors' choices here.<p>[1] <a href="https://eel.is/c++draft/class.mem#def:common_initial_sequence" rel="nofollow">https://eel.is/c++draft/class.mem#def:common_initial_sequenc...</a><p>[2] <a href="https://eel.is/c++draft/class.mem#general-28" rel="nofollow">https://eel.is/c++draft/class.mem#general-28</a><p>[3] <a href="https://eel.is/c++draft/class.union.anon" rel="nofollow">https://eel.is/c++draft/class.union.anon</a>
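To make the common-initial-sequence rule concrete, here is a small standard-C++ sketch. It is deliberately simpler than the real libc++ layout (no bit-fields, no packed or anonymous structs), and the names `Short`, `Long`, `Rep`, and `is_long` are made up for the example. Both structs are standard-layout and begin with a member of the same type, so that member can be read through either alternative regardless of which one is active.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout for illustration only; not libc++'s actual representation.
struct Short {
    unsigned char is_long;  // first member: part of the common initial sequence
    char data[23];
};

struct Long {
    unsigned char is_long;  // same type, same position as in Short
    char* ptr;
    std::size_t size;
};

union Rep {
    Short s;
    Long l;
};

// Reads the flag through `s` even when `l` is the active member. That is
// allowed because `is_long` lies in the common initial sequence of two
// standard-layout structs.
inline bool is_long(const Rep& r) { return r.s.is_long != 0; }
```

Reading anything past the common initial sequence (for example `r.s.data` while `l` is active) would not be covered by this rule.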
Interesting. The writing is a little unclear, but I enjoyed it nonetheless!<p>Here's my user test:<p><a href="https://news.pub/?try=https://www.youtube.com/embed/tQXoCbUhWOQ?si=Zc-Uo4pF6QaX2hyV" rel="nofollow">https://news.pub/?try=https://www.youtube.com/embed/tQXoCbUh...</a>