TechEcho
A bug story: data alignment on x86

151 points by sconxu over 8 years ago

15 comments

ogoffart over 8 years ago
Again, someone who relies on undefined behavior. Casting a pointer with the wrong alignment is not platform-specific behavior, it's undefined behavior. Relying on it is an error.

The author did not know "What Every C Programmer Should Know About Undefined Behavior": http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

Another good link about that: http://blog.regehr.org/archives/213
qb45 over 8 years ago
The correct solution for GCC is specifying 1-byte alignment for this particular array:

```c
#include <stdlib.h>
#include <stdint.h>

typedef uint32_t __attribute__((__aligned__(1))) uint32_t_unaligned;

uint64_t sum (const uint32_t_unaligned * p, size_t nwords)
{
    uint64_t res = 0;
    size_t i;
    for (i = 0; i < nwords; i++) res += p [i];
    return res;
}
```

Probably works on clang too, and IIRC the MS compiler provides similar functionality with different syntax. AFAIK there is no portable solution.

And I'm not sure how exactly this code will fail on architectures which don't support unaligned uint32_t.
rwmj over 8 years ago
These SSE instructions that operate only on aligned data are a pain. It's not well known that Linux/x86 stack frames must always be 16-byte aligned. GCC uses this knowledge to use the SSE aligned instructions when accessing certain fields on the stack.

Unfortunately, a while back the OCaml compiler generated non-aligned stack frames, which is no problem for pure OCaml code and even saves a little bit of memory. However, if the code called out to C, then *sometimes* and unpredictably (think different call stacks, ASLR) the C code would crash. That was a horrible bug to track down:

https://caml.inria.fr/mantis/view.php?id=5700#c10779
pjc50 over 8 years ago
Well, that's pretty horrendous. Note that the naive code which just casts the input to uint16_t would work fine. I can't help but wonder if the solution to this might have been better expressed as naive implementation + platform-specific *assembly* implementation.

After all, if you have to understand the underlying instructions executed in order to fix the problem, why not stop trying to make the compiler emit the "right" instructions and just write them yourself?

(Language lawyers: is casting a char* to a uint32_t* actually defined behavior? For unaligned data?)
GlitchMr over 8 years ago
The compiler is allowed to assume that pointers are aligned. What you are doing is creating a pointer to a value with invalid alignment, hence undefined behaviour (just creating the pointer is undefined behaviour). The correct solution would be to read values indirectly. For example, a function like this could be used to replace every access to the "q" variable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t read(const char *p, size_t index)
{
    uint32_t out;
    /* Copy the bytes instead of dereferencing a misaligned uint32_t pointer. */
    memcpy(&out, &p[index * sizeof out], sizeof out);
    return out;
}
```

A compiler can recognize this pattern and continue to use unaligned accesses that would work.

This has a cost for unaligned accesses on non-x86 platforms (quite a big one at that), but considering the original code didn't work on these at all, it's an improvement.
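A minimal sketch of how a summation loop might be built on a memcpy-based load, assuming the data are 32-bit words in host byte order. The names `load_u32` and `sum_words` are hypothetical and are not taken from the article or from the comment above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 32-bit word from any address. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t out;
    memcpy(&out, p, sizeof out);   /* makes no alignment assumption */
    return out;
}

/* Hypothetical helper: sum nwords 32-bit words (host byte order)
   starting at buf, which may be misaligned. */
static uint64_t sum_words(const void *buf, size_t nwords)
{
    const unsigned char *p = buf;
    uint64_t res = 0;
    for (size_t i = 0; i < nwords; i++)
        res += load_u32(p + i * sizeof(uint32_t));
    return res;
}
```

Calling `sum_words(raw + 1, n)` on a byte buffer is then well defined whatever the alignment of `raw + 1`, because no `uint32_t *` to the buffer is ever formed.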
slededit over 8 years ago
At some point you might as well byte the bullet and just write the code in assembly.
ambrop7 over 8 years ago
Note that even if you try to manually correct the pointer to work on aligned data (read any initial bytes via a char pointer and read the rest via a uint32_t pointer), you still generally have undefined behavior: a strict-aliasing violation. And the worst thing here is that whether you do have a violation depends on how *other* code accesses the same data and how the object is initially declared. E.g., you're fine if the original declaration is char[] or uint32_t[], but not if it's uint16_t[], because that would entail access to the same data via both uint16_t and uint32_t, a violation of strict aliasing.

Actually, two out of three inet checksum implementations in lwIP have this bug [1].

And like the problem discovered in the article, this is NOT theoretical. I have personally seen code "miscompiled" due to strict-aliasing violations (in that case, packed structures were involved).

I think the only way to do this "manual alignment handling" is to use assembly, either by writing the entire thing in assembly, or by using inline asm sections for the individual 32-bit memory reads/writes.

Funny story... When I was looking for a fast inet checksum implementation to use for an embedded ARM project, I took the one from RTEMS, which is written in C with a lot of inline asm, and like the lwIP code, it has strict-aliasing violations (and also problems compiling correctly with clang). What I did was compile it to assembly with gcc once, then include this compiled assembly in the source code. Assuming that this was compiled correctly, I don't need to be afraid of future compiler changes breaking it.

[1] http://git.savannah.gnu.org/cgit/lwip.git/tree/src/core/inet_chksum.c
lukego over 8 years ago
Related Snabb experiments with IP checksum in C with automatic vectorization, C with vector intrinsics, and AVX2 assembler: https://github.com/snabbco/snabb/pull/899
novia over 8 years ago
I'm taking assembly right now, and we're working on our first RISC project after spending all semester working with the x86. Why does RISC crash if the bytes are not aligned?
Hello71 over 8 years ago
You can probably also just pass -fno-strict-aliasing to gcc.
ot over 8 years ago
If you're willing to use compiler extensions, you can avoid the memcpy by using packed structs. This can generate better code.

Folly has a generic `loadUnaligned()` that uses this trick: https://github.com/facebook/folly/blob/5d52fb8c30e567403b8ccb65e5c1a159fb92d707/folly/Bits.h#L539
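A rough C sketch of the packed-struct idea, assuming GCC or Clang attribute syntax. This is not Folly's code; the names `unaligned_u32` and `load_unaligned_u32` are hypothetical, and the `may_alias` attribute is an extra precaution added here so that reading a byte buffer through the struct does not run into strict-aliasing assumptions.

```c
#include <stdint.h>

/* GCC/Clang extension: a packed struct has alignment 1, so the compiler
   may not assume its member sits on a 4-byte boundary. */
struct __attribute__((packed, may_alias)) unaligned_u32 {
    uint32_t value;
};

/* Hypothetical helper: read a host-order 32-bit word from any address. */
static uint32_t load_unaligned_u32(const void *p)
{
    return ((const struct unaligned_u32 *)p)->value;
}
```

On x86 this typically compiles to a plain `mov`; on targets without unaligned loads the compiler is expected to emit byte-wise accesses instead.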
phkahler over 8 years ago
What if you put the array in a struct and made a union of both uint32_t and uint8_t? Would the union with the larger size force the compiler to generate a 4-byte aligned array for the bytes?

I suggest this because it would be portable without any compiler-specific stuff.
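A small sketch of the union idea, assuming C11 (for `_Alignof` and `_Static_assert`) and a hypothetical 64-byte buffer; `packet_buf` and `sum_buf` are made-up names. A union is aligned at least as strictly as its strictest member, so the byte array does start on a `uint32_t` boundary. That only helps when the data begins at offset 0 of the buffer, which is not the case for a pointer into the middle of someone else's byte array.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_BYTES 64   /* hypothetical size, a multiple of 4 */

union packet_buf {
    uint8_t  bytes[BUF_BYTES];
    uint32_t words[BUF_BYTES / sizeof(uint32_t)];
};

/* The union inherits the alignment of its strictest member. */
_Static_assert(_Alignof(union packet_buf) >= _Alignof(uint32_t),
               "union must be at least as aligned as uint32_t");

/* Sum the first nwords words; bytes written via .bytes may be read
   back via .words, since C permits type punning through a union. */
static uint64_t sum_buf(const union packet_buf *b, size_t nwords)
{
    uint64_t res = 0;
    for (size_t i = 0; i < nwords && i < BUF_BYTES / sizeof(uint32_t); i++)
        res += b->words[i];
    return res;
}
```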
koverstreet over 8 years ago
`__attribute__((aligned))` might be useful here.
amelius over 8 years ago
TL;DR: Even though most instructions of your processor (x86) allow data to be aligned on any byte, your compiler might not.
pklausler over 8 years ago
So much HTML to complain about C working the way C is defined rather than the way the OP wants it to work! It's not that hard to write a fast ones'-complement checksum that's portable and compliant, but whining's always easier than coding.
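For illustration, a simple portable ones'-complement checksum in the style of RFC 1071 might look like the sketch below. It reads the buffer one byte at a time, so it has no alignment or aliasing issues, but it is not tuned for speed and is not the code from the article. The name `inet_checksum` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum over len bytes at buf,
   treating the data as big-endian 16-bit words. */
static uint16_t inet_checksum(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum = 0;              /* wide accumulator, folded at the end */

    while (len >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                       /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)              /* end-around carry */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the eight bytes `00 01 f2 03 f4 f5 f6 f7` used as the worked example in RFC 1071, the folded sum is `0xddf2` and the function returns `0x220d`.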