TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

How to compare two packed bitfields without having to unpack each field (2019)

81 pointsby luuabout 1 year ago

9 comments

djmipsabout 1 year ago
The author says who needs SIMD registers! And indeed on early processors like the 68000 you didn&#x27;t have dedicated SIMD registers so in order to do SIMD within a register (SWAR) you had to do some variant on the techniques presented.<p>This was used in a fast 4 sample player where it could do math on four 8 bit wave table indices all in one 32 bit operation on the 68000. I had forgotten the details so this was a nice reminder of how it must have worked. I would otherwise have to disassemble it since source no longer exists. Speed was necessary since it was for the Atari ST and the work was done in an interrupt which was quite frequent and it was important to keep the overhead as minimal as possibe.
评论 #40449706 未加载
anematodeabout 1 year ago
You can get a fast equivalent to his zero-padding method on recent-ish x86 and ARM SVE. The x86 assembly would look like (untested):<p><pre><code> compare: ;; lhs in eax, rhs in ebx mov ecx, 0b11111011111101111 pdep eax, eax, ecx pdep ebx, ebx, ecx sub eax, ebx test eax, 0b10000010000001000 mov eax, 0 setnz al ret</code></pre>
pcwaltonabout 1 year ago
I just benchmarked the three versions (in Rust, with criterion) on my Coffee Lake CPU. The quartiles are:<p>Naive version, best case: [481.40 ps 487.84 ps 494.56 ps]<p>Naive version, mid case: [751.56 ps 758.84 ps 766.17 ps]<p>Naive version, worst case: [953.71 ps 970.95 ps 994.65 ps]<p>Packed bitfield version: [685.98 ps 698.25 ps 711.57 ps]<p>So on average, the naive version and packed bitfield versions are within 10% of one another. Modern CPUs frequently don&#x27;t benefit much from these kinds of tricks anymore.
trompabout 1 year ago
The suggested code tests for bitfield borrows with<p><pre><code> auto c = ((~x &amp; y) | (~(x ^ y) &amp; (x - y)); c &amp;= 0x8410; return c == 0; </code></pre> where the literal bitmap contains the most significant bit of each bitfield.<p>Couldn&#x27;t this be written equivalently as<p><pre><code> auto c = x ^ y ^ (x - y); c &amp;= 0x10820; </code></pre> where the literal bitmap now contains the bits just left of each bitfield?
评论 #40447425 未加载
mannyvabout 1 year ago
Is the vector method really faster than mask&#x2F;subtract&#x2F;compare?<p>They never really say in the article.
评论 #40444210 未加载
评论 #40444798 未加载
评论 #40446379 未加载
MrLeapabout 1 year ago
I saw in a wild bitshift methods text-only-page many years ago that had a method to do an approximate distance between two 2d vectors using only bitwise operators. Anybody know it? I&#x27;ve been unable to find it for about 8~ years
samatmanabout 1 year ago
Something I&#x27;ve been greatly enjoying in using Zig is builtin support for packed bitfields in the form of packed structs.<p>The data structure from the article would be:<p><pre><code> const RGB = packed struct(u16) { b: u5, g: u6, r: u5, } </code></pre> So the test becomes<p><pre><code> const gte: bool = x.r &gt;= y.r and x.g &gt;= y.g and x.b &gt;= y.b; </code></pre> Okay, so this doesn&#x27;t get us the optimal form described in the article.<p>Or does it? Since the compiler knows about packed structs, it could perform the optimization for me.<p>Does it, right this instant? Eh, probably not. But compilers have a tendency to improve, and this is a local pattern which would be fairly easy to recognize. The recognition is the point: it&#x27;s much, much harder to recognize the intent in the initial implementations described in the article, and then replace them with the optimal version. The way I was able to write it in Zig, in addition to being far more expressive of the algorithm&#x27;s intent, conveys to the compiler: compare the values of these bitfields, I don&#x27;t care how. The compiler doesn&#x27;t have to prove that I have no other reason for the other operations, besides making said comparison possible: it can just emit the optimal code.
评论 #40445635 未加载
not2babout 1 year ago
Interesting optimization, particularly for something like an FPGA implementation but, as the article says, also useful for improving an inner loop. Thanks for sharing it.
评论 #40449739 未加载
082349872349872about 1 year ago
compare <a href="https:&#x2F;&#x2F;dl.acm.org&#x2F;doi&#x2F;pdf&#x2F;10.1145&#x2F;800136.804488" rel="nofollow">https:&#x2F;&#x2F;dl.acm.org&#x2F;doi&#x2F;pdf&#x2F;10.1145&#x2F;800136.804488</a>