TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Branchless Conditionals (2011)

41 points by djulius about 10 years ago

6 comments

zwieback about 10 years ago
It's interesting that ARM32 has conditional execution. I like it a lot for writing readable assembly code: the short jumps that result from a simple if can be encoded as three successive instructions, with no branches.

However, it's now falling out of favor (mostly gone from ARM64), apparently because of the relative cost of putting conditional execution on the die versus relying on smarter compilers.
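
To make the ARM32 point concrete, here is a small sketch in C. The comment inside shows the kind of conditionally executed sequence an ARM32 compiler could emit for it. The function name `max_int` and the register assignment are assumptions for illustration, and real output depends on the compiler and its flags.

```c
/* Hypothetical helper for illustration: returns the larger of a and b. */
int max_int(int a, int b)
{
    /*
     * On ARM32 (A32 instruction set) a compiler may turn this "if"
     * into conditionally executed instructions instead of a jump, e.g.:
     *
     *     cmp   r0, r1      ; compare a with b
     *     movlt r0, r1      ; executed only if a < b
     *     bx    lr          ; return, no branch around the assignment
     *
     * This listing is an assumption about typical output, not a
     * guarantee for any particular compiler.
     */
    if (a < b)
        a = b;
    return a;
}
```

Calling `max_int(1, 2)` returns 2, and the assignment is skipped by its condition code rather than by a jump when the comparison fails.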
chipsy about 10 years ago
One simple branchless optimization I've used is for collision detection across an array of values: instead of testing each one, I add its value to a counter (perhaps with some mapping of data to collision value). After iterating over a lot of them, I can do just one test. This is very CPU-friendly, as the pipeline gets to crunch all the numbers in one go.
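
The accumulate-then-test idea might look roughly like the sketch below. The 1-D interval layout and the names `count_hits` and `any_collision` are assumptions made for the example, not something taken from the comment.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: count how many 1-D intervals [lo[i], hi[i]]
 * contain the point p. Each comparison yields 0 or 1, so the hits are
 * accumulated arithmetically with no "if" inside the loop.
 */
unsigned count_hits(const int32_t *lo, const int32_t *hi, size_t n, int32_t p)
{
    unsigned hits = 0;
    for (size_t i = 0; i < n; i++) {
        /* & rather than && so the source has no short-circuit branch */
        hits += (unsigned)((lo[i] <= p) & (p <= hi[i]));
    }
    return hits;
}

/* The single test, done once after the whole array has been processed. */
int any_collision(const int32_t *lo, const int32_t *hi, size_t n, int32_t p)
{
    return count_hits(lo, hi, n, p) != 0;
}
```

Whether the loop body ends up branch-free in the generated machine code is still up to the compiler, so it is worth checking the disassembly before relying on it.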
vardump about 10 years ago
One example seems a bit odd.

```asm
; if (LocalVariable & 0x00001000) return 1; else return 0;
mov eax, [ebp - 10]
and eax, 0x00001000
neg eax
sbb eax, eax
neg eax
ret
```

Hmm... wouldn't this be faster? Two fewer instructions:

```asm
mov eax, [ebp - 10]
and eax, 0x00001000
shr eax, 12
ret
```

Well, who knows. I didn't bother to analyze this case. Maybe the article's example is faster somehow?
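
As a sanity check on whether the two sequences compute the same thing, here is a C model of each. It says nothing about speed. The function names are hypothetical, and the carry flag is modelled with an ordinary variable.

```c
#include <stdint.h>

/* Mirrors the neg / sbb / neg sequence: any nonzero masked value becomes 1. */
uint32_t test_bit_neg_sbb(uint32_t x)
{
    uint32_t masked = x & 0x00001000u;
    uint32_t carry = (masked != 0);   /* neg sets CF when the operand is nonzero */
    uint32_t all_ones = 0u - carry;   /* sbb eax, eax gives 0 or 0xFFFFFFFF */
    return 0u - all_ones;             /* final neg gives 0 or 1 */
}

/* Mirrors the and / shr version: valid because the mask is the single bit 12. */
uint32_t test_bit_shift(uint32_t x)
{
    return (x & 0x00001000u) >> 12;
}
```

Both return the same 0 or 1 for every input with this mask. The difference is generality: the neg / sbb / neg form also works for masks with several bits set, while the shift form relies on the mask being exactly one bit.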
strictfp about 10 years ago
Nice article. It inspired me to look around for a more straightforward way of optimizing, and I found the setcc class of instructions: http://www.nynaeve.net/?p=178

I'm thinking that this, combined with some CAS (CMPXCHG8B), could achieve the same, right?

Something like (pseudo):

    Comparewith(4)
    Ifequalstore(54)
    Ifnotequalstore(2)
    Return
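
One way to read that pseudocode is as a branchless select. The sketch below covers only the setcc-style half of the idea and does not use CMPXCHG8B. The function name `select_54_or_2` is hypothetical.

```c
/*
 * Pseudocode from the comment: compare with 4, store 54 if equal,
 * store 2 otherwise. select_54_or_2 is a hypothetical name.
 */
int select_54_or_2(int x)
{
    int eq = (x == 4);        /* 0 or 1; on x86 this is the kind of value setcc produces */
    int mask = -eq;           /* 0 or all ones */
    return (54 & mask) | (2 & ~mask);
}
```

Here `select_54_or_2(4)` yields 54 and any other argument yields 2. A compiler is free to lower this to setcc, cmov, or even a branch, so the source being branch-free is a hint rather than a guarantee.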
kstenerud about 10 years ago
I know that gcc and clang both have __builtin_expect(). If you tell the compiler the more likely path, wouldn't that make the branching version faster?

Actually, I've always wondered how __builtin_expect translates to something the CPU's branch prediction engine can use...
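
For reference, a typical use of `__builtin_expect` looks like the sketch below. It is specific to GCC and Clang. The `LIKELY`/`UNLIKELY` macro names and the `sum_nonnegative` function are assumptions made for the example.

```c
#include <stddef.h>

/* Common convenience macros; the names are a convention, not part of the compiler. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sum an array, treating negative entries as a rare error. */
long sum_nonnegative(const int *v, size_t n, int *had_error)
{
    long total = 0;
    *had_error = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {   /* hint: this path is expected to be rare */
            *had_error = 1;
            continue;
        }
        total += v[i];
    }
    return total;
}
```

As far as I understand it, the hint is consumed at compile time, for example to decide which path becomes the fall-through, rather than being something the branch predictor reads at run time. The result of the function is the same with or without the hint.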
mlindner about 10 years ago
x86 branch predictors are not 60% correct... Any decent branch predictor is over 90% correct and I believe modern ones are over 96% correct.