TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Beating the L1 cache with value speculation

157 pointsby rostayobalmost 4 years ago

14 comments

drewg123almost 4 years ago
How realistic is it to have a linked list in contiguous memory that stays nicely ordered? In my experience, lists are normally cobbled together with nodes that are allocated widely separated in time and hence I have no knowledge that I can use to predict the address of the next pointer.<p>If you&#x27;re in a scenario where this optimization works, it seems like it would be better to just use an array.
评论 #27931983 未加载
评论 #27932112 未加载
评论 #27932515 未加载
评论 #27931882 未加载
评论 #27937378 未加载
kardosalmost 4 years ago
This works because we&#x27;ve arranged for the nodes to be sequential in memory, if we have the luxury of doing that then we could use an array instead of a linked list. Is there a scenario where this value speculation trick would work, where we &#x2F;don&#x27;t&#x2F; have this luxury?
评论 #27934342 未加载
评论 #27936192 未加载
bob1029almost 4 years ago
A quick googling brought me to the following paper:<p>&quot;Data value speculation in superscalar processors&quot;<p><a href="https:&#x2F;&#x2F;www.sciencedirect.com&#x2F;science&#x2F;article&#x2F;abs&#x2F;pii&#x2F;S0141933198000866" rel="nofollow">https:&#x2F;&#x2F;www.sciencedirect.com&#x2F;science&#x2F;article&#x2F;abs&#x2F;pii&#x2F;S01419...</a><p>It would appear there is an entire section dedicated to &quot;Value prediction in a realistic scenarios&quot;, but I am not curious enough yet to pay money to access this document.<p>Edit: Found another that is accessible.<p>&quot;Practical Data Value Speculation for Future High-end Processors&quot;<p><a href="http:&#x2F;&#x2F;class.ece.iastate.edu&#x2F;tyagi&#x2F;cpre581&#x2F;papers&#x2F;HPCA14DataSpeculation.pdf" rel="nofollow">http:&#x2F;&#x2F;class.ece.iastate.edu&#x2F;tyagi&#x2F;cpre581&#x2F;papers&#x2F;HPCA14Data...</a>
评论 #27939857 未加载
solidanglealmost 4 years ago
``` if (node != next) { node = node-&gt;next; } ``` How does this work? Shouldn&#x27;t this be ``` if (node != next) { node = next; } ```?
评论 #27934196 未加载
评论 #27933642 未加载
评论 #27931789 未加载
codetrotteralmost 4 years ago
How do they get the cycles&#x2F;elem and instrs&#x2F;cycle counts? Is it via the perf utility? Or something they calculate? Or they get it some other way?
评论 #27931905 未加载
评论 #27931404 未加载
评论 #27931959 未加载
swileyalmost 4 years ago
This seems like it would cause a lot of those &quot;I just wish the damn machine did what I told it to: situations.
评论 #27936295 未加载
评论 #27938053 未加载
andy_pppalmost 4 years ago
I like it as an exercise for understanding the CPU but is this stuff really useful in the real world - the example given is &quot;optimised&quot; away by GCC&#x2F;clang without hacks. My guess is that if this is a good way to iterate over linked lists compilers would have optimised for this by now? Either through cleverness or hints added to your code.
评论 #27931525 未加载
评论 #27938099 未加载
评论 #27931290 未加载
评论 #27933365 未加载
alfiedotwtfalmost 4 years ago
&gt; Both gcc and clang easily deduce that the guessing is semantically pointless<p>Can someone explain this please, because I don&#x27;t see how - as `node` and `next` could possibly be different values, wouldn&#x27;t that `if` be needed and so is in no way a NOP?
评论 #27938529 未加载
Const-mealmost 4 years ago
Interesting, but that kind of code can sometimes be made a lot faster with prefetch instructions, _mm_prefetch intrinsic in C++.
pranithalmost 4 years ago
Very well explained.. its a really interesting idea.
thereare5lightsalmost 4 years ago
Would this introduce spectre like vulnerabilities?
mhh__almost 4 years ago
I was under the impression that high end CPUs already perform value prediction while speculating?
评论 #27933994 未加载
asoylualmost 4 years ago
This works when list nodes are allocated subsequently without other irrelevant allocations intervened. If your case doesn&#x27;t fit this constraint, value speculation doesn&#x27;t seem to work. Any comments?
xferalmost 4 years ago
So the solution is to write assembly for an extremely common micro-architecture design. Maybe the compilers should provide more tools to exploit such parallelism. That asm block looks like a weird hack.