How realistic is it to have a linked list in contiguous memory that stays nicely ordered? In my experience, lists are normally cobbled together with nodes that are allocated widely separated in time and hence I have no knowledge that I can use to predict the address of the next pointer.<p>If you're in a scenario where this optimization works, it seems like it would be better to just use an array.
This works because we've arranged for the nodes to be sequential in memory, if we have the luxury of doing that then we could use an array instead of a linked list. Is there a scenario where this value speculation trick would work, where we /don't/ have this luxury?
A quick googling brought me to the following paper:<p>"Data value speculation in superscalar processors"<p><a href="https://www.sciencedirect.com/science/article/abs/pii/S0141933198000866" rel="nofollow">https://www.sciencedirect.com/science/article/abs/pii/S01419...</a><p>It would appear there is an entire section dedicated to "Value prediction in a realistic scenarios", but I am not curious enough yet to pay money to access this document.<p>Edit: Found another that is accessible.<p>"Practical Data Value Speculation for Future High-end Processors"<p><a href="http://class.ece.iastate.edu/tyagi/cpre581/papers/HPCA14DataSpeculation.pdf" rel="nofollow">http://class.ece.iastate.edu/tyagi/cpre581/papers/HPCA14Data...</a>
How do they get the cycles/elem and instrs/cycle counts? Is it via the perf utility? Or something they calculate? Or they get it some other way?
I like it as an exercise for understanding the CPU but is this stuff really useful in the real world - the example given is "optimised" away by GCC/clang without hacks. My guess is that if this is a good way to iterate over linked lists compilers would have optimised for this by now? Either through cleverness or hints added to your code.
> Both gcc and clang easily deduce that the guessing is semantically pointless<p>Can someone explain this please, because I don't see how - as `node` and `next` could possibly be different values, wouldn't that `if` be needed and so is in no way a NOP?
This works when list nodes are allocated subsequently without other irrelevant allocations intervened. If your case doesn't fit this constraint, value speculation doesn't seem to work. Any comments?
So the solution is to write assembly for an extremely common micro-architecture design. Maybe the compilers should provide more tools to exploit such parallelism. That asm block looks like a weird hack.