one thing that'd be interesting to see benchmarked is the performance of a linked list implementation vs a heap backed by an array allocated with sufficient capacity for expected usage, so it doesn't need to be resized often (or ever).<p>one downside of linked lists is that each node in the list can reside in an arbitrary part of memory. in the worst case each traversal of a link is a cache miss, so you sit around for 100 or 200 cycles waiting for a main memory read. similarly if all the values in the list that are used to implement comparisons are also stored by reference and scattered throughout memory.<p>with heaps, another interesting thing to try is switching from a binary heap to e.g. a 4-ary heap (i.e. 4 children per parent). i've been fiddling with heaps recently for a priority queue & see a decent speedup switching to a 4-ary heap -- provided the "heapify down" logic that runs whenever you pop an element is structured to let the cpu do as many pairwise compares as possible in parallel, rather than structuring it as a big chain of dependent computations. during "heapify" operations you're jumping around inside a heap a fair bit, which may also cause cache misses, but each bunch of children will be arranged linearly in memory, so there is a bit more signal to noise of each cache line read.