> <i>the final surprise is that Bubble Sort takes the crown for small arrays</i><p>This didn't need to be a surprise! It was a commonplace bit of wisdom in the microcomputer era, when sorting always came with questions about the application. You can't beat it for locality— but you also can't beat it for mostly-sorted, small collections.<p>Rather than sorting arrays, the area where bubblesort shines (ok. where bubblesort can still be considered!) is <i>keeping linked lists sorted</i>, especially where the sort order is important rather than critical. So event queues: you want to float the highest priority to the top, but it's ok if occasionally #2 is launched before #1, because events aren't guaranteed an execution order. We're using an intrusive linked list because the scheduler has to take what it's given, it can't lay out memory.<p>Also, any time spent tinkering with an event queue is wasted time. So every round, walk the event queue and perform one (1) bubble sort. This takes the time it takes to traverse the list, effectively. So if you've placed exactly one event at the end of the queue (head of the list), it ends up precisely where it should be. Two? You leave one of them behind for the next pass.<p>It's easy to reason out that when the sort starts performing badly, the problem you have is event congestion, not a poor big-O complexity for your queue sort.<p>Bubblesort is sometimes treated as a pessimal joke like bogosort, it isn't, it's just that the reason it's treated as a basic sort pedagogically has been lost, as the profession's center of gravity moves away from these kinds of system-level constructs.
...<i>It’s easy to see that the above loop can be implemented without conditional branches. The conditional swapping can be implemented by conditional moves and the conditional pointer increase could be implemented as a unconditional</i><p><pre><code> p += (*it < pivot)
</code></pre>
Does that really guarantee that it is branchless? Surely a compiler is free to turn that into a comparison and jump?<p>Or, to put it another way, the compiler was also free to turn the original 'if/then' structure into branchless code using conditional instructions. My point is, surely neither version <i>guarantees</i> that the generated code is branchless, it's all still up to the compiler.
Why not use combsort, which is generally
does similar swaps and outperforms bubble sort
in all cases?
<a href="https://en.wikipedia.org/wiki/Comb_sort" rel="nofollow">https://en.wikipedia.org/wiki/Comb_sort</a>
<a href="https://www.geeksforgeeks.org/comb-sort/" rel="nofollow">https://www.geeksforgeeks.org/comb-sort/</a>
This is well known to game developers, and common advice in 3d engines for sorting for z-culling.<p>You can look this advice up in a raft of 1990s books when we were implementing our own 3d, eg Michael Abrash's Graphics Programming Black Book
So, there are a few reasons it isn't fair to compare to std::sort.<p>The C++ standard requires that std::sort work for "move-only types", so you can't copy the pivot. I suspect this is where quite a bit of the slowdown happens -- as the pivot has to be taken by reference/pointer, the compiler optimises worse.<p>Could std::sort implementations optimise for when the pivot can be copied? Certainly! Particularly when it is a "small value" like this. I worked on this years ago for libstdc++ (g++'s std::sort), but it turned out to make the code really horrible, so it was never merged in.<p>std::sort implementations already use an alternative sort (usually insertion sort) for small sized lists.
A while ago I was tinkering with Quicksort and avoiding branch mispredictions.<p><a href="https://easylang.online/blog/qsort_c.html" rel="nofollow">https://easylang.online/blog/qsort_c.html</a><p>My implementation is pretty fast. At a size of 40 or 50, I switch to Insertion sort, and there the branchless bubblesort is significantly slower (I just tried it).<p>But I have to admit defeat to this <i>Sample sort</i>:<p><a href="https://github.com/SaschaWitt/ips4o" rel="nofollow">https://github.com/SaschaWitt/ips4o</a>
Bubblesort is usually the worst sorting algorithm, in every conceivable way. Given that in a sorting network, bubble sort is dual to insertion sort [1], why can't the optimisations in the blog post be applied to insertion sort? I believe they can be, and the author was just being facetious.<p>Stop teaching bubble sort.<p>[1] - <a href="https://en.wikipedia.org/wiki/Sorting_network#Insertion_and_Bubble_networks" rel="nofollow">https://en.wikipedia.org/wiki/Sorting_network#Insertion_and_...</a>
Not a surprise, really.<p>I have battled so many times with smug novice "engineers", fresh from their CS courses, pointing, without even thinking twice, O(n^2) or even O(n^3) algorithms as errors on code reviews. This without taking into account the size of the input or the underlying machine executing the code.<p>It is my constant source of amazement -- to interview yet another candidate who can write any number of sorting algorithms from memory on a whiteboard but who can't tell whether a linked list or an array list insertion is going to be faster for a given application.
Perhaps <a href="https://tug.org/svn/texlive/trunk/Build/source/texk/dvipsk/afm2tfm.c?revision=61654&view=markup#l974" rel="nofollow">https://tug.org/svn/texlive/trunk/Build/source/texk/dvipsk/a...</a> has value at the Bank of Sans Serriffe.
Anyone interested in the ins and outs of Quicksort should have a look at this PhD thesis: <a href="https://www.wild-inter.net/publications/wild-2016.pdf" rel="nofollow">https://www.wild-inter.net/publications/wild-2016.pdf</a>