To get more gonzo, in CPython's list.sort(), the C code that actually merges two lists (which happen, in context, to be two contiguous array slices) is in listobject.c's `merge_at(()` function. That brings in the world of "galloping" (exponential search) optimizations, much faster than one-pair-at-a-time compares in many real-world cases.<p>So that's a whole lot of additional complication, but it's the heart of what _really_ makes timsort shine in its "jaw dropping" cases.<p>Tim (of "timsort" - which was an inside joke name at the start, because I never expected anything outside of CPython would use it ;-) ).
As someone that rarely works with Python lists, as opposed to numpy arrays, I was pleasantly surprised to see numpy does what I would expect in providing a mergesort option. I'm surprised Python doesn't, other than via heapq and only implemented in Python (according to my reading of the post and a very quick Google search).<p>Oops, just for fun the numpy documentation currently states: "The datatype determines which of ‘mergesort’ or ‘timsort’ is actually used, even if ‘mergesort’ is specified. User selection at a finer scale is not currently available." Awesome ...<p>Also, apparently mergesort may also be done using radix sort for integers "‘mergesort’ and ‘stable’ are mapped to radix sort for integer data types."
I definitely used this to my advantage in ACM ICPC contests. Most of the students and professors wrote the problems and benchmarked against Java. However, on several problems, it boiled down to "write this specific algorithm for this class of problem, or use C." I once netted my team an extra problem after we timed out in Java and I just rewrote the teammate's solution in C++.
Past TimSort threads, for anyone interested:<p><i>Timsort, the Python sorting algorithm</i> - <a href="https://news.ycombinator.com/item?id=21196555" rel="nofollow">https://news.ycombinator.com/item?id=21196555</a> - Oct 2019 (131 comments)<p><i>On the Worst-Case Complexity of TimSort</i> - <a href="https://news.ycombinator.com/item?id=17883461" rel="nofollow">https://news.ycombinator.com/item?id=17883461</a> - Aug 2018 (74 comments)<p><i>Timsort is a sorting algorithm that is efficient for real-world data</i> - <a href="https://news.ycombinator.com/item?id=17436591" rel="nofollow">https://news.ycombinator.com/item?id=17436591</a> - July 2018 (77 comments)<p><i>Functional verification with mechanical proofs of TimSort [pdf]</i> - <a href="https://news.ycombinator.com/item?id=9778243" rel="nofollow">https://news.ycombinator.com/item?id=9778243</a> - June 2015 (1 comment)<p><i>Timsort</i> - <a href="https://news.ycombinator.com/item?id=3214527" rel="nofollow">https://news.ycombinator.com/item?id=3214527</a> - Nov 2011 (27 comments)<p><i>Java has switched from Mergesort to TimSort</i> - <a href="https://news.ycombinator.com/item?id=752677" rel="nofollow">https://news.ycombinator.com/item?id=752677</a> - Aug 2009 (13 comments)
It seems like the built in heapq.merge() should be as fast as the implementation here. I mean if it's really just merging sorted lists, why wouldn't it be as fast as this implementation?<p>edit: Okay one reason is that heapq.merge is written in python:<p><a href="https://github.com/python/cpython/blob/6252670732c68420c2a8b3f0259559e45eed7610/Lib/heapq.py#L314-L392" rel="nofollow">https://github.com/python/cpython/blob/6252670732c68420c2a8b...</a><p>It might also be since the python version has more options (like arbitrarily many iterables), but I presume it's the fact that the extension is written in C.
TimSort is cool. In it's worst case it looks no better than merge sort, but give it some partially order data, or small amounts of data, where it uses insertion sort, and it really shines.
Interesting read. This post seems to be written by Adam Gordon Bell who is also the host of the excellect CoRecursive podcast. Recently featured at HN in The Untold Story of SQLite [0].<p>[0]: <a href="https://news.ycombinator.com/item?id=27718701" rel="nofollow">https://news.ycombinator.com/item?id=27718701</a>
The first Python implementation is bad - removing the first element in each iteration is O(n), the C implementation gets this right by maintaining one index into each list instead of modifying the lists.
I was thinking about how to make sorting faster the other day and was toying around with the idea of a permutation sort. If you think about a list of items generally (at least) one of the permutations of the list is guaranteed to be the correct sort order. Theoretically you should be able to do a binary search to find a correct permutation since there should always be a comparison you can do to reduce the search space by half. You should be able to go faster if there is duplicated data since there are multiple permutations of the list that would work.
Great post. I almost did a spit take when I read:<p>> Timsort overcomes this disadvantage by being written in C rather than Python.<p>Yeah, Python is SLOW. Thankfully the author dug into C. That was nice to see.<p>Edit: lol I have no idea why this is being downvoted. Y’all weird.