I've been using a variation of quickselect to find median rows of a matrix at certain columns.<p>There's a problem with quickselect: while it will find the median, it doesn't properly partition the array around the median if the median value occurs multiple times. So you could end up with copies of the median sprinkled around both sides, which may or may not be a problem (it was for me).<p>One way to solve this is to take a second pass over the data to partition -- which is essentially the core of quickselect, where you nibble from both ends and swap low/high values.<p>Another nice property of quickselect is that the array ends up approximately sorted, with values generally growing closer to the median towards the middle. This helps if you need to do repeated sub-selections on the partitions.
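A minimal sketch of the two passes described above, in Python rather than the C the article benchmarks (function names and details are mine, not the article's): Hoare-style quickselect to place the k-th element, then a Dutch-national-flag sweep to gather every copy of the median into one contiguous run.

```python
def quickselect(a, k):
    """In-place Hoare-style quickselect: afterwards a[k] is the k-th
    smallest element, with smaller values (loosely) to its left and
    larger ones to its right."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break
    return a[k]

def partition_equal(a, pivot):
    """The 'second pass': a Dutch-national-flag sweep that reorders a
    into  < pivot | == pivot | > pivot  and returns the [start, end)
    bounds of the equal run, so duplicate medians end up contiguous."""
    lo, i, hi = 0, 0, len(a)
    while i < hi:
        if a[i] < pivot:
            a[lo], a[i] = a[i], a[lo]
            lo += 1
            i += 1
        elif a[i] > pivot:
            hi -= 1
            a[i], a[hi] = a[hi], a[i]
        else:
            i += 1
    return lo, hi
```

So after `m = quickselect(a, len(a) // 2)`, calling `partition_equal(a, m)` gathers every copy of m around the middle.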
Maybe this is included and I missed it, but it's asymptotically faster to maintain a sorted tree of the window's values plus a list of pointers to the nodes in the order they were added. If you store "number of values to the right" in each tree node, then finding the next median (after moving the window one position) is O(log w), and the total cost for the whole array is O(n log w), IIRC.<p>This is used to median-filter images in the IRAF package. I don't know if the approach is published anywhere (it's pretty obvious once the idea of keeping the points within the window in a sorted tree "clicks"), but Frank Valdes did test it against other approaches.<p>The main drawbacks are that the overhead/constant factor is pretty high, so you need fairly large datasets (more exactly, large windows) for it to be a win, and implementation in old Fortran is a pain...
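A sketch of the windowed idea (my own Python, not the IRAF code): keep the window's contents in sorted order and slide it by one delete plus one insert. A flat sorted list, as here, makes those two steps O(w) each; the balanced tree with subtree counts described above is what gets them down to O(log w).

```python
import bisect

def sliding_median(a, w):
    """Median of each length-w window of a, sliding one position at
    a time. Uses a sorted list + bisect for clarity; a balanced tree
    would make the delete/insert steps O(log w) instead of O(w)."""
    window = sorted(a[:w])
    medians = [window[w // 2]]
    for i in range(w, len(a)):
        # delete the value leaving the window
        window.pop(bisect.bisect_left(window, a[i - w]))
        # insert the value entering the window
        bisect.insort(window, a[i])
        medians.append(window[w // 2])
    return medians
```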
I would have liked to see a benchmark of the deterministic O(n) BFPRT [1] (median-of-medians) algorithm as well. It holds up well in worst-case scenarios if you have a very large dataset.<p>[1] <a href="http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm" rel="nofollow">http://en.wikipedia.org/wiki/Selection_algorithm#Linear_gene...</a>
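For reference, a non-optimized sketch of BFPRT (assumptions mine; a real benchmark would want the in-place version): split into groups of five, recurse on the group medians to get a pivot guaranteed to land reasonably near the middle, then recurse into one side.

```python
def median_of_medians(a, k):
    """Return the k-th smallest element (0-based) of a in guaranteed
    O(n) time. Allocating sketch of BFPRT, not a tuned implementation."""
    if len(a) <= 5:
        return sorted(a)[k]
    # median of each group of (up to) five elements
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    meds = [g[len(g) // 2] for g in groups]
    # recursively pick the median of those medians as the pivot;
    # this guarantees roughly a 30/70 split in the worst case
    pivot = median_of_medians(meds, len(meds) // 2)
    lo = [x for x in a if x < pivot]
    hi = [x for x in a if x > pivot]
    eq = len(a) - len(lo) - len(hi)
    if k < len(lo):
        return median_of_medians(lo, k)
    if k < len(lo) + eq:
        return pivot
    return median_of_medians(hi, k - len(lo) - eq)
```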
That's funny. I just looked back at my old email, and I used this very quickselect.c implementation back in April 2008. It worked well and was wicked fast.<p>The algorithm by Blum, Floyd, Pratt, Rivest, and Tarjan takes 24n comparisons in the worst case. A description here:<p><a href="http://www.ics.uci.edu/~eppstein/161/960130.html" rel="nofollow">http://www.ics.uci.edu/~eppstein/161/960130.html</a><p>However, this algorithm has an average case performance of 4n comparisons:<p><a href="http://www.ics.uci.edu/~eppstein/161/960125.html" rel="nofollow">http://www.ics.uci.edu/~eppstein/161/960125.html</a>
Torben's method is very interesting. At each step it takes O(n) to halve the range of candidate values. That doesn't say much about how many numbers in the array it eliminates, but on average it's about half, leading to O(n log n) expected time. In the worst case it is quadratic, one example being [1, 2, 4, 8, 16, 32, 64, ...], where each halving of the value range eliminates only a constant number of elements.
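A Python sketch of Torben's method as described above (assuming integer data for the midpoint arithmetic): it never moves or copies elements, only scans and counts, which is why it suits read-only or memory-mapped data.

```python
def torben(m):
    """Lower median of m by bisecting the *value* range [lo, hi]:
    each pass counts how many elements fall below/above the midpoint
    guess and narrows the range toward the half holding the median."""
    lo, hi = min(m), max(m)
    half = (len(m) + 1) // 2
    while True:
        guess = (lo + hi) // 2
        less = greater = equal = 0
        maxltguess, mingtguess = lo, hi
        for x in m:
            if x < guess:
                less += 1
                if x > maxltguess:
                    maxltguess = x   # largest value below the guess
            elif x > guess:
                greater += 1
                if x < mingtguess:
                    mingtguess = x   # smallest value above the guess
            else:
                equal += 1
        if less <= half and greater <= half:
            break                    # the median is pinned at a boundary
        elif less > greater:
            hi = maxltguess          # median is below the guess
        else:
            lo = mingtguess          # median is above the guess
    # decide which boundary value is the median
    if less >= half:
        return maxltguess
    elif less + equal >= half:
        return guess
    else:
        return mingtguess
```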