I remember learning as part of my CS degree that knowing the distribution of your data lets you sort it in O(N log log N) instead of O(N log N) (which assumes a comparison-based sort). Back then it wasn't given such a fashionable name as "learned sort", and determining the distribution wasn't called "training".
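The idea can be sketched as a distribution-aware bucket sort: if you know (or have estimated) the CDF of the keys, you can use it to place each element near its final rank, leaving only small local fixups. This is an illustrative sketch, not the paper's method; the `cdf` argument here stands in for whatever distribution model you've fitted.

```python
import random

def dist_aware_sort(xs, cdf):
    """Sketch of a distribution-aware sort: use an assumed-known CDF to
    place each key near its final position, then fix up each small bucket.
    With a well-matched CDF the buckets stay tiny, so the expected cost
    is close to linear rather than N log N."""
    n = len(xs)
    buckets = [[] for _ in range(n)]
    for x in xs:
        # The CDF maps a key to its approximate rank fraction in [0, 1).
        i = min(int(cdf(x) * n), n - 1)
        buckets[i].append(x)
    out = []
    for b in buckets:
        out.extend(sorted(b))  # tiny buckets -> cheap local fixup
    return out

# Usage: keys drawn uniformly from [0, 1), so the true CDF is cdf(x) = x.
data = [random.random() for _ in range(1000)]
assert dist_aware_sort(data, lambda x: x) == sorted(data)
```

If the assumed CDF is badly wrong, elements pile into a few buckets and the per-bucket `sorted` calls dominate, which is why robustness to model error matters.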
>The results show that our approach yields an average 3.38x performance improvement over C++ STL sort, which is an optimized Quicksort hybrid, 1.49x improvement over sequential Radix Sort, and 5.54x improvement over a C++ implementation of Timsort, which is the default sorting function for Java and Python.

This is a lot
Can this approach be made more robust so it could be used in critical programs? Is there a way to formally verify performance of an approach like this?
Isn't the paper "NN-sort: Neural Network based Data Distribution-aware Sorting" [1] from a year prior to the cited paper? It seems to discuss the same approach.

[1] NN-sort: Neural Network based Data Distribution-aware Sorting: https://arxiv.org/pdf/1907.08817.pdf