I'm the author of this paper. I didn't realize this paper had caused such an intense discussion here until I heard from my friends. I really appreciate your advice, and I would like to address some questions about the paper.

First, this method is not comparison-based; I am well aware that the lower bound for comparison sorting is NlogN. The idea of this algorithm is that when we sort big data, the data may have statistical properties precisely because it is so large, and if we take a sample of it, the sub-dataset should hold the same properties. This is the basic assumption. I know this assumption is strong, but it is reasonable for big data. As described in the paper, we transform the sorting problem into a mapping problem: we learn the cumulative distribution and predict the position of each element in the future sorted sequence (a rough sketch of this idea in code is at the end of this comment).

Second, I completely agree that this algorithm depends heavily on the quality of the distribution-function fit. That is why I chose the General Vector Machine (arXiv:1704.06885), a 3-layer neural network trained without backpropagation. It is based on a Monte Carlo method, and you can read that paper if you are interested. GVM is well suited to function fitting, as shown in Figs. 2, 5, and 6 of arXiv:1704.06885, and since it is only a 3-layer network, it keeps the time complexity of the ML step low. If the distribution has regions with sharp features, we can apply the ML fit piecewise.

Third, in the paper I compare the relation between M and logN. The theoretical lower bound of M is 1, which corresponds to knowing the cumulative distribution in advance. Unfortunately, the number of neurons M is usually larger than 1 (we take M=100 in this paper), because we cannot get a good approximation of the distribution with too few neurons. I think this is worth further study. Also, by presenting this O(NM) sorting algorithm, we do not claim it is faster than NlogN quicksort in all cases. If we can apply GPU/TPU acceleration, it may show great potential for sorting large data. Current parallel sorting algorithms spend a large proportion of their time on networking, but with our method very little networking is needed, so we can greatly reduce the networking cost.

If you would like to discuss our paper further, you are welcome to email me at phzhq@mail.ustc.edu.cn
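
For readers who want the general idea in code, here is a minimal sketch of CDF-based sorting. This is not the GVM method from the paper: a sample-based empirical CDF stands in for the fitted neural network, and the names (cdf_sort, sample_size, n_buckets) are purely illustrative.

    import random
    from bisect import bisect_right

    def cdf_sort(data, sample_size=1000, n_buckets=None):
        # Sketch only: a sorted random sample approximates the CDF that the
        # paper fits with a 3-layer GVM network.
        n = len(data)
        if n_buckets is None:
            n_buckets = n  # roughly one output slot per element

        # 1. Estimate the cumulative distribution from a small sample.
        sample = sorted(random.sample(data, min(sample_size, n)))

        def est_cdf(x):
            # Fraction of the sample <= x (monotone in x).
            return bisect_right(sample, x) / len(sample)

        # 2. Map each element to a predicted position (bucket) in the output.
        buckets = [[] for _ in range(n_buckets)]
        for x in data:
            idx = min(int(est_cdf(x) * n_buckets), n_buckets - 1)
            buckets[idx].append(x)

        # 3. Fix the small local errors: sort each (ideally tiny) bucket.
        out = []
        for b in buckets:
            out.extend(sorted(b))
        return out

    if __name__ == "__main__":
        data = [random.gauss(0.0, 1.0) for _ in range(100000)]
        assert cdf_sort(data) == sorted(data)

Because the estimated CDF is monotone, elements land in buckets that already respect the final order, so only small local sorts are needed at the end; the main cost is one CDF evaluation per element, which is where the O(NM) term comes from when the CDF is an M-neuron network.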