TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Writing a Faster Sorting Algorithm

201 点作者 zefei超过 8 年前

9 条评论

tener超过 8 年前
Neat algorithm, but it is largely comparing apples to oranges. std::sort is universal, ska_sort is not:<p>&gt; Another problem is that I’m not sure what to do for data that I can’t sort. For example this algorithm can not sort a vector of std::sets<p>&gt; Another problem is that right now there can only be one sorting behavior per type. You have to provide me with a sort key, and if you provide me with an integer, I will sort your data in increasing order. If you wanted it in decreasing order, there is currently no easy interface to do that.<p>Useful algorithm, but we already knew there are faster algorithms if we introduce additional requirements!
评论 #13309794 未加载
评论 #13309756 未加载
Animats超过 8 年前
This is a classic. In fact it&#x27;s close to the first algorithm ever patented - SyncSort, patented in 1971. Still being developed and sold, SyncSort remains the leading technology for big sorts.[1] When you need to sort a few terabytes, they have a product for that. It&#x27;s a radix sort with self-adjusting bucket boundaries. So it can deal with keys that are not well distributed.<p>[1] <a href="http:&#x2F;&#x2F;www.syncsort.com&#x2F;en&#x2F;Products&#x2F;Sort" rel="nofollow">http:&#x2F;&#x2F;www.syncsort.com&#x2F;en&#x2F;Products&#x2F;Sort</a>
necessity超过 8 年前
Can&#x27;t test the code as there are numerous compilation errors with my version of gcc and libc++. Besides what others have written, the author seems to have taken only one measurement for each case. And more importantly there is no mention of randomization at all. If he sorted all input sizes using algorithm A and then all input sizes using algorithm B (instead of doing it in random order) it is possible that there is bias in the measurements (due to cache, branch prediction, etc), specially if he used the same input, just varying the size. I&#x27;m also worried about what he used to measure time, as he mentions that for a small number of elements it takes a few microseconds (gettimeofday(3) has a resolution of 1 microsecond, so the measured time is dangerously close to the clock resolution - clock_gettime(3) should be use instead if available, having a resolution of 1ns - or the C++ equivalent for nanosecond precision). Making the algorithm code available is not enough, it is just as important (perhaps even more) to make the benchmark code available. The &quot;tests and benchmarks&quot; code available on Github is a .hpp unfit for reproduction. I want to see the raw data used in the plots and the code to generate it, and I want to be able to run it myself. Otherwise there&#x27;s no reason to trust it.
testerofwaters超过 8 年前
It might just be faster because the author is comparing their hybrid radix sort implementation to std::sort (which is comparison-based).<p>Would be more valid to compare to Boost&#x27;s &quot;spreadsort&quot; which is similarly a hybrid radix sort algorithm (and also outperforms std::sort).
评论 #13307927 未加载
评论 #13308731 未加载
ryao超过 8 年前
Just to share since it is related, you can sort faster than a naive O(nlogn) algorithm whenever data is in the form of partitions where the partitions are sorted, but the contents of each partition are not. i.e. The largest element in each partition is smaller than the smallest in the next partition.<p>There the total number of elements are n and the average size of a partition is m. Then you need only apply any O(log(m)) algorithm to each partition to sort everything. That multiplies the time complexity by n&#x2F;m. Then substitute n = m^c and the sort becomes O(n&#x2F;c*log(n)) time. That gives you a factor of c speed up on what would otherwise be an O(nlogn) operation.<p>This trick is usable when you want to sort a B-Tree like structure where the data in each node is unsorted, but the nodes themselves are sorted. The file system hierarchy on a machine is like that. The default output of zfs list is also like that.
amelius超过 8 年前
Every sort algorithm should take locality of data-access into account. Without this, a comparison to other algorithms may not be fair.
评论 #13309760 未加载
web007超过 8 年前
&gt; This is a trivial function which is less complicated than the comparison operator that you would have to write for std::sort.<p>Unless it uses the same interface (aka call a comparator for each eval) it&#x27;s apples to oranges. I also didn&#x27;t see comparisons vs swaps enumerated, which matter for complex data structures or complex comparators.
armamut超过 8 年前
I&#x27;d be happy to see, how well it would behave compared to the famous Timsort (<a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Timsort" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Timsort</a>).
Pica_soO超过 8 年前
Watching std:sort in sort sound- i can only wonder, why this second fine granular sorting pass, long after you pushed out the first sorting part out of cache?