
Sorting a Billion Numbers with Julia

123 points by josep2 about 9 years ago

5 comments

ljw1001 about 9 years ago
I'm working on a dataframe for Java for large datasets: https://github.com/lwhite1/tablesaw

It's not ready for prime time, but:

Time to sum 1,000,000,000 floats: 1.5 seconds

Time to sort 1,000,000,000 floats: 30.5 seconds

The code to fill a column and sort it:

    FloatColumn fc = new FloatColumn("test", 1_000_000_000);
    for (int i = 0; i < 1_000_000_000; i++) {
        fc.add((float) Math.random());
    }
    fc.sortAscending();
rxin about 9 years ago
As part of Spark 2.0, we are introducing some neat new optimizations to make a general engine as efficient as specialized code.

I just tried this on the Spark master branch (i.e. the work-in-progress code for Spark 2.0). It takes about 1.5 secs to sum 1 billion 64-bit integers using a single thread, and about 1 sec using 2 threads. This was done on my laptop (Early 2015 MacBook Pro 13, 3.1GHz Intel Core i7).

We haven't optimized integer sorting yet, so that's probably not going to be super fast, but the aggregation performance has been pretty good.

    scala> val start = System.nanoTime
    start: Long = 56832659265590

    scala> sqlContext.range(0, 1000L * 1000 * 1000, 1, 2).count()
    res8: Long = 1000000000

    scala> val end = System.nanoTime
    end: Long = 56833605100948

    scala> (end - start) / 1000 / 1000
    res9: Long = 945

Part of that time is actually spent analyzing the query plan, optimizing it, and generating bytecode for it. If we run this on 10 billion integers, the time is about 5 secs.
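[Editor's note: rxin's snippet is from the Scala shell; the same measurement can be sketched against Spark's Java API. This is a rough equivalent, not rxin's code — it assumes the Spark 2.x SparkSession entry point, and the class and app names are illustrative:]

    import org.apache.spark.sql.SparkSession;

    public class CountBillion {
        public static void main(String[] args) {
            // Local session with 2 worker threads, mirroring the 2-partition range above.
            SparkSession spark = SparkSession.builder()
                    .appName("count-billion")
                    .master("local[2]")
                    .getOrCreate();

            long start = System.nanoTime();
            // 1 billion rows split across 2 partitions, then counted.
            long count = spark.range(0, 1000L * 1000 * 1000, 1, 2).count();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            System.out.println(count + " rows counted in " + elapsedMs + " ms");
            spark.stop();
        }
    }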
en4bz about 9 years ago
I was initially skeptical of the 2.5s for the SUM base case, but after a bit of experimenting I concluded the following. Note this is for the SUM base case only; the whole file is loaded into memory for my tests.

MMAP'd from HDD, 1st run: 47s

MMAP'd from HDD, 2nd run: 1.35s (OS caches pages in memory)

MMAP'd from NVMe, 1st run: 6.5s (OS caches dropped)

MMAP'd from NVMe, 2nd run: 1.35s (OS cached again)
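[Editor's note: the page-cache effect en4bz describes is easy to reproduce with memory-mapped I/O. A minimal Java sketch, assuming a raw binary file of little-endian doubles whose size is a multiple of 8 bytes; the file path comes from the command line, and the 1 GB chunking works around the ~2 GB cap on a single MappedByteBuffer:]

    import java.io.IOException;
    import java.nio.ByteOrder;
    import java.nio.DoubleBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class MmapSum {
        public static void main(String[] args) throws IOException {
            Path file = Path.of(args[0]); // assumed layout: little-endian doubles
            double sum = 0;
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                long size = ch.size();
                long pos = 0;
                while (pos < size) {
                    // A single mapping is capped at ~2 GB, so walk the file in 1 GB chunks.
                    long chunk = Math.min(size - pos, 1L << 30);
                    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, chunk);
                    DoubleBuffer doubles = buf.order(ByteOrder.LITTLE_ENDIAN).asDoubleBuffer();
                    while (doubles.hasRemaining()) {
                        sum += doubles.get();
                    }
                    pos += chunk;
                }
            }
            System.out.println("sum = " + sum);
        }
    }

[The first run pays page faults against the disk; a second run over the same file hits the OS page cache, which is exactly the 47s-vs-1.35s gap in the numbers above.]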
brashrat about 9 years ago
In a purely functional language like Haskell, you can sort a billion numbers in nanoseconds, all with no performance-crushing side effects. Yes, it's true: that's the kind of result you get with lazy evaluation!

When you start searching the resultant sorted list, it might be somewhat slower than if you had sorted using other techniques, but (silver lining) search times will only improve after that!

I'd give you the actual stats, but so far I've only lazily evaluated them.
mateuszb about 9 years ago
1 billion numbers (let's say DWORDs) doesn't even use a full 4GB of space. Load it into memory and sort. With QWORDs it grows to about 8GB. A modern laptop can still load twice that amount and sort everything in memory. That is not a large dataset.
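[Editor's note: a concrete version of that back-of-the-envelope argument, as a minimal Java sketch. The array size and suggested heap flag are illustrative; 1e9 longs (QWORDs) is 8 GB, so the JVM needs a heap comfortably above that:]

    import java.util.Arrays;
    import java.util.Random;

    public class InMemorySort {
        public static void main(String[] args) {
            final int n = 1_000_000_000; // 1e9 longs = 8 GB; run with e.g. -Xmx12g
            long[] values = new long[n];
            Random rng = new Random();
            for (int i = 0; i < n; i++) {
                values[i] = rng.nextLong();
            }

            long start = System.nanoTime();
            Arrays.parallelSort(values); // multi-threaded sort on the common fork-join pool
            long elapsedSec = (System.nanoTime() - start) / 1_000_000_000L;

            System.out.println("sorted " + n + " longs in ~" + elapsedSec + " s");
        }
    }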