Tech Echo

6 comments

lorenzhs, over 8 years ago

The companion paper to Thrill, with more details on its architecture, some benchmarks, and comparisons to Spark and Flink: https://arxiv.org/abs/1608.05634

codepie, over 8 years ago

There's also Blogel [0], a distributed graph processing framework written in C++ that runs significantly faster than its Java counterpart, Apache Giraph [1]. I have started wondering whether big data developers really care about speed; the advantages of this Java software start to fade when compared with its C++ counterparts.

[0] http://www.cse.cuhk.edu.hk/blogel/
[1] http://www.cse.cuhk.edu.hk/blogel/papers/blogel.pdf

adrianN, over 8 years ago

There is also STXXL [1] for times when your data is big but not "big". It provides containers and algorithms optimized for external storage.

[1] http://stxxl.sourceforge.net/

pzh, over 8 years ago

Does anybody know how this is different from Spark? These Distributed Immutable Arrays sound suspiciously similar to Spark's Resilient Distributed Datasets. Is it just the choice of C++ over Scala that would make this more efficient?

Also, I wonder if and how they implemented the concept of lineage (unless these DIAs are not really very resilient)... I thought Spark relied on Scala's delayed evaluation to do that, though I may be mistaken.

Mikeb85, over 8 years ago

Very cool. Will have to remember this, maybe write an R package that makes use of it.

tmsldd, over 8 years ago

The Force is strong with this one

Thrill – Big Data Processing with C++