Improving MapReduce Performance

18 points by tlipcon, more than 15 years ago

4 comments

strlen, more than 15 years ago
Great article, Todd.

One key thing to highlight is the importance of compression, and of using a streaming compression algorithm. Compression means there's less data to transfer (across the network and, even more importantly, from disk), which means the transfers complete faster.

A streaming codec not only allows your compressed files to be splittable (not possible with a conventional compression algorithm, which requires all compressed data to have its own Huffman tree), it also runs very quickly and easily adapts to a _stream_ (rather than a monolithic chunk) of data.

We've just added support for LZF (a similar arithmetic/streaming compression codec) to Voldemort, and performance results have been great:

http://groups.google.com/group/project-voldemort/browse_thread/thread/cb366257d3714da3

Here's some background:
http://en.wikipedia.org/wiki/Arithmetic_coding
http://en.wikipedia.org/wiki/Lempel_Ziv

(I had the good fortune to take an information theory class during undergrad.)
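To make the splittability point concrete, here is a minimal sketch of block-wise compression: the input is cut into fixed-size blocks, and each block is compressed independently and prefixed with its lengths, so a reader can decode any block without the ones before it. This is only an illustration under assumptions. It uses the JDK's DEFLATE (`java.util.zip`) as a stand-in because LZF is not in the standard library, and the `BlockCodec` class and its framing layout are hypothetical, not the format used by Hadoop or Voldemort.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: frames data as [rawLen:int][compLen:int][compressed bytes] per block.
final class BlockCodec {

    // Compress each block on its own so blocks can later be decoded independently.
    static byte[] compress(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int off = 0; off < data.length; off += blockSize) {
            int rawLen = Math.min(blockSize, data.length - off);
            Deflater deflater = new Deflater();
            deflater.setInput(data, off, rawLen);
            deflater.finish();
            ByteArrayOutputStream block = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                block.write(buf, 0, n);
            }
            deflater.end();
            byte[] comp = block.toByteArray();
            byte[] header = ByteBuffer.allocate(8).putInt(rawLen).putInt(comp.length).array();
            out.write(header, 0, header.length);
            out.write(comp, 0, comp.length);
        }
        return out.toByteArray();
    }

    // Walk the frames in order; each block is inflated with a fresh Inflater.
    static byte[] decompress(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.hasRemaining()) {
            byte[] raw = new byte[in.getInt()];
            byte[] comp = new byte[in.getInt()];
            in.get(comp);
            Inflater inflater = new Inflater();
            inflater.setInput(comp);
            try {
                int got = 0;
                while (got < raw.length) {
                    int n = inflater.inflate(raw, got, raw.length - got);
                    if (n == 0) {
                        break; // no more output available for this block
                    }
                    got += n;
                }
                if (got != raw.length) {
                    throw new IllegalArgumentException("truncated block");
                }
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("corrupt block", e);
            } finally {
                inflater.end();
            }
            out.write(raw, 0, raw.length);
        }
        return out.toByteArray();
    }
}
```

Because every block carries its own header and compressor state, a job could in principle hand different blocks to different workers; the trade-off is a slightly worse compression ratio than compressing the whole input at once.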
jganetsk, more than 15 years ago
Liked the post!

Interesting point about allocating too many Writables. This problem is an indication that *Writable classes are poorly designed. Instead of having public constructors, they should each have some sort of static method, akin to that of a factory class, that implements some sort of intelligent pooling and reuse.

Also, NullWritable is awesome! I don't think you mentioned it. Very useful for counters!
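A rough sketch of what such a static factory with pooling might look like is below. It is only an illustration: `PooledInt` is a hypothetical stand-in for a class like IntWritable, not part of Hadoop, and the per-thread free list and the explicit `release()` call are assumptions of this sketch.

```java
import java.util.ArrayDeque;

// Hypothetical stand-in for a Hadoop Writable such as IntWritable; not the real class.
final class PooledInt {
    // Per-thread free list so no locking is needed (an assumption of this sketch).
    private static final ThreadLocal<ArrayDeque<PooledInt>> POOL =
            ThreadLocal.withInitial(ArrayDeque::new);

    private int value;

    // Private constructor: callers must go through obtain().
    private PooledInt() {}

    // Static factory: reuse a released instance if one is available, else allocate.
    static PooledInt obtain(int value) {
        PooledInt p = POOL.get().pollFirst();
        if (p == null) {
            p = new PooledInt();
        }
        p.value = value;
        return p;
    }

    // Return the instance to the pool; the caller must not use it afterwards.
    void release() {
        POOL.get().addFirst(this);
    }

    int get() {
        return value;
    }
}
```

The cost of this design is that callers have to remember to call `release()` and must not hold on to an instance afterwards, which is the usual trade-off of object pooling against plain allocation.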
houseabsolute, more than 15 years ago
Has any non-Googler here actually used Hadoop or any of the other public MapReduce solutions?
brg, more than 15 years ago
On the topic of MapReduce, does anyone have pointers to articles detailing different implementations of the shuffle phase?