Improving MapReduce Performance

18 points by tlipcon over 15 years ago

4 comments

strlen over 15 years ago
Great article, Todd.

One key thing to highlight is the importance of compression, and of using a streaming compression algorithm. Compression means there's less data to transfer (across the network and, even more importantly, from disk), so the transfers complete faster.

Not only does a streaming codec allow your compressed files to be splittable (not possible with a conventional compression algorithm, which requires all compressed data to have its own Huffman tree), it also runs very quickly and easily adapts to a _stream_ (rather than a monolithic chunk) of data.

We've just added support for LZF (a similar Lempel-Ziv-based streaming compression codec) to Voldemort, and the performance results have been great:

http://groups.google.com/group/project-voldemort/browse_thread/thread/cb366257d3714da3

Here's some background: http://en.wikipedia.org/wiki/Arithmetic_coding and http://en.wikipedia.org/wiki/Lempel_Ziv

(I had the good fortune to take an information theory class during undergrad.)
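The point about moving fewer bytes can be illustrated with a small Java sketch. The JDK ships no LZF or LZO codec, so this swaps in `java.util.zip`'s DEFLATE streams as a stand-in. The class name `StreamingCompressionDemo` and the synthetic click-log records are made up for illustration. The sketch feeds records through the encoder one at a time and compares raw and compressed byte counts. It does not demonstrate splittability.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class StreamingCompressionDemo {

    /** Feeds records one at a time through a streaming DEFLATE encoder (stand-in for LZF/LZO). */
    static byte[] compressStream(Iterable<String> records) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // BEST_SPEED favors throughput over compression ratio.
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
            for (String record : records) {
                // The encoder consumes records incrementally instead of needing one monolithic buffer.
                out.write(record.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // close() does not release a caller-supplied Deflater, so end it explicitly.
            deflater.end();
        }
        return sink.toByteArray();
    }

    /** Total uncompressed size of the records in bytes. */
    static long rawSize(Iterable<String> records) {
        long total = 0;
        for (String record : records) {
            total += record.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** Inflates the compressed bytes back to a string, to check the round trip. */
    static String decompress(byte[] compressed) {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            records.add("user=" + (i % 100) + "\tclicked=true\n"); // synthetic, repetitive log lines
        }
        byte[] compressed = compressStream(records);
        System.out.printf("raw=%d bytes, compressed=%d bytes%n", rawSize(records), compressed.length);
        System.out.println("round trip ok: " + decompress(compressed).equals(String.join("", records)));
    }
}
```

On repetitive records like these the compressed array should come out much smaller than the raw input. Real savings depend entirely on the data, and the ratio/speed trade-off differs between DEFLATE and the LZ-family codecs mentioned above.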
jganetsk over 15 years ago
Liked the post!

Interesting point about allocating too many Writables. This problem indicates that the *Writable classes are poorly designed. Instead of public constructors, each should expose a static factory method that implements intelligent pooling and reuse.

Also, NullWritable is awesome! I don't think you mentioned it. Very useful for counters!
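The factory-plus-pooling idea can be sketched in plain Java. `PooledLong`, `obtain`, and `release` are hypothetical names invented for this illustration; this is not Hadoop's API (Hadoop's Writables do expose public constructors), and the pool is deliberately single-threaded.

```java
import java.util.ArrayDeque;

public class WritablePoolDemo {

    /** Hypothetical mutable value holder in the spirit of a Writable; NOT Hadoop's API. */
    static final class PooledLong {
        private static final int MAX_POOL_SIZE = 64;
        // Free list of released instances. Not thread-safe: assumes one pool per task thread.
        private static final ArrayDeque<PooledLong> POOL = new ArrayDeque<>();
        private static long allocations = 0;

        private long value;

        // Private constructor forces callers through the factory method.
        private PooledLong() {
            allocations++;
        }

        /** Static factory: reuses a pooled instance when one is available. */
        static PooledLong obtain(long value) {
            PooledLong p = POOL.pollFirst();
            if (p == null) {
                p = new PooledLong();
            }
            p.value = value;
            return p;
        }

        /** Returns an instance to the pool; the caller must not touch it afterwards. */
        static void release(PooledLong p) {
            if (POOL.size() < MAX_POOL_SIZE) {
                POOL.addFirst(p);
            }
        }

        long get() {
            return value;
        }

        static long allocations() {
            return allocations;
        }
    }

    public static void main(String[] args) {
        long sum = 0;
        // Simulates a map loop that creates one short-lived value per record.
        for (int i = 0; i < 1_000_000; i++) {
            PooledLong v = PooledLong.obtain(i);
            sum += v.get();
            PooledLong.release(v);
        }
        System.out.println("sum=" + sum + ", objects allocated=" + PooledLong.allocations());
    }
}
```

The trade-off is ownership: an instance released while something else (say, a framework buffer) still holds a reference gets silently overwritten by the next `obtain`, so pooling only pays off when the lifetime of each value is clear.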
houseabsolute over 15 years ago
Has any non-Googler here actually used Hadoop or any of the other public MapReduce solutions?
brg over 15 years ago
On the topic of MapReduce, does anyone have pointers to articles detailing different implementations of the shuffle phase?
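Whatever the implementation, a shuffle has to do two things: route each map-output key to a reducer, and hand each reducer its keys in sorted order with their values grouped. The in-memory Java sketch below shows only that contract. The partition formula mirrors Hadoop's default HashPartitioner, although Hadoop hashes its own key types (such as Text) rather than `java.lang.String`, so the actual assignments differ. `ShuffleSketch` is an illustrative name. Hadoop itself sorts and spills map output to disk and merges the fetched segments on the reduce side; the `TreeMap` here only stands in for that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ShuffleSketch {

    /** Same formula as Hadoop's default HashPartitioner: non-negative hash mod reducer count. */
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * In-memory stand-in for the shuffle: route each map-output pair to a reducer,
     * keeping each reducer's keys sorted with their values grouped.
     */
    static List<SortedMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<SortedMap<String, List<Integer>>> reducers = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            // TreeMap stands in for the on-disk sort/merge a real framework performs.
            reducers.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> kv : mapOutput) {
            reducers.get(partition(kv.getKey(), numReducers))
                    .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String word : "the quick fox and the lazy dog and the cat".split(" ")) {
            mapOutput.add(Map.entry(word, 1)); // word-count style map output
        }
        List<SortedMap<String, List<Integer>>> reducers = shuffle(mapOutput, 3);
        for (int r = 0; r < reducers.size(); r++) {
            System.out.println("reducer " + r + ": " + reducers.get(r));
        }
    }
}
```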