mrjob: Yelp open sources its Elastic MapReduce framework for Python

101 points by pretz over 14 years ago

6 comments

stevejohnson over 14 years ago
This past week I started working on a Python 3 port of this, mostly to learn. No EMR unfortunately, but Hadoop should be possible. I just got back from a trip, so it's still not very far along, just runs the "local" version, but it should get a bit farther next week.

I can confirm that it is a *great* way to learn about MapReduce.

Link: http://github.com/irskep/mrjob/tree/py3k

I will likely totally restart the py3k port now that I know what I am doing a bit better. I've been writing Python 3 for about, oh, two weeks.
ashika over 14 years ago
Amazon EMR is an amazing value proposition for virtually any research need, and it's very cool to see wrapper frameworks targeting it directly. Still, for anyone managing their own compute clusters and wanting to do MR in Python, I'd suggest checking out Disco.

Disco (http://discoproject.org) is a really elegant MR framework implemented in Erlang and Python, with additional support for jobs in C and Java. I've used it for a little over a year and am convinced it is the superior MR platform (Hadoop's terasort victories notwithstanding). New features are being integrated very quickly, the core platform is rock solid, management is simple, and it's extremely flexible.
derwiki over 14 years ago
this was a game changer for us -- instead of everyone contending for the Hadoop cluster, each developer has their own personal arsenal of Hadoop clusters. huge win.
deathflute over 14 years ago
On this note, does anyone know a good tutorial on map reduce for experienced programmers? Basically, I want to learn how to frame advanced problems in terms of MR - I am particularly interested in expressing my discrete event simulation in terms of MR.
FraaJad over 14 years ago
Nice to see one more production use of Cython.
LiveTheDream over 14 years ago
So does most of your data live in S3 in JSON format?