3 comments

So, 'Big Data' has been simplified to 'Problems Solvable by MapReduce'? Sigh.

Not every problem can be reduced to a completely cacheable batch job, trivially parallelizable across all of your data. 'Big Data' isn't about breaking up your batch processing into three layers; it's about being smart enough and knowledgeable enough in compsci, statistics, calculus, text processing, regexes, machine learning, business analysis, et cetera, to design an effective system which harvests useful insights from a large bank of atomic, messy, inconsistent data, with an appropriate level of availability and consistency.

The real work is not in using or configuring Hadoop; it's in figuring out what information would bring greater-than-marginal value to a business, and how to compute that efficiently from an existing corpus of data.

There's no silver bullet. Remember?

EDIT: I think the following is particularly disingenuous: "The lambda architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers"

This is such a ridiculous promise that it put me in a strongly skeptical mood for the rest of the article.

spinron, over 12 years ago

Some of you might have missed the perspective of the article's author (perhaps it isn't that clear); you might want to re-examine it from an implementer's point of view. In other words, if you have a real-life big data problem (one that would benefit from parallel processing via Hadoop) and you actually have to build the thing so that it works and scales, the decomposition presented by the architecture makes the implementation a lot simpler. Architecting such systems isn't trivial, and this is a solid blueprint to start with. And it's really an architecture, not a model: it doesn't tell you how to formulate algorithms; rather, it suggests how to build a complete system around them.

I have read the recent draft of the "Big Data" book by the author, which describes the architecture that the article discusses in more detail. Honestly, if you are a beginning practitioner in this field, you can't really go wrong by reading it.
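To make the decomposition concrete, here is a minimal single-process sketch in Python of the three layers as they are usually described: a batch layer that recomputes a view from an append-only master dataset, a speed layer that keeps an incremental view of recent events, and a serving step that merges the two at query time. The class name `LambdaPageviews`, its methods, and the page-view counting workload are all hypothetical and chosen only for illustration; nothing here comes from the article or the book. It also assumes the batch run is instantaneous, which a real system cannot.

```python
from collections import Counter


class LambdaPageviews:
    """Toy page-view counter split into batch, speed, and serving layers.

    Hypothetical illustration only; a real system would distribute each layer.
    """

    def __init__(self):
        self.master = []                # append-only master dataset (never mutated in place)
        self.batch_view = Counter()     # view precomputed by the batch layer
        self.realtime_view = Counter()  # incremental view kept by the speed layer

    def ingest(self, url):
        """Record one page view: append to the master data and update the speed layer."""
        self.master.append(url)
        self.realtime_view[url] += 1

    def run_batch(self):
        """Batch layer: recompute the view from scratch over all master data.

        Assumption: the run is instantaneous, so the real-time view can simply
        be reset. A real system must keep events that arrive during the run.
        """
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, url):
        """Serving step: merge the batch view with the real-time view."""
        return self.batch_view[url] + self.realtime_view[url]
```

For example, after ingesting `"/a"`, `"/b"`, `"/a"`, calling `run_batch()`, and then ingesting `"/a"` once more, `query("/a")` returns 3: two views come from the batch view and one from the real-time view. The point of the split is that the batch layer can stay a simple recomputation, while the speed layer only has to cover whatever the last batch run has not seen yet.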

noelwelsh, over 12 years ago

I'm pretty bullish on the "speed layer" coming to dominate. I've done a fair bit of work with streaming algorithms [1] and they have advantages beyond just latency, reduced memory usage being the primary one. If you believe data is growing faster than computing power, it seems that streaming algorithms must be the way forward.

Note that you can do a lot with streaming algorithms (it's not just counting). Also, the reduced memory usage (orders of magnitude) makes the complexity of random writes not such a problem, as you have less need to go outside a single machine.

[1] Slides on streaming algorithms: http://noelwelsh.com/streaming-algorithms/2012/11/22/streaming-algorithms-scala-exchange-edition/
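As one illustration of the memory argument, here is a minimal Python sketch of the Misra-Gries heavy-hitters algorithm, a well-known streaming algorithm that finds frequent items in a single pass using at most `k - 1` counters, no matter how long the stream is. It is my choice of example and is not taken from the linked slides; the function name `misra_gries` and its parameters are assumptions made for this sketch.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters from one pass over `stream`.

    Uses at most k - 1 counters. Any item occurring more than n / k times
    in a stream of length n is guaranteed to be among the returned keys.
    The stored counts are lower bounds on the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For instance, `misra_gries(["a"] * 6 + ["b", "c", "d", "e"], k=2)` returns `{"a": 2}`: the single counter survives because `"a"` makes up more than half of the stream, and its stored count is a lower bound on the true frequency of 6. An exact frequency table would need one counter per distinct item, which is where the orders-of-magnitude memory savings come from on large streams.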

Big Data Lambda Architecture