科技回声
Big Data Lambda Architecture

35 points by mjbellantoni over 12 years ago

3 comments

zacharyvoase over 12 years ago
So, 'Big Data' has been simplified to 'Problems Solvable by MapReduce'? *sigh*.

Not every problem can be reduced to a completely cacheable batch job, trivially parallelizable across all of your data. 'Big Data' isn't about breaking up your batch processing into three layers; it's about being smart enough and knowledgeable enough in compsci, statistics, calculus, text processing, regexes, machine learning, business analysis, *et cetera*, to design an effective system which harvests *useful* insights from a large bank of atomic, messy, inconsistent data, with an appropriate level of availability and consistency.

The real work is not in using or configuring Hadoop; it's in figuring out what information would bring greater-than-marginal value to a business, and how to compute that efficiently from an existing corpus of data.

There's no silver bullet. Remember?

EDIT: I think the following is particularly disingenuous: "The lambda architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers"

This is such a ridiculous promise that it put me in a strongly skeptical mood for the rest of the article.
spinron over 12 years ago
Some of you might have missed the perspective of the article's author (perhaps it isn't that clear); you might want to re-examine it from an implementer's point of view. In other words, if you have a real-life big data problem (one that would benefit from parallel processing via Hadoop) and you actually have to build the thing so that it works and scales, the decomposition presented by the architecture makes the implementation a lot simpler. Architecting such systems isn't trivial, and this is a solid blueprint to start with. And it's really an architecture, not a model: it doesn't tell you how to formulate algorithms; rather, it suggests how to build a complete system around them.

I have read the recent draft of the "Big Data" book by the author, which describes the architecture that the article discusses in more detail. Honestly, if you are a beginning practitioner in this field, you can't really go wrong by reading it.
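
To make the three-layer decomposition concrete, here is a minimal single-process sketch in Python, assuming a toy page-view counting workload. The class name `LambdaCounts` and its methods are hypothetical and do not come from the article or the book: the batch layer recomputes a view from the full append-only log, the speed layer keeps incremental counts for events the batch view has not yet absorbed, and a query merges the two.

```python
from collections import Counter


class LambdaCounts:
    """Toy illustration of batch + speed + serving layers (hypothetical names)."""

    def __init__(self):
        self.master = []             # append-only log of raw events
        self.batch_view = Counter()  # precomputed from the whole log
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, url):
        # New events go to the master log and are reflected immediately
        # in the speed layer.
        self.master.append(url)
        self.speed_view[url] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch over a snapshot of the log.
        snapshot = len(self.master)
        self.batch_view = Counter(self.master[:snapshot])
        # The speed layer only needs events that arrived after the snapshot.
        self.speed_view = Counter(self.master[snapshot:])

    def query(self, url):
        # Serving step: merge the batch view with the recent speed-layer counts.
        return self.batch_view[url] + self.speed_view[url]


counts = LambdaCounts()
for url in ["/a", "/b", "/a"]:
    counts.ingest(url)
counts.run_batch()
counts.ingest("/a")
total_a = counts.query("/a")  # two views from the batch layer, one from the speed layer
```

In a real deployment the batch recomputation would run on something like Hadoop and the layers would be separate systems; the sketch only shows how the responsibilities are split.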
noelwelsh over 12 years ago
I'm pretty bullish on the "speed layer" coming to dominate. I've done a fair bit of work with streaming algorithms [1] and they have advantages beyond just latency, reduced memory usage being the primary one. If you believe data is growing faster than computing power, it seems that streaming algorithms must be the way forward.

Note that you can do a lot with streaming algorithms (it's not just counting). Also, the reduced memory usage (orders of magnitude) makes the complexity of random writes less of a problem, as you have less need to go outside a single machine.

[1] Slides on streaming algorithms: http://noelwelsh.com/streaming-algorithms/2012/11/22/streaming-algorithms-scala-exchange-edition/
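
As one illustration of the memory savings mentioned above, here is a short Python sketch of the Misra-Gries heavy-hitters algorithm. It is a well-known streaming algorithm chosen for this example and is not taken from the linked slides. It processes the stream in a single pass and keeps at most `k - 1` counters regardless of stream length; any item occurring more than `n / k` times in a stream of `n` items is guaranteed to survive, with its count underestimated by at most `n / k`.

```python
def misra_gries(stream, k):
    """Return approximate counts for frequent items using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a", "b", "a", "c", "a", "d", "a", "e", "a", "a"]
heavy = misra_gries(stream, k=3)
```

Here "a" occurs 6 times out of 10, which exceeds 10 / 3, so it is retained; the reported count is a lower bound on the true count rather than an exact figure.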