How I came to love big data

71 points by ejpastorino, over 12 years ago

4 comments

meritt, over 12 years ago
Would love to see if indexes and a sane schema were used for the RDBMS case. I've built extremely large reporting databases (Dimensional Modeling techniques from Kimball) that perform exceedingly well for very ad hoc queries. If your query patterns are even somewhat predictable and occur frequently, it's far better to have a properly structured and indexed database than to use the "let's analyze every single data element every single query!" approach that is implicit with Hadoop and MR.

Not to mention the massive cost savings from using the right technology with a small footprint versus using a brute-force approach and a large cluster of machines.
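As a rough illustration of the indexing point, here is a minimal sketch using Python's built-in `sqlite3` module. The tiny star schema (`dim_date`, `fact_sales`), the index name `idx_sales_date`, and the synthetic rows are all assumptions made up for this example; they are not taken from the article or from any real reporting database. It only shows that the same query goes from a full table scan to an index lookup once the index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified dimensional model: one dimension, one fact table.
conn.executescript("""
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        year    INTEGER,
        month   INTEGER
    );
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        date_id INTEGER REFERENCES dim_date(date_id),
        amount  REAL
    );
""")

# Synthetic data, invented purely for the demonstration.
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(d, 2012, 1 + d % 12) for d in range(365)],
)
conn.executemany(
    "INSERT INTO fact_sales (date_id, amount) VALUES (?, ?)",
    [(i % 365, float(i % 100)) for i in range(50_000)],
)

sql = "SELECT SUM(amount) FROM fact_sales WHERE date_id = ?"

# Without an index, SQLite has to scan every row of the fact table.
plan_before = query_plan(conn, sql, (42,))
total_before = conn.execute(sql, (42,)).fetchone()[0]

# With an index on the filter column, it can seek straight to matching rows.
conn.execute("CREATE INDEX idx_sales_date ON fact_sales(date_id)")
plan_after = query_plan(conn, sql, (42,))
total_after = conn.execute(sql, (42,)).fetchone()[0]

print("before index:", plan_before)
print("after index: ", plan_after)
```

The result of the query is identical in both cases; only the access path changes. The exact wording of the plan text varies between SQLite versions, so treat the printed strings as informative rather than as a stable format.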
zachrose, over 12 years ago
Naive question: What does analyzing big data sets get you that sampling doesn't?
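One common answer to this question is that a sample estimates aggregates well but can miss rare records entirely. The sketch below is only an illustration of that idea on synthetic data, assuming a made-up record layout of `(amount, is_rare)` and invented sizes; it is not drawn from the article or from the replies that failed to load.

```python
import random

rng = random.Random(0)  # fixed seed so repeated runs behave the same

POPULATION_SIZE = 200_000
SAMPLE_SIZE = 2_000
RARE_EVENTS = 5  # hypothetical "needle in a haystack" records

# Each record is (amount, is_rare); amounts are uniform noise.
rare_positions = set(rng.sample(range(POPULATION_SIZE), RARE_EVENTS))
records = [
    (rng.uniform(0, 100), i in rare_positions)
    for i in range(POPULATION_SIZE)
]

# Full scan: exact mean and exact count of rare records.
true_mean = sum(amount for amount, _ in records) / POPULATION_SIZE
full_rare = sum(1 for _, is_rare in records if is_rare)

# 1% simple random sample: good for the mean, unreliable for rare records.
sample = rng.sample(records, SAMPLE_SIZE)
sample_mean = sum(amount for amount, _ in sample) / SAMPLE_SIZE
sample_rare = sum(1 for _, is_rare in sample if is_rare)

print(f"mean: full={true_mean:.2f} sample={sample_mean:.2f}")
print(f"rare records: full={full_rare} sample={sample_rare}")
```

With these sizes a 1% sample is expected to contain about 0.05 of the 5 rare records on average, so it will usually find none, while the sampled mean stays close to the full-scan mean. Whether that trade-off matters depends on whether the question being asked is about aggregates or about individual rare items.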
zwass, over 12 years ago
I'm confused by the assertion that Hive was "much slower than using MySQL with the same dataset." The author makes this claim, and then provides a table that shows Hive performing ~50% better than MySQL on a variety of datasets (none of which really flex the muscle of Hadoop in operating on data sets going beyond single-digit GB).

Regardless, Impala sounds like it could be pretty sweet!
xradionut, over 12 years ago
"These aren’t scientific benchmarks by any means (nothing’s been especially tuned or optimized)..."

I had to smile when I read that. Working with data, sometimes optimization or redesign can yield significant performance gains. (Especially when reworking some of my colleagues' queries or code...)