How I came to love big data

71 points by ejpastorino over 12 years ago

4 comments

meritt over 12 years ago
Would love to see if indexes and a sane schema were used for the RDBMS case. I've built extremely large reporting databases (Dimensional Modeling techniques from Kimball) that perform exceedingly well for very ad hoc queries. If your query patterns are even somewhat predictable and occur frequently, it's far better to have a properly structured and indexed database than to use the "let's analyze every single data element every single query!" approach that is implicit with Hadoop and MR.

Not to mention the massive cost savings from using the right technology with a small footprint versus using a brute-force approach and a large cluster of machines.
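A minimal sketch of the indexing point above, not taken from the thread or the linked article: it uses Python's built-in `sqlite3` module as a stand-in for any RDBMS, and the `events` table, its columns, and the `query_plan` helper are hypothetical names chosen for illustration. It asks SQLite for its query plan before and after adding an index, showing the switch from scanning every row to searching the index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


# Hypothetical schema and data, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(100_000)],
)

sql = "SELECT SUM(amount) FROM events WHERE user_id = ?"

# Without an index, the only option is to read every row.
plan_without_index = query_plan(conn, sql, (42,))

# With an index on the filtered column, only matching rows are visited.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_with_index = query_plan(conn, sql, (42,))

total = conn.execute(sql, (42,)).fetchone()[0]

print(plan_without_index)
print(plan_with_index)
print(total)
```

The exact plan wording varies between SQLite versions, but the first plan should report a scan of `events` and the second a search using `idx_events_user`. This only shows the access-path difference on a toy table; it says nothing about how the article's MySQL or Hive setups were configured.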
zachrose over 12 years ago
Naive question: What does analyzing big data sets get you that sampling doesn't?
zwass over 12 years ago
I'm confused by the assertion that Hive was "much slower than using MySQL with the same dataset." The author makes this claim, and then provides a table that shows Hive performing ~50% better than MySQL on a variety of datasets (none of which really flex the muscle of Hadoop in operating on data sets going beyond single-digit GB).

Regardless, Impala sounds like it could be pretty sweet!
xradionut over 12 years ago
"These aren’t scientific benchmarks by any means (nothing’s been especially tuned or optimized)..."

I had to smile when I read that. Working with data, sometimes optimization or redesign can yield significant performance gains. (Especially when reworking some of my colleagues' queries or code...)