Google’s Dremel Makes Big Data Look Small

90 points by arunabh, almost 13 years ago

8 comments

kamaal, almost 13 years ago
A small note: It's great to see so many great tools coming up to solve the kinds of problems that were previously difficult or impossible to solve.
However, please check your big data use cases many times before using big data tools. Frankly, "big data" is becoming a cool, must-use tool these days, regardless of the use cases people actually have. I've even seen data sizes as small as 10 MB being considered for big data use cases. Often this gets subjected to a monstrously complex architecture for no good reason.
Most of these cases can be addressed with a tool as simple as SQLite! All you generally need is something like Perl with SQLite and the ability to write simple SQL queries.
People get deceived very easily. When they look at GB-scale XML files, they think that is what big data is. Yet most of that data goes easily into a traditional RDBMS, and the performance is generally within acceptable limits. Markup eats a lot of space. When the data is converted to flat file formats like CSV or TSV and then imported into an RDBMS, it becomes much smaller; I've sometimes seen a difference on the order of 10x.
Another annoying thing is the abuse of NoSQL databases. Perfectly relational data is denormalized and force-fed into NoSQL databases, and the data access interfaces generally end up as bad, buggy implementations of a subset of SQL.
This is almost like: people who don't understand SQL are condemned to implement it badly.
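A minimal sketch of the workflow this comment suggests, using Python's standard library (`sqlite3` standing in for the Perl + SQLite combination mentioned). The XML sample, the `orders` table, and its columns are hypothetical. The script flattens XML records to CSV, prints both sizes, and loads the CSV into SQLite for a plain `GROUP BY` query. The size gap on a toy input like this is only illustrative and says nothing about the 10x figure in the comment.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample: a few order records in verbose XML form.
XML_DATA = """<orders>
  <order><id>1</id><customer>alice</customer><amount>30</amount></order>
  <order><id>2</id><customer>bob</customer><amount>12</amount></order>
  <order><id>3</id><customer>alice</customer><amount>8</amount></order>
</orders>"""


def xml_to_csv(xml_text: str) -> str:
    """Flatten each <order> element into one CSV row, with a header row first."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "customer", "amount"])
    for order in root.iter("order"):
        writer.writerow(
            [order.findtext("id"), order.findtext("customer"), order.findtext("amount")]
        )
    return buf.getvalue()


def load_and_query(csv_text: str) -> list:
    """Load the CSV into an in-memory SQLite table and total the amounts per customer."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
        )
        reader = csv.DictReader(io.StringIO(csv_text))
        # CSV fields arrive as strings; convert the numeric ones explicitly.
        conn.executemany(
            "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
            ((int(r["id"]), r["customer"], int(r["amount"])) for r in reader),
        )
        return conn.execute(
            "SELECT customer, COUNT(*), SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    csv_text = xml_to_csv(XML_DATA)
    print(f"XML: {len(XML_DATA.encode())} bytes, CSV: {len(csv_text.encode())} bytes")
    for customer, n_orders, total in load_and_query(csv_text):
        print(customer, n_orders, total)
```

For a file-backed database, pass a path to `sqlite3.connect` instead of `":memory:"`; the SQL stays the same.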
d99kris, almost 13 years ago
Link to the paper describing Dremel [PDF]: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36632.pdf
egillie, almost 13 years ago
There's an Apache version of this in the works: http://www.itworld.com/big-datahadoop/290026/new-apache-project-will-drill-big-data-near-real-time
iskander, almost 13 years ago
If I remember correctly, BigQuery only lets you import data from local CSV files, uploaded one at a time. That makes importing data sets of a relevant size quite a pain.
peterwwillis, almost 13 years ago
Every time I see a paper with "Web-Scale" in the title, I throw up in my mouth a little.
So they're using large numbers of nodes for parallel processing of complex queries, with specific data segregated to individual nodes. The fuck does that have to do with the World Wide Web, or with scaling the performance of an application on the web?
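The setup this comment describes, with many nodes each holding its own slice of the data and jointly answering an aggregation query, is usually called scatter-gather. Below is a minimal Python sketch of that general pattern, not Dremel's actual implementation. The shard contents and the `fan_in` parameter are hypothetical, and threads stand in for separate nodes. Each leaf makes one pass over its shard to compute a partial `SUM(value)` per key, and parents merge the partials in a tree until a single result remains.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of (key, value) records held by one node.
SHARDS = [
    [("us", 3), ("de", 1), ("us", 2)],
    [("de", 4), ("fr", 5)],
    [("us", 1), ("fr", 2), ("fr", 1)],
]


def leaf_aggregate(shard):
    """Single pass over one shard, producing a partial SUM(value) per key."""
    partial = Counter()
    for key, value in shard:
        partial[key] += value
    return partial


def merge(partials):
    """Add partial sums together; SUM is associative, so grouping does not matter."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def scatter_gather(shards, fan_in=2):
    """Aggregate shards in parallel, then merge partials up a tree of the given fan-in."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not shards:
        return {}
    # Scatter: threads stand in for the nodes that each own one shard.
    with ThreadPoolExecutor() as pool:
        level = list(pool.map(leaf_aggregate, shards))
    # Gather: each parent merges up to `fan_in` children until one root remains.
    while len(level) > 1:
        level = [merge(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return dict(level[0])


if __name__ == "__main__":
    print(scatter_gather(SHARDS))
```

Because SUM is associative and commutative, partials can be merged in any grouping and at any depth of the tree; an aggregate such as an exact median does not decompose this simply.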
sbierwagen, almost 13 years ago
"Dremel" isn't trademarked by the rotary tool folks?
jdf, almost 13 years ago
Not sure why Cloudera is part of this article; it seems like all the attention here should be on Google and the BigQuery team.
Here is an open source project similar to Dremel:
http://www.itworld.com/big-datahadoop/290026/new-apache-project-will-drill-big-data-near-real-time
majorturd, almost 13 years ago
From TFA: "We discuss the core ideas in the context of a read-only system, for simplicity. Many Dremel queries are one-pass aggregations; therefore, we focus on explaining those and use them for experiments in the next section. We defer the discussion of joins, indexing, updates, etc. to future work." Really, it takes Dremel multiple SECONDS to complete trivial, massively parallelized read queries? It must take hours for an UPDATE or JOIN, then. Wake me up when you move past the trivial; until then, enjoy your hair.