Tech Echo (科技回声)

8 comments

kamaal, almost 13 years ago

A small note: It's great to see so many great tools coming up to solve the kind of problems which were earlier difficult or impossible to solve.

But please check your big data use cases many times before using big data tools. Because frankly, 'big data' is becoming just a cool must-use tool regardless of the use cases people have these days. I've even seen data sizes as small as 10 MB being considered for big data use cases. Often this gets subjected to a monstrously complex architecture for no good reason.

Generally most of these cases can be addressed and solved with a tool as simple as sqlite! And all you generally need is something like Perl with sqlite and the ability to write simple SQL queries.

People get deceived very easily. When they look at GB-scale XML files they think that is what big data is. Yet most of that generally and easily goes into a traditional RDBMS, and the performance is generally within pretty acceptable limits. Markup eats a lot of space. When converted to flat file structures like CSVs and TSVs and then imported into an RDBMS, the data sizes are way smaller. I've sometimes seen a difference on the order of 10x.

Another annoying thing is the abuse of NoSQL databases. Perfectly relational data is being denormalized and force-fed into NoSQL databases, and the data access interfaces are generally bad, buggy sub-implementations of SQL.

This is almost like: people who don't understand SQL are condemned to implement it badly.
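For anyone who wants to try the "flat file into sqlite, then plain SQL" route described above, here is a minimal sketch. It uses Python's standard `sqlite3` and `csv` modules rather than Perl, and the `sales` table, its columns, and the sample rows are all made up for illustration; a real data set would be read from a file instead of an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a flat file converted from XML.
SAMPLE_CSV = """region,amount
north,10.5
south,4.0
north,2.5
"""


def load_csv(conn, text):
    """Create the (hypothetical) sales table and bulk-insert the CSV rows."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(text))
    rows = [(r["region"], float(r["amount"])) for r in reader]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def totals_by_region(conn):
    """Run a simple SQL aggregation over the imported rows."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
loaded = load_csv(conn, SAMPLE_CSV)
totals = totals_by_region(conn)
print(loaded, totals)
```

Swapping `":memory:"` for a file path would give a persistent database that can be queried again later.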

d99kris, almost 13 years ago

Link to the paper describing Dremel [PDF]: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36632.pdf

egillie, almost 13 years ago

There's an Apache version of this in the works: http://www.itworld.com/big-datahadoop/290026/new-apache-project-will-drill-big-data-near-real-time

iskander, almost 13 years ago

If I remember correctly, BigQuery only lets you import data via local csv files, uploaded one at a time. That makes importing data sets of relevant size quite a pain.

peterwwillis, almost 13 years ago

Every time I see a paper with Web-Scale in the title I throw up in my mouth a little.

So they're using large numbers of nodes for parallel processing of complex queries, with specific data segregated to individual nodes. The fuck does that have to do with the world-wide web or scaling the performance of an application on the web?

sbierwagen, almost 13 years ago

"Dremel" isn't trademarked by the rotary tool folks?

jdf, almost 13 years ago

Not sure why Cloudera is part of this article; seems like all the attention here should be on Google and the BigQuery team.

Here is an open source project similar to Dremel: http://www.itworld.com/big-datahadoop/290026/new-apache-project-will-drill-big-data-near-real-time

majorturd, almost 13 years ago

From TFA: "We discuss the core ideas in the context of a read-only system, for simplicity. Many Dremel queries are one-pass aggregations; therefore, we focus on explaining those and use them for experiments in the next section. We defer the discussion of joins, indexing, updates, etc. to future work."

Really, it takes Dremel multiple SECONDS to complete trivial massively parallelized read queries? It must take hours for an UPDATE or JOIN then. Wake me up when you move past the trivial; until then, enjoy your hair.
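For readers unfamiliar with the term in the quote, a "one-pass aggregation" is a query whose result can be computed by reading each record exactly once while keeping a small running state per group. The sketch below only illustrates that general idea in plain Python; it is not Dremel's implementation, and the function name, field names, and sample records are hypothetical.

```python
from collections import defaultdict


def one_pass_aggregate(records, key, value):
    """Group records by `key`, computing a count and a sum of `value` in a single scan."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for rec in records:  # each record is touched exactly once
        group = rec[key]
        counts[group] += 1
        sums[group] += rec[value]
    return {group: (counts[group], sums[group]) for group in counts}


# Hypothetical records; a real system would stream these from storage.
records = [
    {"country": "us", "bytes": 10.0},
    {"country": "de", "bytes": 5.0},
    {"country": "us", "bytes": 20.0},
]
result = one_pass_aggregate(records, "country", "bytes")
print(result)
```

Because the per-group state is just a count and a sum, partial results from separate workers could be merged by adding them together, which is what makes this kind of query easy to parallelize.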

Google’s Dremel Makes Big Data Look Small
