Can someone ELI5 what Snowflake and Databricks are? I spent a few minutes on the Databricks website once and couldn't really penetrate the marketing jargon.

There are also some technical terms I don't know at all, and when I've searched for them, the top results are all more Azure stuff. Like wtf is a datalake?
Snowflake conceding they have a 700% markup between Standard and Premium editions, which has *zero impact* on query performance, is ... well, it's something. I'd start squeezing my sales engineers about that; definitely not sustainable...

Also proof that lakehouse and spot-compute price-performance economics are here to stay. That's good for customers.

Otherwise, as a vendor blog post with nothing but self-reported performance, this is worthless.

Disclaimer: I work at Databricks, but I admire Snowflake's product for what it is - iron sharpens iron.
Take all the problems you have had with data warehousing and throw them in a proprietary cloud. That is Snowflake. They are the best today.

Databricks started with the cloud data lake, sitting natively on Parquet and using cloud-native tools, fully open. Recently they added SQL to help democratize the data in the data lake, versus moving it back and forth into a proprietary data warehouse.

The selling point of Databricks is: why move the data around when you can just have it in one place, IF performance is the same or better?

This is what led to the latest benchmark, which in its writing appears to be unbiased.

In Snowflake's response, however, they condemn it but then submit their own findings. Sounds a lot like Trump telling everyone he had billions of people attend his inauguration, doesn't it?

Anyhow, I trust independent studies more than ones coming from vendors. An independent study cannot be argued with or debated unless it was unfairly done. I think we are all smart enough to be careful with studies of any kind, but I can see why Databricks was excited about the findings.
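To make "SQL directly on the data lake" concrete, here's roughly what it looks like in Spark SQL. The bucket path and table name are made up for illustration, not from the benchmark:

```sql
-- Query Parquet files in object storage directly; no load step required.
-- (s3://my-bucket/events/ is a hypothetical path.)
SELECT event_type, COUNT(*) AS n
FROM parquet.`s3://my-bucket/events/`
GROUP BY event_type;

-- Or snapshot the same files into a Delta table and query it
-- like any warehouse table, without copying data to another system.
CREATE TABLE events
USING DELTA
AS SELECT * FROM parquet.`s3://my-bucket/events/`;
```

The point being: the data never leaves the lake, so there's no second proprietary copy to keep in sync.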
* Databricks is unethical

* Nobody should benchmark anymore; just focus on customers instead

* But hey, we just did some benchmarks, and we look better than what Databricks claims

* Btw, please sign up and run some benchmarks on Snowflake; we actually ship the TPC-DS dataset with Snowflake

* Btw, we agree with Databricks: let's remove the DeWitt clause, vendors should be able to benchmark each other!

* Consistency is more important than anything else!!!
Databricks broke the record (by 2x) and is 10x more cost-effective, in an audited benchmark. Snowflake should participate in the official, audited benchmark. Customers win when businesses are open and transparent…
This is the sort of FUD testing that gets thrown back and forth between companies of all kinds.

If you're in networking, it's throughput, latency, or fairness.
If you're in graphics, it's your shaders or polygons or hashes.
If you're in CPUs, it's your clock speed.
If it's cameras, it's megapixels (but nobody talks about lenses or real measures of clarity).
If you're in silicon, it's your process node (none of that has mattered for years; those numbers are like version labels, not the size of the largest block on your die).
If you're in finance, it's about your returns or your drawdowns or your Sharpe ratios.

I'm a little surprised how seriously Databricks is taking this, but maybe it's because one of the cofounders made this claim. Ultimately what you find is that one company is not very good at setting up the other company's system, and the result is benchmarks that are less than ideal.

So why not have a showdown? Both founders, streamed live, running their benchmarks on the data. NETFLIX SPECIAL!
I still don't get how much optimization was done for the Snowflake TPC-DS power run. This is what I am seeing so far, and what I am foggy on:

DB1. Databricks generated the TPC-DS datasets from the TPC-DS kit before the clock started. The clock starts, then all queries are generated. Then Databricks loaded from CSV into Delta format (some Delta tables were partitioned by date) and also computed statistics. Then all of the queries, 1-99, are executed for TPC-DS 100TB.

SF1. Databricks generated the TPC-DS datasets from the TPC-DS kit before the clock started. The clock starts, then all queries are generated. Then the data is loaded from S3 into Snowflake tables by (I'm not sure about these next parts) creating external stages and then running "COPY INTO" statements, I guess? Or maybe just using COPY INTO straight from an S3 bucket; that part doesn't matter much (see the sketch below). But it's not clear whether the target tables were allowed to be partitioned or to have clustering keys at all. Then all of the queries, 1-99, are executed for TPC-DS 100TB.

It's just hard to say exactly what "They were not allowed to apply any optimizations that would require deep understanding of the dataset or queries (as done in the Snowflake pre-baked dataset, with additional clustering columns)" means. At a glance, though, this looks very impressive for Databricks, but I just want to be sure before I commit to an opinion.
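For what it's worth, the external-stage route I'm guessing at would look something like this in Snowflake SQL. All names here are hypothetical, and I don't know what the benchmark actually ran:

```sql
-- Hypothetical sketch: stage, bucket, and table names are made up.
CREATE OR REPLACE STAGE tpcds_stage
  URL = 's3://my-bucket/tpcds-100tb/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

-- Bulk-load the pipe-delimited TPC-DS flat files into an existing table.
COPY INTO store_sales
  FROM @tpcds_stage/store_sales/
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|');
```

Whether the target tables got clustering keys before or after that load is exactly the part the blog post leaves fuzzy.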
Personally I think it's a great response and very well written. I didn't jump on the congrats-Databricks wagon when the result first came out because of the weird front-page comparison against Snowflake. Both companies are doing great work. Focusing on building a better product for your customer is much more meaningful than making your competitor look bad.
The audience for these posts is enterprise managers who don't actually understand their compute needs.

For the more technically inclined: don't let any corporate blog post / comms piece live in your head rent-free. If you're a customer, make them show you value for their money. If you're not, make them provide you tools / services for free. Just don't help them fuel the pissing contest; you'll end up a bag holder (swag holder?).
Linking to the discussion on the follow-up from Databricks: https://news.ycombinator.com/item?id=29232346
I've been a customer/user of Snowflake. They make it simple to run SQL. There is a bunch of performance stuff that I don't need to worry about.

I'm interested in using Databricks, but I haven't done it yet. I've heard good things about their product.
"Posting benchmark results is bad because it quickly becomes a race to the wrong solution. But somebody showed us sucking on a benchmark, so here's our benchmark results showing we're better."
The main question I have for DB is: how good is their query optimiser/compiler? It's fun that you can run some predefined set of queries fast. More important is how well you can run queries in the real world, with suboptimal data models, layers upon layers of badly written views, CTEs, UDFs...
That is what matters in the end. Not some synthetic benchmark based on known queries you can optimise specifically for.
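To make that concrete, here's a contrived example of the kind of SQL I mean (all table, view, and column names invented):

```sql
-- Layer 1: a view someone wrote years ago.
CREATE VIEW v_orders AS
SELECT o.*, c.region
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;

-- Layer 2: a view on the view, re-joining a table it already touches.
CREATE VIEW v_orders_enriched AS
SELECT v.*, c.segment
FROM v_orders v
JOIN customers c ON c.customer_id = v.customer_id;

-- The actual query: a CTE stacked on top of both views. A good optimiser
-- flattens the views and recognises the repeated join to customers;
-- a naive one materialises each layer.
WITH recent AS (
  SELECT * FROM v_orders_enriched
  WHERE order_date >= '2021-10-16'
)
SELECT region, segment, SUM(amount) AS revenue
FROM recent
GROUP BY region, segment;
```

A benchmark of 99 known queries tells you nothing about how an engine copes with this kind of stack.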
Performance is only one part of the story. The major advantage Snowflake (and to some extent Presto/Trino) brings to the table is it's pretty much plug and play. Spark OTOH usually requires a lot of tweaking to work reliably for your workloads.
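For example, the kind of knobs I mean, in Spark SQL session syntax. The values here are arbitrary examples; the right ones depend entirely on your data volumes and cluster:

```sql
-- Typical session-level Spark SQL tweaks (illustrative values only):
SET spark.sql.shuffle.partitions = 400;           -- the default of 200 is often wrong at scale
SET spark.sql.autoBroadcastJoinThreshold = 100MB; -- raise/lower the broadcast-join cutoff
SET spark.sql.adaptive.enabled = true;            -- let AQE re-plan joins at runtime
```

With Snowflake you mostly just pick a warehouse size; with Spark, settings like these tend to become part of the job.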
> At the end of the script, the overall elapsed time and the geometric mean for all the queries is computed directly by querying the history view of all TPC-DS statements that have executed on the warehouse.

The geometric mean? Really? Feels a lot easier to think in terms of arithmetic mean, and perhaps percentiles.
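For context, the geometric mean is the convention in TPC-style reporting: with 99 queries whose runtimes span orders of magnitude, the arithmetic mean is dominated by the few longest queries, while the geometric mean weights each query's relative speedup equally. Over a history view, the two might be computed like this; table and column names (query_history, elapsed_s, query_tag) are assumptions for illustration, not Snowflake's actual schema:

```sql
-- Compare both means over a hypothetical query-history view.
SELECT
    AVG(elapsed_s)          AS arithmetic_mean_s,  -- dominated by the slowest queries
    EXP(AVG(LN(elapsed_s))) AS geometric_mean_s    -- weights each query equally
FROM query_history
WHERE query_tag = 'tpcds_power_run';
```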
I genuinely think the DeWitt clause is good for users (bad for researchers). Without it, especially in the context of corporate competition, the company with the most marketing power will win. Users can always compare different products *themselves*. I am likely wrong, but please help me understand.
What do you know, here's an article[1] from 2017 about Databricks making an unfortunate mistake that showed Spark Streaming (which they sell) as a better streaming platform than Flink (which they don't sell).

I really hope this is not the case again.

(Yes, I understand my sarcasm is unneeded; I couldn't help myself.)

[1]: https://www.ververica.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime