TechEcho

Databricks response to Snowflake's accusation of lacking integrity

217 points by rxin over 3 years ago

25 comments

gnabgib over 3 years ago
Related post (2 days ago, 95 comments): Snowflake’s response to Databricks’ TPC-DS post (https://news.ycombinator.com/item?id=29206959)
drej over 3 years ago
What I find hilarious is that companies argue over who can query 100 TB faster and try to sell this to people. I've been on the receiving end of offers by both of the companies in question and used both platforms (and sadly migrated some data jobs to them).

While they can crunch large datasets, they are laughably slow for the datasets most people have. So while I did propose we use these solutions for our big-ish data projects, management kept pushing for us to migrate our tiny datasets (tens of gigabytes or smaller), and the perf expectedly tanked compared to our other solutions (Postgres, Redshift, pandas etc.), never mind the immense costs to migrate everything and train everyone up.

Yes, these are very good products. But PLEASE, for the love of god, don't migrate to them unless you know you need them (and by 'need' I don't mean pimping your resume).
scapecast over 3 years ago
The irony here is that what Databricks is doing to Snowflake is exactly what Snowflake did to AWS and Redshift.

Same playbook: show that you’re better in a key metric that’s easy to understand (performance) to get the attention, but then pitch the paradigm change.

In Snowflake’s case, that was separation of storage and compute.

In Databricks’ case, it’s the Lakehouse Architecture.

I think the reason why Snowflake is so nervous is that they know they can’t win this game.
avip over 3 years ago
I've used both products in production. Both are good++.

The blog wars seem extremely ridiculous to me. I don't recall ever choosing one over another based on how fast it runs on some imaginary arbitrary dataset.
inetknght over 3 years ago
Snowflake accuses other companies of lacking integrity?

I really wish I could block all of Snowflake's domain from my inbox. Sadly, Google encourages spammers to just create a new email address. So I get a few emails each month from Snowflake asking me to try their products. I've never done business with them and there's no unsubscribe link.

Fuck Snowflake for thinking it has any room to talk about integrity.
boublepop over 3 years ago
Snowflake must be kicking themselves hard now for letting a story that was “Databricks is a viable alternative” turn into “Snowflake has absolutely no integrity and will fling mud even while they are gaming the statistics”.

Really can’t see what they can do now short of “bending” to Databricks and entering the competition. And naturally it’s no longer enough that they show comparable performance. They have to hit their gamed stats somehow; otherwise any news, even if they beat Databricks, will be reported as “see, we told you they were cheating”.
bloodyplonker22 over 3 years ago
Databricks is trying to punch up at the market leader. Every decent marketer knows that you should never do the opposite and punch down.
jchw over 3 years ago
Before the Snowflake blog post, I did not know what Snowflake or Databricks were. I can only imagine that this rivalry is great for both of them, even if Databricks has somewhat of an advantage, at least from a tactical standpoint; I admit, though, that they seem to be a bit unnecessarily defensive considering the position they're in with the exchange.

In general though, I'm still not complaining. It's interesting to see a dispute like this unfold.
AdamProut over 3 years ago
I would say that TPC-DS and TPC-H are really table-stakes benchmarks for data warehouses at this point in time (maybe they weren't 10 years ago). How to build a database that does well on them is well documented in the literature now [1][2][3][4] (maybe a few other papers). It's not easy to build such a database, but it's "just" hard work and many companies have the $$ necessary to do that work. There isn't any magic or technical moat in the results for Databricks (or Snowflake, or Redshift, etc.).

I think Databricks is overly enthusiastic about their results as they have been trying to be competitive with cloud DWs on these benchmarks for a number of years now. They have finally caught up (by building Delta Lake and their Photon query engine, which implement a number of standard DW features).

[1] http://www.vldb.org/pvldb/vol13/p1206-dreseler.pdf
[2] https://stratos.seas.harvard.edu/files/stratos/files/columnstoresfntdbs.pdf
[3] https://web.stanford.edu/class/cs245/readings/c-store.pdf
[4] http://sites.computer.org/debull/A12mar/vectorwise.pdf
redwood over 3 years ago
As much as I love seeing competition in the space and am enjoying my popcorn, I really don't understand what Databricks is doing here: this feels like a childish food fight rather than an obsession with the customer...
benjaminwootton over 3 years ago
I’ve been following this and it’s kind of embarrassing to watch.

I love working with Databricks and Snowflake. They both knock it out of the park for their respective use cases. They’re amazing products.

It makes no sense to fall out about this though.

For a 100TB dataset with a funky calculation, Spark will trounce Snowflake. For a 1 row dataset, Snowflake will return before the Spark job has been serialised.
__MatrixMan__ over 3 years ago
Instead of blog posts written by experts in app A based on their experience with app B, I wish there were a platform for this kind of comparison.

Some objective third party sets the goal and then each company submits automation (selenium?) that configures their own app to achieve the goal. Entrants are scored by:

- time
- storage
- compute
- config complexity

No need to waste time making your opponent look bad, just focus on making yourself look good, and do it on a level playing field.
michaelhartm over 3 years ago
Data Wars: Snowflake vs Databricks (0 - 2)?
naattee over 3 years ago
snowflake should just pony up and do a TPC-DS audited benchmark
maslam over 3 years ago
Everyone wins when data platforms submit audited benchmarks...
boringg over 3 years ago
And how soon is the S-1 for Databricks dropping?
Normal_gaussian over 3 years ago
So, alternatives?

Aside from the Azure/GCP/AWS internal offerings, I know about Snowflake and Firebolt; Databricks is new to me.
funstuff007 over 3 years ago
I guess if anyone suggests "sampling" the data in a meeting these days, they get their head blown off.
xiaodai over 3 years ago
Spark compares itself to Hadoop only on the front page. I wonder how Spark compares to Firebolt.
uvdn7 over 3 years ago
Now I see that getting rid of the DeWitt clause is indeed great. Kudos to both companies.
1cvmask over 3 years ago
This reminds me of the old performance ads of Oracle where they would show you how everything ran better on Oracle. They used to put those ads at airports, business lounges and the back cover of newspapers and magazines read by non-technical executives, like the FT and Economist.

Everyone technical knew they would game every environment to come out with superior results. I suppose it worked, as the top executives buy big system software and ignore the IT crowd who could easily point out the flaws in the methodology of the "studies".

Breakdown of one of those example ads:

https://db2news.wordpress.com/2011/06/08/a-closer-examination-of-oracles-database-performance-advertisement/
falaki over 3 years ago
tl;dr: The data warehouse company used a pre-baked TPC-DS dataset and claimed they have similar performance to Databricks. Turns out if you use the official TPC-DS data generation scripts, you get much worse performance.
xiaodai over 3 years ago
Lol
dreyfan over 3 years ago
Databricks is rapidly approaching an IPO and trying to justify their valuation with their overpriced in-memory Hadoop.
hello_moto over 3 years ago
Serious question: Databricks, Snowflake, Dremio. All these "Data" platform companies => which one do you have for your Data Lake and Data Warehouse solution?

I'm sick and tired of these companies Snake Oiling the Data industry by offering "the easiest" platform to satisfy your Data Lake + Warehouse solution, only to fall hard whenever you hook it up with your production data (big dataset).

PS: Anyone selling Data Lakehouse (Data Lake + Warehouse as one platform) is on meth.