Tech Echo

A tech news platform built with Next.js, offering global tech news and discussion.

© 2025 Tech Echo. All rights reserved.

Most companies do not need Snowflake or Databricks

179 points by whoiskatrin, almost 2 years ago

40 comments

catchnear4321, almost 2 years ago

> The cost for something that can be replicated free and open source is absurd.

open source it may be. free it is not. paying an expert to correctly deploy an open source solution takes time and money.

oh you want it maintained?

the three recommendations sound like those of a consultant. they work great with exec buy-in, and are a joke without.

yes, yes, you just have to explain yourself. just help management understand. justification is part of the role. how long does that take, and how much effort? (the answer is subjective and contextual.)

to be blunt, this kind of advice is barely more than “do better.” it ignores the situational example of using snowflake cheaply. it acts like most devs aren’t just going to go fire up a postgres rds - you said open source! it oversimplifies all problems, implicitly, to you.

> Pay to make a problem disappear.

no, pay to change the parameters of the problem. this is a fundamental misunderstanding of how to get things done in a constrained environment. it isn’t either/or, and every saas comes with problems. you pay to trade problems. otherwise it wouldn’t be worth a blog post name dropping and shitting on databricks and snowflake costs. in the hands of an untrained user or in a sufficiently constrained environment, they cost a lot - that’s one of the problems you buy. cost management.

> a talk with your local Dev Ops Engineer/ manager to discuss how to secure your implementations

again, hope you’ve taken the time to build these bridges. the author does indicate this is most important.

and it has a cost. someone is building that bridge, and likely someone from each side.

you sometimes pay vendors to deal with technical problems so you can deal with non-technical ones.

most importantly, don’t assume a random stranger on the internet has sufficient context to give you worthwhile recommendations.
lr4444lr, almost 2 years ago

> ...a talk with your local Dev Ops Engineer/ manager to discuss how to secure your implementations

This right here is exactly why Snowflake is a good fit for my org.: we could pay someone's salary to install, maintain, and upgrade some open source alternative (and the VPS to run it), or we could just pay for Snowflake and stop wasting engineering's time on the Data Science team's stuff, which frees them up to move faster on the core platform features bringing in the big bucks.

This reads like a flavor of the same argument I've heard many times over. You know what the biggest cost of an org. really is as we head into a recession? Labor. Stop wasting it on stuff done better by specialized companies that free you up to get stuff done.
MrPowers, almost 2 years ago

I think most companies don't understand a large part of the Databricks offering and it should be used by way more organizations. Disclaimer: I was a Databricks user for 6 years and now work at Databricks.

Yea, you can create your own Spark deployment, but it will run much slower than the Databricks Runtime (DBR) or the Databricks proprietary Spark Runtime (Photon). Computations that run slower cause you to have a larger cloud compute bill. Databricks rewrote Spark in C++ and it runs really fast and saves a lot on EC2 compute.

> Define when you should compact files, when to Z-order

Or don't consider these issues and use autocompaction / the new Liquid clustering. These are great examples of problems the platform should solve, so the user has time to focus on business logic.

> If you can sniff out the inefficiencies in your Data early and make architecture that handles your specific data

I don't know what this means.

Are you going to build a deep learning model to make reads/writes faster like Databricks predictive I/O? https://docs.databricks.com/en/optimizations/predictive-io.html. Probably not, you have a lot of business problems to solve.

> Do the real work. Work with people. The Code will write itself.

I've seen lots of DIY data platforms. They're horrible to work with and I can assure you that the code does not write itself. The data engineers have a lot less time to write code because they're constantly trying to stand the platform back up.
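For readers unfamiliar with the quoted term: Z-ordering, as a general technique, sorts rows by one key built by interleaving the bits of several columns (a Z-order or Morton curve), so rows that are close in all of those columns end up near each other in the same files. The Python sketch below shows only the bit interleaving for two non-negative integer columns; the rows are made up, and this is not how Delta Lake or Databricks implements Z-ordering internally.

```python
def morton_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1.
    Both inputs are assumed to be non-negative integers.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Hypothetical rows keyed by two filter columns (e.g. customer bucket, day bucket).
rows = [(3, 1), (0, 0), (1, 1), (2, 3), (1, 0), (0, 1)]

# Sorting by the Z-order key keeps rows that are close in *both* columns
# close together, which keeps per-file min/max statistics tight for either filter.
clustered = sorted(rows, key=lambda r: morton_2d(*r))
```

A practical version would also need to map non-integer columns (strings, dates) onto integer ranges before interleaving; that step is left out here.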
intellectronica, almost 2 years ago

True from a purely technical point of view, but it doesn't take into account how companies make decisions about adopting platform tech, or that they're making these choices for good reasons.

When companies choose SF (Snowflake) or DB (Databricks), they don't install one of these platforms because there is no combination of open-source components and bespoke engineering work that could replace them and even be more flexible and efficient. They are choosing these platforms because they make it easy to manage data operations holistically, because it's easier to hire people who already know them well, because they are opinionated and restrict the range of bad or just strange decisions someone in the organisation could otherwise make, and of course for enterprise support, available from a single reputable vendor.

See also: 80% of companies don't need Kubernetes, 80% of companies don't need a big-3 cloud, 80% of companies don't need managed security solutions, 80% of companies don't need SAP, 80% of companies don't need Salesforce, etc etc etc...
mrweasel, almost 2 years ago

Most companies don't need a lot of things.

I don't do a whole lot of data analytics anymore, but I'd say: Start with figuring out how much data you actually have. We still see companies claim to have vast amounts of data, but in reality they are talking about less than a TB of data, frequently just a few hundred GB. When you operate at that scale, just chuck your data into whatever database you're comfortable with and do SQL queries, it's fine.

Once you hit the scale where you have "enough" data, I'd agree with many of the other comments: managing open source or home-grown solutions quickly becomes more expensive than just paying for a service. Not quite the same, but we considered deploying a few solutions on OpenStack, and once you paid for training and staff it turned out to be cheaper to just give VMware more money.
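The first step suggested here, measuring how much data you actually have, can be as simple as asking the database. A minimal Python sketch, using SQLite as a stand-in for "whatever database you're comfortable with"; the `orders` table and its contents are hypothetical, and the 1 TB threshold is the comment's rough figure rather than a hard rule.

```python
import sqlite3

ONE_TB = 10**12  # the comment's rough threshold, in decimal bytes


def database_size_bytes(conn: sqlite3.Connection) -> int:
    """Return the size of a SQLite database as page_count * page_size."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 50}", i * 1.5) for i in range(10_000)],
)

size = database_size_bytes(conn)
needs_special_platform = size >= ONE_TB

# At this scale a plain SQL aggregate is all the "analytics platform" needed.
top = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC LIMIT 3"
).fetchall()
```

Other engines expose the same information through their own queries, for example Postgres's `pg_database_size()`.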
tremon, almost 2 years ago

Of course they do not need it. 80% of companies can also do without AWS, they "just" need to hire the people themselves and develop a core competency in running and administering those services. It doesn't automatically mean that it makes financial sense for said company.

I guess "don't need" == "must not use" when you're selling yourself as filling that self-hosted gap.
commandlinefan, almost 2 years ago

> charging their customers for Ferraris

I've seen this sort of analogy before and I don't think it's the right one here: if I get a Ferrari, I know _exactly_ what I'm getting. I'm getting a really fast, really beautiful car.

I'm not too familiar with Snowflake, but I've suffered under Databricks on and off for about a decade now, and as far as I can tell, it's just a more expensive, closed way to do what I could do a lot faster and a lot easier if I didn't have to work around the obstacles that Databricks puts in my way, obstacles that don't have any value other than being something Databricks can charge my employer a ton of money for.
krasznahorkai, almost 2 years ago

I work as a solution architect at a consulting firm that builds analytical data platforms for customers. Our company has a partnership with Snowflake, which means all the solutions we build are pushed to use Snowflake. Their sales strategy is very Oracle-like and, at least in my circles, many Snowflake sales employees are ex-Oracle. This means our sales and Snowflake sales are the best of friends. Formally they'll deny kickbacks, but who knows?

For all my clients Snowflake is overkill when you look from the perspective of growth and scale. They'll never use that part of Snowflake. They might do just as well with DuckDB, Azure Synapse or any other analytics-oriented platform lying around.

What I do like with less-than-big use cases is that (at least at Snowflake) you pay relatively little if you do relatively little data processing. It's not free, but it doesn't break the bank either.
YetAnotherNick, almost 2 years ago

Managing data yourself is hard. It is hard not because it can't scale, but for more boring reasons like harder DB upgrades and migrations, snapshotting, access control, lacking libraries/SDKs, lack of documentation, harder training of new employees etc. The number one reason for extended downtime I have seen in companies is that data is in some bad state. And good data engineers who could do all the things the author expects them to do are not easy to find, and are expensive.
petetnt, almost 2 years ago

Would be really wild if 20%, a truly whopping number, of the companies did need Snowflake or Databricks.
qrios, almost 2 years ago

> The Fortune 100 have a use case for these companies, the rest are overpaying.

There is a difference between "a use case for $data_platform" and "a data use case for $data_platform". The scope of the first is the platform in $data_platform; the scope of the second is the data-specific requirements in $data_platform.

Working at a non-Fortune 100 insurance company in Europe, I find almost all our use cases can easily be done on a traditional RDBMS like SQL Server, or in BI tools like SAS. Thankfully, with higher granularity over time, Excel usage is steadily fading out. No big data, no heavy computing necessary, at least from my point of view.

All the setups in place today can be called self-service platforms. By cautious estimates we have at least 100 such "platforms" that have been running for years, or even decades.

This situation directly implies that we do have a use case for a $data_platform itself. Costs are the biggest driver here, mainly due to the hidden costs of keeping these 100 systems up and running. Governance and management of the data, which is locked in all these stores, processed by slow and ugly SQL nobody understands anymore, and of unknown quality, is key today.
willvarfar, almost 2 years ago

My experience is that small and medium enterprises can get a lot of mileage out of using normal RDBMS replication to create a copy of the OLTP DB that drives the business and running analytics on that copy, while putting up with complaints that inefficient analytics queries can be slow.

Really, after spending silly amounts of time and hassle (and money) on the fancy Snowflake or whatever, you discover it's not massively faster for those small and medium businesses. And now you have to keep paying for it and keep maintaining it, which is often actually a bigger burden than keeping a normal RDBMS replica alive.
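A minimal Python sketch of the replica-for-analytics pattern, using SQLite's backup API (`sqlite3.Connection.backup`) as a stand-in for real RDBMS replication; the `orders` table and its rows are hypothetical. Unlike streaming replication, the backup is a one-off, point-in-time copy, which the last lines make visible.

```python
import sqlite3

# "Primary" OLTP database that the application writes to (hypothetical schema).
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
primary.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("eu", 10.0), ("us", 25.0), ("eu", 5.0), ("apac", 7.5)],
)
primary.commit()

# Copy the primary into a separate "replica" connection. SQLite's backup API
# takes a point-in-time snapshot; a real setup would use the RDBMS's own
# replication (e.g. a Postgres read replica) so the copy stays current.
replica = sqlite3.connect(":memory:")
primary.backup(replica)

# Slow, ad-hoc analytics hit the replica, never the primary.
revenue_by_region = dict(
    replica.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)

# A new write on the primary is not visible on this snapshot until the next copy.
primary.execute("INSERT INTO orders (region, amount) VALUES ('us', 100.0)")
primary.commit()
```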
slotrans, almost 2 years ago

Basically correct. If you have less than tens of terabytes, the big exciting stuff just isn't needed and isn't particularly helpful.

Focus on understanding your data and building a useful model.

Use Postgres if you can. Supplement with DuckDB and/or ClickHouse as needed. Or, use your cloud's columnar DB (Redshift, BQ, Azure whatever) because you can start and stop using it as you please without talking to sales or signing a contract.

If your data team doesn't have the requisite skills... well... consider that tech selection might not be your problem.
jboggan, almost 2 years ago

Modifying the argument, I'd say even the companies that do need Snowflake or Databricks for an analytical use case tend to get run over by the hype train after adoption and start using it everywhere as an anti-pattern, causing much bigger operational problems.
adam_oxla, almost 2 years ago

While I do agree that for most companies those large SaaS solutions are overkill, I do not think that DuckDB or similar is sufficient. Nowadays more companies really need to process large datasets.

I regularly meet companies that used PostgreSQL or something similar up to some point, but then they grew and it is not sufficient anymore. They need something scalable. It does not have to be a large SaaS: in many cases a small ClickHouse cluster is sufficient. Nevertheless, not everything can be done using a single server. Also, even if a customer knows exactly what their needs are right now, those needs will grow and change over time, so it is reasonable to build something that is not only good enough for now. Of course, building something absolutely "future proof" leads to extremes and high bills.
benjaminwootton, almost 2 years ago

I would turn this on its head and say that most companies do need one or the other.

My first premise is that most companies will have a BI or Data and Analytics problem, whether it’s analysing their spend, revenue, operations, customer churn or something more interesting.

At that point, having an industry-standard, fully managed, fully elastic and resilient platform with consumption-based pricing sounds pretty appealing.

Yes, I can run and administer a warehouse on EC2, but the total cost of manpower and servers with full resilience is going to be high, especially as you’ll have to add in analytics tools, ETL tools etc., which Snowflake or Databricks might have included.

I’m a huge believer in both Snowflake and Databricks: Snowflake for BI and Databricks for anything more funky. The technology is on point and the business case stacks up for the most part.
datadrivenangel, almost 2 years ago
The dirty secret of modern data management is that very few people really have big enough data to justify major amounts of data infrastructure.
HorizonXP, almost 2 years ago

My client is in the middle of migrating their Snowflake data to Databricks Unity Catalog.

To this day, I still do not understand why they love Databricks so much. It just looks like a Jupyter notebook to me. I know, it has Spark. So?

I’m trying to learn more so I can build something better, maybe.
nottorp, almost 2 years ago

99.99% of companies are not Google-sized and don't need Google's solutions. Period.
benrutter, almost 2 years ago

As a data engineer who's worked in a bunch of different contexts/companies, I 100% agree that most of the time snowflake/databricks is an unnecessary money sink. The main problem is that most companies need the security of a managed service for cloud computing, and don't want to be locked out of scaling to a very large scale (with distributed compute). Unfortunately, I don't think there are a lot of options that meet those pretty simple requirements that _aren't_ databricks or snowflake[1].

Sure, you could put your compute on single docker containers, but when your data gets too big you're stuck and have to get someone expensive in to manage kubernetes, and all that time, you can't compute on your data. Which is sorta the crux of it: databricks and snowflake are expensive, but not nearly as expensive as finding out you need them and not having them.

[1] if you're using python on AWS/GCP, I think coiled (coiled.io) is a rare exception to that pattern
bjornsing, almost 2 years ago
Considering what you get I think Snowflake is rather decently priced. It’s not easy to integrate and operate a bunch of open source tools that replaces it, especially not on a small scale. Also, in most cases you’ll pay a lot more for the people that use Snowflake than you do for Snowflake, so it makes more sense to focus on productivity.
bob1029, almost 2 years ago

Why not give in and pay for the SQL Server Hyperscale instance? Isn't there enough BS to worry about? Why continue to waste time on tired old OLTP/OLAP/scalability/etc. conversations in 2023?

Unless your business can continuously write >100 megabytes of transactional data _per second_, this solution would almost certainly address all of your needs forever and ever. Up to 100TB too. It just works. It offers transactions exactly the way most businesses expect them to be conducted. No weird code, no weird client libraries, nothing. It works more or less like it has for the last 30+ years.

I can tell you for a fact that simply setting up a gigantic shared database called "company" and getting the team connected to it did wonders for us. When you stop worrying about "will it scale" you can start to collaborate and do amazing things again.
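The two figures in this comment interact, and a quick calculation shows how. This Python sketch assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), no compression and no deletes; the 100 MB/s and 100 TB figures are the commenter's, and the 1 MB/s average is a hypothetical lighter load chosen for comparison.

```python
MB = 10**6            # decimal megabyte (assumption)
TB = 10**12           # decimal terabyte (assumption)
SECONDS_PER_DAY = 86_400


def days_to_fill(capacity_bytes: float, write_rate_bytes_per_s: float) -> float:
    """Days until `capacity_bytes` is used up at a constant write rate."""
    return capacity_bytes / write_rate_bytes_per_s / SECONDS_PER_DAY


# Writing flat out at the quoted ceiling, the whole time.
at_peak = days_to_fill(100 * TB, 100 * MB)

# A hypothetical, much lighter average load.
at_1_mb_s = days_to_fill(100 * TB, 1 * MB)
```

At the full 100 MB/s the 100 TB ceiling would be reached in about 11.6 days, while at a 1 MB/s average it would take about 1,157 days (a little over three years), so how long "forever" lasts depends on the average write rate staying far below the peak.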
anonzzzies, almost 2 years ago
We make a product that smaller companies use; it’s much simpler to use and no consultants needed. We notice that even in bigger corps, departments import into our system and use that because they find the de facto systems painful to use and they don’t actually need that much power, ever.
dickersnoodle, almost 2 years ago

Word. Most companies don't have _nearly_ enough data to begin to justify them.
code51, almost 2 years ago
Which use cases cannot do without Snowflake or Databricks?
jojobas, almost 2 years ago

Most people shouldn't need a 3GHz 8-core CPU in their pocket to look at cats or check the weather, but here we are. Simplicity by bloat comes at a price.
tjhunter, almost 2 years ago

This article has valid points but does not understand the perspective of companies. Companies do not buy technologies. Companies buy solutions.

- Companies do not buy Spark, they buy the ability to process their data and to have multiple personas collaborate (data scientists, data engineers, ...)
- You can do it yourself. It will be cheaper, but it will require time, expertise and money, all things that companies do not give easily
- Snowflake and Databricks are elastic: you can start small and grow as you need. This is much easier than justifying the upfront cost of hiring specialized people, or asking for trust that your ad-hoc solution will respect whatever enterprise governance rules apply

(disclaimer: I worked at Databricks for 6 years and talked to hundreds of prospects and actual Databricks users and customers)
tootie, almost 2 years ago

> The global economy is headed for a recession. That’s not my opinion, that’s the Federal Reserve’s

I know that's not the topic of the post, but OP shows a lack of comprehension here. The Fed warned because that's what they do when the risk is non-negligible. They never say it is definitely going to happen, because nobody knows.
pradeepchhetri, almost 2 years ago

I would consider the following things before selecting a database offering:

- Whether the database vendor is a lock-in. It would be a straight "NO" for me if the database isn't open source with a proper license, since then I can't self-host it in case I need to move away from their SaaS offering for various unforeseen reasons: the vendor decided to increase pricing, the vendor has hidden pricing, the vendor has reliability issues, etc.
- How big the community behind the database is. Check their public forums to understand how the community feels about the database and how their requirements are considered.
- Not believing any random benchmarking post online, but doing my own benchmarking for my use case.
- Which other companies have adopted that database, and reading about their experiences.
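On the benchmarking point, a do-it-yourself benchmark can start as small as the Python harness below: run your own query several times and take the median. SQLite, the `events` table, the query and the index are placeholders (assumptions, not anything the commenter specified); swap in your real workload and each candidate engine's client library.

```python
import sqlite3
import time


def median_runtime(conn, sql, params=(), repeats=5):
    """Run `sql` `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch everything so the work really happens
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 1000, "click" if i % 3 else "view", i) for i in range(50_000)],
)

# A query shaped like your real workload (placeholder).
query = "SELECT kind, COUNT(*) FROM events WHERE user_id = ? GROUP BY kind"

before = median_runtime(conn, query, (42,))
conn.execute("CREATE INDEX idx_events_user ON events(user_id)")
after = median_runtime(conn, query, (42,))

result = dict(conn.execute(query, (42,)).fetchall())
```

The median is used rather than the mean so that one unusually slow run (a cold cache, a background pause) does not dominate the result.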
totetsu, almost 2 years ago

For the non-data-experts, from elsewhere on Medium: “Semi-structured data tends to evolve over time. Systems that generate data add new columns to accommodate additional information, which requires downstream tables to evolve accordingly. The structure of tables in Snowflake can evolve automatically to support the structure of new data received from the data sources”

“When dealing with large datasets, the processing power of individual machines can become limiting, necessitating the use of distributed and parallel processing capabilities provided by platforms such as Databricks“
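The first quote describes schema evolution. As a rough illustration of the idea only (not Snowflake's mechanism), the Python sketch below adds a column to a SQLite table whenever an incoming record carries a field the table has not seen yet. The `events` table and the records are hypothetical, and field names are assumed to come from a trusted source, because they are interpolated into the SQL text.

```python
import sqlite3


def insert_with_schema_evolution(conn, table, record):
    """Insert a dict into `table`, first adding any columns it does not have yet.

    Assumes `table` and the keys of `record` are trusted identifiers.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for column in record:
        if column not in existing:
            # A new field appeared upstream: evolve the table instead of failing.
            conn.execute(f'ALTER TABLE {table} ADD COLUMN "{column}"')
    cols = ", ".join(f'"{c}"' for c in record)
    placeholders = ", ".join("?" for _ in record)
    conn.execute(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
        list(record.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")

# The source system adds `kind`, and later `country`, over time.
insert_with_schema_evolution(conn, "events", {"id": 1, "kind": "click"})
insert_with_schema_evolution(conn, "events", {"id": 2, "kind": "view", "country": "DE"})

columns = [row[1] for row in conn.execute("PRAGMA table_info(events)")]
rows = conn.execute("SELECT id, kind, country FROM events ORDER BY id").fetchall()
```

Older rows simply get NULL for columns that did not exist when they were written.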
ouraf, almost 2 years ago

That's common practice with mid-sized companies and above, though:

- Find something that can probably give some competitive advantage or please shareholders.
- Identify it as "not part of our core business".
- Pay a third-party company to do it, even if it would be cheaper to do things in house. If something goes wrong and the shareholders ask questions, the CEO can blame the vendor instead of their own IT department and reassure the shareholders that they're "focusing on what generates most value for the company".
mxxc, almost 2 years ago

i am not sure why everyone (comments, this post, etc.) assumes there is a one-size-fits-all solution to every problem, even this one that looks quite simple. companies have to align a few things, ranging from the skills of the current employees, hiring plans, investor/shareholder management, or how to make sure the CEO really gets that boat he really wants and deserves.

not all businesses are the same. businesses with fat contracts but few users won't have massive operating costs, and they can use whichever easy and non-scalable technology they want, because once the business scales up, those fat contracts will pay for enough data engineers. a gaming startup will face high server costs right away, without any optimisation, while data platforms (e.g. bigquery) with a tiny bit of optimisation (materialising 2-3 summary tables, for example) will bring the cost down to "laughable" pretty easily.

it is true that many of these things are choices, e.g. do you really want to spend a shit ton of money on looker when superset is just as good for most users? are you even able to make that choice? if these choices are hard to make because a potential user (or set of users) in the company really wants one thing instead of another, well, that is not a technical choice, and the issue you have has nothing to do with the technology.
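Materialising a summary table, as mentioned above, means aggregating raw rows once and pointing dashboards at the much smaller result; on engines that bill by bytes scanned (such as BigQuery's on-demand pricing), that reduction in scanned data is where the savings come from. The Python sketch uses SQLite as a stand-in, with a hypothetical `plays` table of game sessions and a hypothetical `daily_summary` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (player_id INTEGER, day TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [(p, f"2023-08-{d:02d}", 60 * (p + d)) for p in range(100) for d in range(1, 8)],
)

# Materialise once (e.g. on a nightly schedule): one row per day
# instead of one row per play session.
conn.execute(
    """
    CREATE TABLE daily_summary AS
    SELECT day,
           COUNT(DISTINCT player_id) AS players,
           SUM(seconds) AS total_seconds
    FROM plays
    GROUP BY day
    """
)

# Dashboards read the small summary table rather than rescanning raw sessions.
raw_rows = conn.execute("SELECT COUNT(*) FROM plays").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM daily_summary").fetchone()[0]
first_day = conn.execute(
    "SELECT players, total_seconds FROM daily_summary WHERE day = '2023-08-01'"
).fetchone()
```

Here 700 raw rows collapse to 7 summary rows; on a real platform the ratio, and therefore the saving per dashboard query, depends entirely on the workload.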
thiago_fm, almost 2 years ago

The thing is, you don't understand that a business needs to stay focused, and that has a price. When you buy those solutions, you are paying the price of staying focused.

If the price for staying focused is $X, it is worth paying as long as X has a positive business impact and meets a need, as Snowflake or Databricks could.

And it doesn't mean you won't need to spend on HR, developers, management and so on to run your own OSS solution.

It is actually great to give Snowflake and/or Databricks your money. It's a really expensive service, with a huge markup. That I understand.

But the alternative doesn't look good at all. A company is better off staying focused.

Also, if the company sees that it spends a considerable amount of its budget on Snowflake/Databricks, they'll find solutions, such as negotiating with them, figuring out how to optimize their use, etc.

I've worked at and seen many companies optimizing their cost structure, running away from Datadog/New Relic or Snowflake/Databricks, and just failing miserably.

The alternatives made developers simply use them less, and also spend more time doing the same things they did before.

They sometimes even call it a success, and execs point out the money saved, but they don't see the subjective part: developers being less efficient, wasting their time with BS.

There's a long tail of costs related to inefficient systems: wasted developer hours, plus the developers you need to manage, hire, train...

Not to forget the burden of firing people or making them redundant; companies end up losing so much of their culture and soul doing such things. And this is what you typically need to do if you take on the expense of running things yourself.

The ideal company is one that pays for many SaaS platforms and stays focused on its core business; if it can make money, great!

There is so much literature on cost optimization in business in general, stuff that has been studied for over a century. If people would read up on that, they wouldn't be repeating such nonsense as this article.

Also, the article says that you are OVERPAYING for it if you aren't a Fortune 100. It's much more like the opposite: if you have a small company with a small amount of data, it's very likely that Snowflake or Databricks won't cost you that much.

Based also on the other comments on HN, I can only conclude that this is one of those articles that is so full of mistakes and so plainly wrong that there isn't even a real debate over here, just people saying how wrong it is.
savrajsingh, almost 2 years ago

This reminds me of something I've seen too: companies paying for enterprise WordPress (>$25k a year) when they could easily have, say, a $100 WordPress plan with all the features they need, behind free or cheap Cloudflare.
wnolens, almost 2 years ago

No one _needs_ to evaluate all options against their current/known/possible needs when they can just pick a well-vetted product and get on with business.
rmbyrro, almost 2 years ago
Most companies don&#x27;t use Snowflake or Databricks
kfk, almost 2 years ago

Agreed, though not completely on DuckDB; still, we used it for consolidating billing data from 10+ ERP systems and it works, so I see his point. Just to add to his points:

- Integrations are still one of the hardest things in Enterprise IT. Snowflake/Databricks/etc. in fact add to the number of systems to integrate; they make this problem worse most of the time.
- Governance in a self-service data ecosystem gets complicated fast, especially if you need to stay compliant with data privacy, GDPR, etc. And amazingly, again, neither Snowflake nor Databricks solves this. In fact, they make it worse by sucking budget away from governance initiatives.
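To make the consolidation point concrete, here is a small Python sketch of the mapping work that merging billing data from several ERP systems typically involves. Everything in it is a hypothetical assumption: the two ERP exports, their field names, the shared schema and the single common currency; it uses plain dictionaries rather than DuckDB.

```python
# Hypothetical exports from two ERP systems that name the same fields differently.
erp_a = [
    {"InvoiceNo": "A-1", "Cust": " Acme ", "Total": "120.50"},
    {"InvoiceNo": "A-2", "Cust": "Globex", "Total": "40.00"},
]
erp_b = [
    {"invoice_id": "B-7", "customer_name": "ACME", "amount": 80.0},
]

# Per-source mapping from source field names to one shared schema (hypothetical).
FIELD_MAPS = {
    "erp_a": {"InvoiceNo": "invoice_id", "Cust": "customer", "Total": "amount"},
    "erp_b": {"invoice_id": "invoice_id", "customer_name": "customer", "amount": "amount"},
}


def normalize(source, record):
    """Rename fields to the shared schema and clean up types and casing."""
    row = {target: record[field] for field, target in FIELD_MAPS[source].items()}
    row["customer"] = row["customer"].strip().lower()
    row["amount"] = float(row["amount"])  # assumes one common currency
    row["source"] = source  # keep lineage for auditing
    return row


consolidated = [normalize("erp_a", r) for r in erp_a] + [
    normalize("erp_b", r) for r in erp_b
]

# Billing totals per customer across both systems.
totals = {}
for row in consolidated:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
```

Every additional source system adds another entry to the mapping table plus its own cleaning rules, which is one reason integration effort grows with the number of systems involved.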
iou, almost 2 years ago

https://archive.is/xNT28
dna_polymerase, almost 2 years ago
73.6% of all statistics are made up
joduplessis, almost 2 years ago
heresy!