Why is Snowflake so expensive

364 points by eyeball, over 2 years ago

48 comments

cs702, over 2 years ago

Great article. On the surface, it's about Snowflake. At a deeper level, the article is about the perverse incentives motivating SaaS businesses to do seemingly dumb, inefficient things and avoid seemingly obvious optimizations by default.

Many SaaS businesses are perfectly happy to let customers shoot themselves in the foot if it generates more revenue. The BigQuery example (presently, by default, `select * from table limit 10` obediently scans the entire table at your expense!) is spot-on.

As the article so well puts it, every SaaS company has a vested financial interest "to leave optimization gremlins in."

twawaaay, over 2 years ago

Snowflake is not expensive. Snowflake is super cheap, IF you know what it is for and how to use it, compared to having to solve the problem on your own.

The best way to describe Snowflake is that it is a brute force method to run complex queries without creating indexes.

If you have a more traditional database, you will notice you need to set up indexes to be able to get anything from it in finite time. What if you don't know the indexes upfront? What if you want your users to be able to ask arbitrary queries and get answers before bedtime?

That's what Snowflake is for. It automates using an ENORMOUS amount of hardware to get your query executed fast, very inefficiently.

It is not free though. That inefficiency means a lot of resources get used for queries. It is meant for those few queries when your users try to get some insight into your data and you can't predict indexes beforehand. Sometimes this is exactly what you want, like when you let your data people in to figure stuff out. Or when you have very rare functionality that allows the user to build their own queries -- which you should avoid like hell (and there are tricks to make it index pretty well) but can't always avoid.

For everything else, whenever you can predict your indexes, you always want to use a more traditional database that can be very efficient on queries properly supported by indexes.

The issue is that a lot of people try to use Snowflake as a database or to support frequently executing queries of the same kind. This is bad and it will cost you.
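
To illustrate the index-versus-scan trade-off described above, here is a minimal sketch using SQLite from Python's standard library as a stand-in for a "more traditional database". The table `events`, its columns, and the index name are hypothetical, and nothing here reflects how Snowflake itself executes queries; it only shows how the same query goes from a full scan to an index lookup once a suitable index exists.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable step.
    return " | ".join(row[-1] for row in rows)

# Hypothetical table with 10,000 rows spread over 1,000 users.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO events (user_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

sql = "SELECT * FROM events WHERE user_id = 42"

# Without an index the engine has to read every row.
plan_without_index = query_plan(conn, sql)

# With an index on the filtered column it can jump to the matching rows.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_with_index = query_plan(conn, sql)

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of the whole table and the second a search using the index. If the filter column cannot be predicted in advance, no such index can be prepared, which is the situation the comment says brute-force scanning is meant for.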

stassajin, over 2 years ago

I'm the author of the article. Didn't expect it to blow up. Let me clarify a few points:

1. I like Snowflake and I think they brought several innovations to the field: instant scale out/up, time-travel, unstructured data query support.
2. Snowflake obviously makes innovations and performance improvements, otherwise they would not be the market leader they are. But I also suspect that they make just enough performance improvements to be on par and then use the vendor lock-in features to make switching hard.

My argument is that their rate of performance innovation has gone down considerably, and DataBricks, Firebolt, and open source alternatives just seem more attractive from a cost/performance ratio. I agree that Snowflake is still the best data warehouse to start with if you have 100k, but not if you truly plan for a multi-year horizon and your usage expands.

- Redshift also brought a lot of innovation that allowed people to execute analytical queries 100x-1000x faster than any OLTP that existed out there. I've used Redshift for four years and they kept ignoring performance and features until Snowflake came out. All of a sudden, because of competitor pressure, they put more effort into the product to maintain and gain market share. My hope is that Snowflake finds a solution to their innovator's dilemma, since competitors are hot on their tails.
- Some people point out that 70% usage growth just shows that Snowflake is useful. Nobody disagrees with that. The issue is that the majority of companies don't experience 70% revenue growth to keep up with the growth in costs. At some point, you have to clamp down on costs, which means that you have to look for alternatives to run things more efficiently.

beoberha, over 2 years ago

I disagree with the assertion that Snowflake has no incentive to improve performance. While I don’t work for Snowflake, I work for a competitor and we’re constantly looking to improve performance to make customers happy.

For the exact reason that the article claims Snowflake wouldn’t innovate, I’d assert that they would. If they are expensive and slow, and a competitor is faster and cheaper, eventually they will see business move to the competitor. We see it all the time.

carlineng, over 2 years ago

[Disclaimer: former Snowflake employee]

Snowflake is not expensive because of perverse incentives, which is the primary claim of the article. It is expensive because it is a highly differentiated and very sticky product.

As others have mentioned, competition is the ultimate incentive to work on performance. Every dollar of Snowflake revenue is a dollar of revenue that Amazon, Google, Microsoft and Databricks are fighting for.

shrimalpreeti, over 2 years ago

[Disclaimer: I work for a company that offers a Snowflake Cost Optimizer product]

We’re an open-source monitoring & alerting tool and many of our users were using it to set alerts on their warehousing (Snowflake) costs. The problem is particularly bad with Snowflake because it lacks query-level attribution of costs and has no built-in features for monitoring or for recommending improvements. We’re building a Snowflake Cost Optimizer (https://www.chaosgenius.io/snowflake-cost-optimizer.html) and are hearing the same feedback from our customers as the author mentions. Snowflake is definitely coming up with features towards better cost transparency, but I wonder if it’s too little too late.

mritchie712, over 2 years ago

I predict[0] we'll see more people choosing Clickhouse over Snowflake in the next 5 years. Clickhouse will get reasonably feature compatible with Snowflake and give people a better escape hatch if they want to self-host their data stack. Clickhouse, Inc is building a cloud product that abstracts away the complexity, and there are already companies like Altinity that will spin up a cluster for you in minutes.

[0] https://blog.luabase.com/clickhouse-for-data-nerds/

brianwawok, over 2 years ago

Ran into the same exact thing at CircleCI.

Me: My builds are really slow

CircleCI: Here are a few very low effort answers

Me: git checkout is taking literally 60 seconds, but it takes 3 seconds locally, why?

CircleCI: Mumble Mumble.

They charge per minute, so why would they care if builds are slow? There was about a year of this getting worse and worse, till I finally cancelled the service last week and built my own server in my basement.

I now get 200% faster builds, and the hardware payback time is not very long (6 months of my CircleCI bill?).

I think it's a huge red flag anytime the metric you care about is something that being "worse" makes the provider more money.

alberth, over 2 years ago

This is all much simpler than the post makes it sound.

It's usage-based pricing and customers are using more of it.

> a customer that joins a year ago and spends $1 is paying out well over $1.7 a year later

The entire article is based on this 1.7x "net dollar expansion" statement.

After integrating Snowflake, customers have found value in using Snowflake and are using more of it 1 year later.

Since Snowflake is billed on usage, that explains the net-dollar expansion.
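
As a rough illustration of what a 1.7x net dollar expansion implies, the sketch below compounds it over several years. The 1.7 figure is the one quoted above; the assumption that the same rate holds every year is mine and is not something the article or the comment claims.

```python
def projected_spend(initial_spend, net_expansion, years):
    """Spend after `years` if usage keeps expanding at the same yearly rate.

    Assumes a constant expansion rate, which is a simplification.
    """
    return initial_spend * net_expansion ** years

# Start from $1 of spend and apply the quoted 1.7x expansion each year.
for year in range(4):
    print(year, round(projected_spend(1.0, 1.7, year), 2))
```

Under that assumption, $1 becomes $1.70 after one year and $2.89 after two, which is why both sides of the thread care about whether usage growth reflects added value or avoidable inefficiency.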

benjaminwootton, over 2 years ago

The monthly bill does make me wince, but Snowflake of course includes all server and compute costs, no installation, initial configuration or upgrades etc. It’s genuine SaaS.

It’s also very simple to manage and optimise, so it needs less DBA or DevOps type manpower.

Then of course you can perfectly right-size your instances and pay by the second for compute and by the byte for storage.

Expensive, but lower TCO than alternate approaches, I suspect.

benreesman, over 2 years ago

Alright I’ll bite finally. What do these companies do? Neither Snowflake’s front-facing website, nor the Wikipedia article, nor this post tell me why people pay all this money.

I know a bit about the effort involved in chucking around 100 petabyte datasets, and there are numerous niches a SaaS could fill in there, but it’s very murky from the outside.

glenjamin, over 2 years ago

I don't know if this is the case at Snowflake, but there are similar seemingly misaligned incentives with CircleCI's build-seconds-based pricing model.

However, the generally accepted wisdom there was that improving performance had always led to more builds being run, and so it still came out as a net positive. This had happened a bunch of times as we upgraded CPUs or storage drivers or the version: there'd be a short-term drop in direct revenue, but then it would bounce back quickly as people took advantage of being able to do more stuff in the same amount of time.

I'm told the revenue and finance people were pretty concerned the first time it happened though!

datadisruptor, over 2 years ago

[disclaimer: comment written by one of the cofounders of iomete - a YC-backed startup - active in the same market as Snowflake]

I think Snowflake is (still) expensive because it is a venture-backed enterprise software company and goes through a typical trajectory...

Story goes like this: founders are product-driven and first movers -> find PMF -> need VC funding -> VCs only fund enterprise software ventures with 70%+ gross margins and high retention rates -> product/service gets priced to achieve these metrics -> VCs happy to fund sales & marketing machine needed to obtain sales growth, nobody cares about profitability until after IPO -> startup is everyone’s darling until ~2 years after IPO.

Then: economic crisis hits, customers become more price sensitive, competition intensifies. Plus now management is exposed to quarterly pressure of financial markets to deliver on top-line and margin expectations.

Meanwhile a bunch of startups are building (lower priced) alternatives. Perhaps not as mature or feature-rich as Snowflake, but good enough for 80% of use cases that Snowflake covers.

Therefore the assertion that Snowflake is not optimizing their product sounds a bit crazy to me. It would be optimizing for short-term gain, while jeopardizing its reputation as the leader in the space. Obtaining excessive margins through excessive pricing only works under monopolistic conditions or with a truly distinctive product. Neither is the case imo. Also, it's early days. Not exactly sure what Snowflake's market share is, but I bet it is < 5%, so they haven't locked in everyone yet...

I bet that Snowflake will be forced to compete "also on price" in the next five years because free enterprise is a powerful thing. The title of the article could be “Why Snowflake is (still) expensive but will get more affordable over the next few years”.

kjw, over 2 years ago

"Snowflake has no incentive to push a code change that makes things 20% faster because that can correspond to 10–20% drop in short-term revenue. In a typical Innovator’s Dilemma, Snowflake prioritizes other things that generate an ever larger menu of compute options, like Snowpark and data apps built on Streamlit, that will bleed your organization dry."

This is not true. Snowflake has done just that - it has continuously improved performance, resulting in reduced credit consumption and revenue from customers on a unit compute/storage basis. And it has negatively impacted their revenues and stock price. Snowflake's incentive is to strengthen their competitive position and to hopefully generate more long-term revenue from their customers.

The CFO forecasted a $97 million shortfall resulting from product improvements when guiding for 2022 revenue. Snowflake stock dropped immediately after.

See Q4 transcript -- https://www.fool.com/earnings/call-transcripts/2022/03/02/snowflake-inc-snow-q4-2022-earnings-call-transcrip/

"Similarly, phased throughout this year, we are rolling out platform improvements within our cloud deployments. No two customers are the same, but our initial testing has shown performance improvements ranging on average from 10% to 20%. We have assumed an approximately $97 million revenue impact in our full-year forecast, but there is still uncertainty around the full impact these improvements can have. While these efforts negatively impact our revenue in the near term, over time, they lead customers to deploy more workloads to Snowflake due to the improved economics."

Also see the Bloomberg article -- https://www.bloomberg.com/news/articles/2022-03-02/snowflake-plunges-on-slowing-sales-growth-acquisition#:~:text=Fiscal%20fourth%2Dquarter%20revenue%20doubled,the%20period%20a%20year%20earlier

"Snowflake Inc., a software company that helps businesses organize data in the cloud, dropped the most ever in a single day Thursday after projecting that annual product sales growth would slow from its previous triple-digit-percentage pace.

Executives said improvements to the company’s data storage and analysis products will let customers get the same results by spending less, which will hurt revenue in the short term, but attract more clients in the future.

“The full-year impact of that next year is quite significant,” Chief Executive Officer Frank Slootman said on a conference call Wednesday after the results were released. But “when customers see their performance per credit get cheaper, they realize they can do other things cheaper in Snowflake and they move more data into us to run more queries.”"

pykello, over 2 years ago

(I am not affiliated with Keebo, although I had a recruiting meeting with them earlier this year)

FWIW, Keebo (https://keebo.ai/) tries to solve this problem & reduce your Snowflake bill by using Data Learning techniques. It can be configured to return exact results or approximate results.

georgewfraser, over 2 years ago

The core claim of this article, that Snowflake doesn't implement optimizations that would reduce usage, is not true. Search optimized tables, partitioned tables, and per-second billing are all counterexamples.

jjfoooo4, over 2 years ago

This is a kind of poor engineering writing in which the author finds a product not tailored to his precise tastes and concludes it is because the company is user hostile and/or doomed.

The bit about Snowflake not being incentivized to care about costs is trivially untrue. The rest of the article perceives trade-offs as simple feature gaps.

For example, Snowflake gives the user more latitude to distribute workloads among “warehouses” than other offerings. With poor distribution the author will experience the workload provisioning issues he describes.

ramesh31, over 2 years ago

I'm of the mind that Snowflake and Databricks are losing their value prop now that Delta Lake is open source and Iceberg is maturing. What's to stop me from rolling my own Spark clusters and just using one of those? Is anyone doing this?

mejakethomas, over 2 years ago

It's not expensive.

What it can do, successfully, with three engineers was previously impossible with dozens.

What IS expensive is not being careful with it.

tommyphongs, over 2 years ago

I don't have experience with Snowflake, but I do with Cloudera's tech stack on on-premise infrastructure. Both Cloudera and Snowflake use the same approach: separating compute and storage, with the main purpose of trading performance for scalability and for easy maintenance without knowledge of the user's data. That makes it easy to sell the solution to a wide range of customers without caring about customer cost (maybe this is also one of their purposes).

In my experience with Cloudera's tech stack, it becomes a very complex brute-force system. We need to install HDFS to store data (the storage layer), the Hive metadata store (which basically uses MySQL to keep the mapping between a table and that table's HDFS files), and Impala as the query engine (the computing layer). Because the computing layer doesn't know much about how the data is organized, we are very limited when we want to optimise our system. A query like 'select * from TABLE limit 1' leads to a scan over all the data across many HDFS files, and because Impala is an in-memory computing engine, scanning all the table data exceeds memory. Because of that, DAs can't use sampled data to quickly explore our data. Everything leads to hell, and because many things can affect our system (HDFS, Impala, the Hive metadata store, etc.), it is very hard to fix a problem when it occurs.

cedricd, over 2 years ago

I'm glad the author also points out how customer (mis)use can blow up data warehouse costs too. No matter how efficient Snowflake could get, using the warehouse too much or with unnecessary queries will ultimately have a larger impact.

The trend in the data space currently is for usage to increase -- as more companies adopt dbt they're running more and more prebuilt (materialized views) queries on a scheduled basis, rather than on demand. This is overall a good thing in that data is becoming easier to manage and use, but it does come at an increase in warehousing costs.

I think eventually the pendulum will swing back to tools that help optimize warehouse usage, as long as they allow for the same increase in productivity as dbt (disclosure - I work for one such company).

awinder, over 2 years ago

I think the main metric that this is built on may be too coarse to support the meaning that the article derives from it. There’s conjecture that what’s driving this is more querying over the same dataset (more streamlit dashboards), but it could just as easily be expanding usage inside of companies. That’s what’s going on at my company right now: more teams using snowflake, more data being pushed in to replace existing workflows, etc.

I’m also not sure I understand the dig at streamlit dashboards. If you’re running hardware and introduce new read workflows, eventually you’ll need more read replicas and you’ll pay more for it. Maybe you can argue that snowflake is doing this at a higher cost, but the metric data is not available in the sources to make that claim.

falcolas, over 2 years ago

Snowflake is a bit generic to easily find - and the article has no hyperlinks - anybody have a one sentence summary?

EDIT: There it is: https://www.snowflake.com/

Data warehousing, basically.

flyinglizard, over 2 years ago

Where does all this data go? It's processed and then what? Sent to decision makers? Used to run automated processes?

I'm genuinely curious and would appreciate anyone who could show a real life example of this kind of pipeline where data is accumulated, then processed, then turned into revenue at the other end.

I've implemented systems that do this, but my experience is that accumulating data is (too) easy and processing it in a meaningful way is slightly more challenging. Ultimately, driving positive business processes according to this data, which requires a lot of friction with employees (training, procedures, maintenance, support), is the most difficult part.

buremba, over 2 years ago

I believe they need to focus on performance, at least nowadays, because both Databricks & BigQuery are also great products and they push Snowflake in terms of feature parity and performance.

That being said, Snowflake is also pushing for the marketplace model, where you publish your app natively to move your code to where the customer's environment is. If they become successful, performance might no longer be one of the incentives for companies to go with Snowflake, and the switching cost might be higher as companies embed more of their business logic in the system.

epberry, over 2 years ago

> Not providing observability to monitor and reduce costs

Vantage just launched this - https://www.vantage.sh/blog/vantage-launches-snowflake-support. The problems the author describes are almost exactly what we heard from customers:

- list of users/queries that are the most expensive
- alerts and notifications for costs
- query timeout. Not something a third party can do, but there is an interesting 'query tagging' feature for snowflake which Vantage supports.

toto444, over 2 years ago

The competition is tough in the data warehousing industry. If Snowflake is expensive, people will know. Current customers may not leave, but it's going to be harder for Snowflake to get new customers.

YouWhy, over 2 years ago

I often analyze tools as a reduction from the space of problems × resources to the space of outcomes.

Let's consider Snowflake in this paradigm:

- Problems: analytics on data that is not laid out in a way that's directly accessible for analysts.
- Resources: SQL analysts, few or no competent data engineers, spare cash
- Outcomes: run analytics at an industrial scale without requiring competent engineers or DevOps.

Since Snowflake's optimal client gets locked in very easily, it follows that saving said client's money is not something even the client would care about.

teej, over 2 years ago

From what I can tell, the author is incorrect about the example given in "Optimizer gremlins". I tested an example on my own data and micro-partition pruning was active.

The issue with dbt models in Snowflake is that if you ever perform a full-refresh and don't sort it, you ruin any natural clustering that arises from an incremental model. I've run into this issue many times. Auto-clustering gets too expensive at scale and Snowflake doesn't give you much guidance on alternatives.
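
A toy model can show why losing the natural sort order hurts pruning. The sketch below is not Snowflake's implementation; it only assumes, for illustration, that each partition stores a min and max for a column and that a partition can be skipped when its range does not overlap the filter. Function names and sizes are hypothetical.

```python
import random

def build_partitions(values, size):
    """Split values into fixed-size chunks and keep each chunk's (min, max)."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return [(min(chunk), max(chunk)) for chunk in chunks]

def partitions_scanned(partitions, low, high):
    """Count partitions whose [min, max] range overlaps the filter [low, high]."""
    return sum(1 for lo, hi in partitions if lo <= high and hi >= low)

values = list(range(1000))

# Data loaded in sorted order: each partition covers a narrow value range.
clustered = build_partitions(values, 100)

# Same data after an unsorted rewrite: every partition spans a wide range.
shuffled_values = values[:]
random.Random(0).shuffle(shuffled_values)
unclustered = build_partitions(shuffled_values, 100)

scanned_clustered = partitions_scanned(clustered, 250, 260)
scanned_unclustered = partitions_scanned(unclustered, 250, 260)
print(scanned_clustered, "of", len(clustered), "partitions when sorted")
print(scanned_unclustered, "of", len(unclustered), "partitions when shuffled")
```

With sorted data the range filter touches a single partition, while the shuffled copy forces far more partitions to be read for the same rows, which is the effect of an unsorted full-refresh described above.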

darksaints, over 2 years ago

> We have 5–6 very good open-source data warehouse alternatives. We have Redshift, DataBricks, Firebolt, BigQuery, and likely a few other enterprise offerings, yet it is surprising how little training most companies have in negotiating and re-negotiating vendor contracts or in pushing for heavily discounted pricing.

Small nit: Redshift isn't open source. I would also add Clickhouse, Citus, and TimescaleDB as majorly capable open source technologies with commercial offerings in this space.

jmacd, over 2 years ago

Retrospectively, this is very similar to how most SaaS behaved when per user per month billing was first introduced. There were almost never any actual limits on the number of users you could add to the software, but you purchased a license for a certain number. Occasionally your account would be audited and you would be billed for the overage. It was always a significant penalty. The same was true for CPU based licenses for things like IIS, SQL Server, Oracle, etc.

KingOfCoders, over 2 years ago

I have no Snowflake experience, but some limited BigQuery experience. And it's very easy for a small company to get to $100k/year bills without massive data.

0xbadcafebee, over 2 years ago

> Snowflake has no incentive to push a code change that makes things 20% faster because that can correspond to 10–20% drop in short-term revenue.

If they improve performance they can lower the cost to customers, which will make the product more attractive to prospective customers. But if they are already swimming in cash they may not feel the need to gain more customers.

Only threats prompt companies to improve things. Threat of a competitor, threat of losing all their money, threat of bad PR, threat of regulation, threat to the stock price, etc.

I see this every day in companies that don't care about managing their cloud costs. They waste money like crazy because they literally don't care if they lose money, because some exec doesn't care, or they got enough funding until the next round, etc. A couple years later another exec asks why the CISO/CTO is spending so much money without any ROI, and then everybody has to stop everything they're doing to shave pennies off cloud costs.

Companies run by individual executives are insane. I don't understand why people allow companies to be run this way. I think a co-op where employees could be active participants in the running of the company would allow for more sane decision-making.

rnk, over 2 years ago

What most commentators are missing here is that Snowflake had a significant revenue reduction when they improved the efficiency of their product, i.e. customers could do more at less cost and with less CPU use. This is similar to AWS lowering prices for many things steadily over time. Snowflake did this knowing that they would get less revenue, they would have less growth, and I suspect they also knew it would cause their stock price to go down. Here's an article on it from March: https://www.yahoo.com/now/snowflake-plunges-revenue-growth-outlook-223229218.html

Certainly snowflake wants to make it easier for people to spend money and solve all their problems on that platform, every company wants that. But it's a very competitive world out there, and snowflake leaders aren't complete idiots - they have to keep lowering their prices when they can, otherwise new people will come along and do things cheaper.

wsostt, over 2 years ago

Snowflake is so expensive that Capital One has developed a toolkit for managing your instance.

https://www.capitalone.com/software/solutions/

hobs, over 2 years ago

I am like 95% sure that the MAX issue he mentions is wrong - I just modified some windowing function based approaches to the one he mentions and it's several OOM faster because of partition elimination.

Nonetheless I agree with the basic points of the article.

rsweeney21, over 2 years ago

This is a great example of misaligned incentives.

Another example of misaligned incentives is LinkedIn. LinkedIn charges $3/message. The more messages sent on their platform, the more money they make. They are not incentivized to help sales or recruiters target the right people. It can be a cash cow in the short term, but it creates a negative experience for your users.

The fact that it has worked for so long is a testament to how strong network effects are.

In the case of Snowflake, high switching costs will protect them for a while.

imwillofficial, over 2 years ago

It’s easy to point out ways in which leaving foot guns in looks predatory. But that’s not always the case.

I work for AWS in billing, and the way we calculate bills is to try to get the customer the maximum discount.

Things like calculating savings plan coverage from smallest to largest to maximize utilization, or turning Reserved Instance sharing on by default within an org.

I would say that the seemingly gouging behavior is more often than not the result of technical or time constraints.

manassolanki, over 2 years ago

Snowflake is expensive if not monitored properly, and on top of that they provide minimal observability. There are some good features like auto suspend and auto resume for cost savings, but there is still scope for optimisation. For example, they will charge you for a minimum of 1 minute even if your query runs for only 2 seconds.
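
The effect of that minimum can be put into numbers with a small sketch. It assumes the simplest case, a warehouse that resumes for one query and suspends again, and takes the 60-second minimum from the comment above; the function names are hypothetical and real billing has more rules than this.

```python
def billed_seconds(runtime_seconds, minimum_seconds=60):
    """Seconds charged for one warehouse run under a fixed billing minimum."""
    return max(runtime_seconds, minimum_seconds)

def overhead_factor(runtime_seconds, minimum_seconds=60):
    """How many times more is billed than was actually used."""
    return billed_seconds(runtime_seconds, minimum_seconds) / runtime_seconds

print(billed_seconds(2))     # a 2-second query is billed as 60 seconds
print(overhead_factor(2))    # 30 times the time actually used
print(overhead_factor(300))  # a 5-minute job pays no overhead
```

Under these assumptions, short isolated queries carry the largest relative overhead, so batching them onto an already running warehouse is one way to reduce the effect.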

jwie, over 2 years ago

You would think they would be saving (and charging the customer!) a bundle by not enforcing constraints on their tables.

I’d be very interested to hear the Snowflake side of this decision, but to the customer it’s simply unforgivable to have cosmetic constraints on a database.

spullara, over 2 years ago

Snowflake increases performance all the time and their customers just use more of it.

wiradikusuma, over 2 years ago

So, what is Snowflake? (I assume it's snowflake.com) From Googling it looks like Google's BigQuery. So it's a DB?

throw8383833jj, over 2 years ago

It all comes down to the cost of switching and the willingness of users to switch. The higher the cost of switching, the higher you can make your product's price. Otherwise, with an extremely low cost of switching, the price will ultimately be driven to near zero as more and more competitors enter the landscape.

dstola, over 2 years ago

"optimization gremlin" = dark-pattern to take as much money away from you as possible

tablespoon, over 2 years ago

> RevOps management

And now "XxxOps" is a meaningless buzzword.

danielodievich, over 2 years ago

Interesting article. Some of it accurate. Some not.

>"Snowflake has no incentive to push a code change that makes things 20% faster because that can correspond to 10–20% drop in short-term revenue"

Completely untrue. There is constant optimization of the scheduler, execution process, global services, and compute fabric. The famous "we shipped AWS Graviton and it's like 10% cheaper" was something we did to ourselves. There is work underway to make FoundationDB faster/more efficient too that's totally out of this world. In short, nobody wants to burn extra CPU cycles and bill you for it.

>"Disclose Hardware Specs"

This isn't hard to find if you work with Snowflake's SE and Services, but it's not going to give you anything. The whole POINT of Snowflake is to hide all this nonsense and make it "just work". You want CPU and SSD metrics, feel free to use Databricks (many do) or whatever.

Now, there IS something to be said about some sort of observability into query execution as it is going. There are constant discussions on that, and some of the new upcoming features (like programmatic access to the query profiler) can open that up. But yeah, Snowflake is NOT something that will open up what's under the hood, and that is super intentional.

>"Not adopting benchmarks"

This goes around and everyone freaks out. Just profile your own work. Whatever. Nobody cares about benchmarks.

>"Optimizer gremlins"

Snowflake COULD do more to expose some of the internals. My job (and the job of 100s of my services and technical SE colleagues) is to help customers understand what's happening under the hood. Some of the company's "make it simple" ethos COULD be a bit more open. However, many of the common things (MP pruning) can be solved by simple user education. I've lost count of how many customers I worked with who had 0 education in Snowflake, and even a 20-30 minute intro made them open their eyes and go "woah, I get it now". On the other hand, dozens of people told me that it was amazingly easy to use without training, and it IS!

>"Improve the workload manager to increase throughput"

The workload manager is considerably more complex and sophisticated than this guy tells us it is. I saw an internal presentation on its internals that I asked to be converted to a confluence article, which thankfully happened pretty quickly, and lots of people benefitted. There is cost-based scheduling that takes the expected resources of queries into account to schedule them and also considers actual resources consumed, all very frequently and for every XP. I wish that article was public, but I think it will not be made public. Still, it's definitely not FIFO.

>"Not providing observability to monitor and reduce costs"

This is valid feedback now and it is constantly what we do in services. New manageability features are coming to help with this. See CapitalOne or bunches of companies in this ecosystem.

>"What companies that use Snowflake could do better?"

I agree with the point about education. A huge portion of people using and abusing Snowflake don't have any formal education. The best thing you can do is hire Snowflake PS or get a partner/SI, or just take a damn class, they are REALLY good.

Source: 2 years in services at Snowflake with focus on perf, cost, and manageability.

msluyter, over 2 years ago

Some of these complaints seem fair to me, some not as much. tl;dr -- Snowflake requires a fair bit of knowledge/effort to use optimally.

I spent a number of months last year focused on lowering Snowflake spend. In the process I learned a ton about Snowflake and gained a fair amount of respect for the product. Respect as in "this is really great" as well as respect as in "I need to be on guard here or I'm going to get hurt."

I think my biggest misconception at the outset was thinking of Snowflake like it's a relational database. It's not. Or rather, it is with a large number of caveats. Snowflake doesn't have b-tree indexes -- rather it has "clustering keys," which are sort of like coarse grained indexes that colocate data in micropartitions, allowing queries to do micropartition pruning. If you have a well clustered table and you're filtering on your clustering keys, things will be great. But if not, or if, for example, you have to do multi-table joins on non-clustered columns, you'll suffer. So unless you have search optimization enabled (which costs more!), you have to retrain yourself away from the "oh, just add an index here or there to make things fast" type of thinking you may have had working with Postgres or whatnot.

Regarding the author's complaints about lack of observability, I generally found it pretty easy to analyze what was going on via the query_history table. And the built in query analyzer is quite helpful. We did add tags to our dbt runs, which was pretty easy, and I wrote a handful of queries to find things like the most expensive dbt models. It wasn't really that hard.

That said, dbt in particular provides a number of foot guns wrt Snowflake. Subqueries, as the author mentions, are one. We created some custom dbt macros to do things like, instead of `select * from foo where x in (select * from blah)` -- if blah was small -- do a query on blah and write the query using a literal list, like `select * from foo where x in ('a', 'b', 'c', 'etc...')`.

Another issue we discovered is that in dbt it's trivial to create views. But we found that if views get too deeply nested, Snowflake can't adequately do predicate pushdown. So big stacks of views on views are suboptimal.

Another interesting one was tests. Dbt makes it trivial to perform null or uniqueness checks against a column. We found we were spending a lot on those tests that simply were doing something like `select * from blah where col is null`. On non-cluster key columns or complex views, these were causing full table scans. We took a number of steps to mitigate those issues (combining queries; changing where we did these checks in the dag). The way tests are scheduled is problematic as well. One "long pole" test will keep your warehouse up and using credits even after the other 99.9% of the tests have completed. After some analysis we separated long pole tests from the others and put them on different warehouses.

I could go on and on, actually, but I think that provides a taste of some of the complexities involved. Like almost any tool, you have to really understand it to use it effectively. But it's all too easy for, say, analysts, who may be blissfully unaware of the issues above, to write really poorly performing SQL on Snowflake.
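
The subquery-to-literal-list rewrite mentioned above can be sketched in a few lines. This is not the commenter's dbt macro (dbt macros are written in Jinja, and theirs is not shown); it is a hypothetical Python helper that assumes the small set of values from `blah` has already been fetched, and that the table and column names are trusted identifiers.

```python
def sql_literal(value):
    """Render a value as a single-quoted SQL string literal."""
    # Double any embedded single quotes so the literal stays well formed.
    return "'" + str(value).replace("'", "''") + "'"

def inline_in_list(table, column, values):
    """Build a query that filters on a literal list instead of a subquery."""
    if not values:
        # An empty IN () is invalid SQL, so emit a query that matches nothing.
        return f"select * from {table} where 1 = 0"
    literals = ", ".join(sql_literal(v) for v in values)
    return f"select * from {table} where {column} in ({literals})"

# Values that a prior query against the small table `blah` returned.
print(inline_in_list("foo", "x", ["a", "b", "c"]))
```

This prints `select * from foo where x in ('a', 'b', 'c')`. The trade-off is an extra round trip to fetch the values, which is presumably why the comment limits the trick to cases where `blah` is small.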

dboreham, over 2 years ago

Because someone needs a new boat?