Disclosure: I work on Google Cloud (but my advice isn't to come to us).

Sorry to hear that. I'm sure it's super stressful, and I hope you pull through. If you can, I'd suggest giving a little more information about your costs / workload to get more help. But in case you only see yet another guess, mine is below.

If your growth has accelerated and driven up costs massively, I *assume* that means you're doing inference to serve your models. As suggested by others, there are a few great options if you haven't tried them already:

- Try spot instances: while you'll get preempted, you do get a couple of minutes to shut down (so for model serving, you just stop accepting requests, finish the ones you're handling, and exit; there's a shutdown sketch below). This is worth a 60-90% reduction in compute cost.

- If you aren't using the T4 instances, they're probably the best price/performance for *GPU* inference. A V100, by comparison, is up to 5-10x more expensive.

- Either way, your models should be taking advantage of int8 if possible. This alone may let you pack more requests onto each card (another 2x+; see the int8 sketch below).

- You could try model pruning. This is perhaps the most delicate option, but look at how people compress models for mobile. It has a similar effect, packing more weights into a smaller GPU; alternatively, use a much simpler model (fewer weights and fewer connections usually also mean far fewer FLOPs). A pruning sketch is below.

- But just as important: why do you *need* a GPU for your models? (Usually it's to serve a large-ish / expensive model quickly enough.) If the alternative is going out of business, try CPU inference, again on spot instances (like the c5 series). Vectorized inference isn't bad at all!

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don't re-train, whatever you've gotta do.

Remember, your users / customers won't somehow be happier when you're out of business in a month. Making all requests suddenly take 3x as long on a CPU, or sometimes fail, is better than "always fail, we had to shut down the company". They'll understand!
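
To make the spot-instance point concrete, here's a minimal sketch of the drain-and-exit pattern, using only the Python stdlib. Everything here is an assumption about your setup: `run_inference` is a placeholder for your model call, port 8080 is arbitrary, and the exact preemption warning varies by provider (many deliver SIGTERM at instance shutdown, but check your cloud's preemption-notice mechanism).

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

draining = threading.Event()

def run_inference(payload: bytes) -> bytes:
    # Placeholder: swap in your actual model call.
    return payload

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if draining.is_set():
            # Preemption notice received: refuse new work so the load
            # balancer retries on a healthy instance.
            self.send_response(503)
            self.end_headers()
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        result = run_inference(body)
        self.send_response(200)
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

server = ThreadingHTTPServer(("0.0.0.0", 8080), Handler)
server.daemon_threads = False  # let in-flight handlers finish before exit

def on_preempt(signum, frame):
    # Stop accepting requests, then unblock serve_forever().
    draining.set()
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, on_preempt)
server.serve_forever()  # returns once shutdown() is called
server.server_close()   # joins remaining handler threads, then we exit
```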
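
On int8: for GPU serving this usually means an inference runtime like TensorRT, but as a self-contained illustration (and a good fit for the CPU route above, since PyTorch's dynamic quantization targets CPU), here's a minimal post-training sketch. The Sequential model is a stand-in for your trained network; no retraining is required.

```python
import torch

# Stand-in for your trained FP32 model; any Linear-heavy network works.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
model.eval()

# Post-training dynamic quantization: weights stored as int8,
# activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, ~4x smaller Linear weights
```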
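
And for pruning, a minimal magnitude-pruning sketch with torch.nn.utils.prune, again on a stand-in model. One caveat worth stating plainly: unstructured sparsity like this mostly shrinks the checkpoint; real latency wins usually need structured pruning plus a fine-tuning pass, which is why this is the delicate option.

```python
import torch
import torch.nn.utils.prune as prune

# Stand-in for your trained model.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)

for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        # Zero out the 50% of weights with the smallest magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weights

total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"sparsity: {zeros / total:.1%}")
```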