
Ask HN: How can I quickly trim my AWS bill?

151 points, by danicgross, almost 5 years ago

Hi HN,

I work with a company that has a few GPU-intensive ML models. Over the past few weeks, growth has accelerated, and with that costs have skyrocketed. AWS cost is about 80% of revenue, and the company is now almost out of runway.

There is likely a lot of low-hanging cost-saving fruit to be reaped, just not enough people to do it. We would love any pointers to anyone who specializes in the area of cost optimization. Blogs, individuals, consultants, or magicians are all welcome.

Thank you!

79 comments

boulos, almost 5 years ago

Disclosure: I work on Google Cloud (but my advice isn't to come to us).

Sorry to hear that. I'm sure it's super stressful, and I hope you pull through. If you can, I'd suggest giving a little more information about your costs / workload to get more help. But, in case you only see yet another guess, mine is below.

If your growth has accelerated, yielding massive cost, I *assume* that means you're doing inference to serve your models. As suggested by others, there are a few great options if you haven't already:

- Try spot instances: while you'll get preempted, you do get a couple of minutes to shut down (so for model serving, you just stop accepting requests, finish the ones you're handling, and exit). This is worth 60-90% of compute cost.

- If you aren't using the T4 instances, they're probably the best price/performance for *GPU* inference. A V100, by comparison, is up to 5-10x more expensive.

- Your models should also be taking advantage of int8 if possible. This alone may let you pack more requests per part. (Another 2x+)

- You could try model pruning. This is perhaps the most delicate option, but look at how people compress models for mobile. It has a similar-ish effect of packing more weights into smaller GPUs; alternatively, a much simpler model (fewer weights and fewer connections) often also means far fewer flops.

- But just as much: why do you *need* a GPU for your models? (Usually it's to serve a large-ish / expensive model quickly enough.) If the alternative is going out of business, try CPU inference, again on spot instances (like the c5 series). Vectorized inference isn't bad at all!

If instead this is all about training / the volume of your input data: sample it, change your batch sizes, just don't re-train, whatever you've gotta do.

Remember, your users / customers won't somehow be happier when you're out of business in a month. Making all requests suddenly take 3x as long on a CPU, or sometimes fail, is better than "always fail, we had to shut down the company". They'll understand!
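The int8 point above can be sketched numerically. A minimal, hypothetical illustration of symmetric post-training quantization (pure NumPy, not any particular framework's API): weights shrink to a quarter of their float32 size, and the round-trip error stays within half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, and the reconstruction
# error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
print(w.nbytes // q.nbytes, bool(max_err <= scale / 2 + 1e-6))
```

Real inference stacks pair this with int8 matrix kernels rather than dequantizing, but the storage and bandwidth arithmetic is the same.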
kkielhofner, almost 5 years ago

AWS/clouds aren't always the best solution for a problem. Often they're the worst (just like any other tool).

You don't provide a lot of detail, but I imagine at this point you need to get "creative" and move at least some aspect of your operation out of AWS. Some variation of:

- Buy some hardware and host it at home/office/etc.

- Buy some hardware and put it in a colocation facility.

- Buy a lot of hardware and put it in a few places.

Etc.

Cash and accounting is another problem. Hardware manufacturers offer financing (leasing). Third-party finance companies offer lines of credit, special leasing, etc. Even paying cash outright can (in certain cases) be beneficial from a tax standpoint. If you're in the US there's even the best of both worlds: a Section 179 deduction on a lease!

https://www.section179.org/section_179_leases/

You don't even need to get dirty. Last I checked, it was pretty easy to get financing from Dell, pay next to nothing to get started, and have hardware shipped directly to a colocation facility. Remote hands rack and configure it for you. You get a notification with a system to log into, just like an AWS instance. All in at a fraction of the cost. The dreaded (actually very rare) hardware failure? That's what the warranty is for. Dell will dispatch people to the facility and replace XYZ as needed. You never need to physically touch anything.

A little more complicated than creating an AWS account with a credit card number? Of course. More management? Slightly. But at the end of the day it's a fraction of the total cost, and probably even advantageous from a taxation standpoint.

AWS and public clouds really shine in some use cases and absolutely suck at others (as in, suck the cash right out of your pockets).
stratified, almost 5 years ago

[DISCLAIMER] I work at AWS, not speaking for my employer.

We really need some more details on your infrastructure, but I assume it's EC2 instance cost that skyrocketed?

A couple of pointers:

- Experiment with different GPU instance types.

- Try Inferentia [1], a dedicated ML chip. Most popular ML frameworks are supported by the Neuron compiler.

Assuming you manage your instances in an auto scaling group (ASG):

- Enable a target tracking scaling policy [2] to reactively scale your fleet. The best scaling metric depends on your inference workload.

- If your workload is predictable (e.g. high traffic during the daytime, low traffic at night), enable predictive scaling. [3]

[1] https://aws.amazon.com/machine-learning/inferentia/

[2] https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html

[3] https://docs.aws.amazon.com/autoscaling/plans/userguide/how-it-works.html
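As a rough sketch of the target-tracking suggestion (the group name and the 60% CPU target below are made up, and GPU inference fleets often need a custom metric instead of CPU), the policy payload for boto3 looks like this:

```python
def target_tracking_policy(asg_name, target_cpu=60.0):
    """Build the request payload for a target-tracking scaling policy.

    The ASG name and target value are hypothetical; pick a metric that
    actually tracks your inference load.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            # Scale in/out to keep average CPU near this value.
            "TargetValue": target_cpu,
        },
    }

policy = target_tracking_policy("ml-inference-asg")
print(policy["PolicyName"])

# With AWS credentials configured, this would be applied via:
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(**policy)
```

The actual API call is left commented out since it requires credentials and a real ASG.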
solresol, almost 5 years ago

My pitch to help: you can probably replace the GPU-intensive ML model with some incredibly dumb linear model. The difference in accuracy/precision/recall/F1 score might only be a few percentage points, and the linear model's training time will be lightning fast. There are enough libraries out there to make it painless in any language.

It's unlikely that your users are going to notice the accuracy difference between the linear model and the GPU-intensive one unless you are doing computer vision. If you have small datasets, you might even find the linear model works better.

So it won't affect revenue, but it will cut costs to almost nothing.

Supporting evidence: I just completed this kind of migration for a Bay Area client (even though I live in Australia). Training (for all customers simultaneously) runs on a single t3.small now, replacing a very large and complicated setup that was there previously.
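As a toy illustration of the idea, here is a cheap linear baseline fit by ordinary least squares (pure NumPy, synthetic data, not the commenter's actual setup). When the signal is mostly linear in the features, this trains in milliseconds on a CPU:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "good enough" setting: targets are close to linear in the features.
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=1000)

# Fit weights by least squares; no GPU required.
X1 = np.hstack([X, np.ones((1000, 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

pred = X1 @ w
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))  # close to 1.0 when a linear fit is adequate
```

Whether a linear model is actually "a few points worse" than the deep one is an empirical question for your data; the point is that checking costs almost nothing.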
pixiemaster, almost 5 years ago

I'm the CTO of a compute-intensive AI SaaS company, so I can relate.

One piece of advice: speak to your AWS rep immediately. Get credits to redesign your system and keep you running. You can expect up to 7 digits in credits (for real!) and support for a year for free; they really want to help you avoid this.
kureikain, almost 5 years ago

I was in the same situation.

We bought 2 Dell servers via their financing program. Each server is about $19-25K. We paid AWS $60K per month before that. We pay $600 for colocation.

So my advice is to get hardware via a vendor's financing program; Dell had a good one, I think.
QuinnyPig, almost 5 years ago

Howdy.

I have loud and angry thoughts about this; https://www.lastweekinaws.com/blog/ has a bunch of pieces, some of which may be more relevant than others. The slightly-more-serious corporate side of the house is at https://www.duckbillgroup.com/blog/, if you can stomach a slight decline in platypus.
fxtentacle, almost 5 years ago

You might be able to significantly lower your monthly bill in exchange for an upfront payment by purchasing your own servers and then renting colocation space.

I'm the CTO of an AI image-processing company, so I speak from experience here.

I personally use Hetzner.de, and their colo plans are very affordable while still giving you multi-gigabit internet uplinks per server. If you insist on renting, Hetzner also offers rental plans for customer-specified hardware upon request. The only downside is that if you call a Hetzner TensorFlow model from an AWS East frontend instance, you'll have 80-100 ms of round-trip latency for the RPC/HTTP call. But the insane cost savings over cloud might make that negligible.

Also, have you considered converting your models from GPU to CPU? They might still be almost as fast, and affordable CPU hosting is much easier to find than GPU options.

I'm happy to talk with you about the specifics of our / your deployment via email, if that helps. But let me warn you: my past experience with AWS and Google Cloud performance and pricing, in addition to suffering through low uptime at their hands, has made me somewhat of a cloud opponent for compute- or data-heavy deployments.

So unless your spend is high enough to negotiate a custom SLA, I would assume that your cloud uptime isn't any better than halfway-decent bare metal servers.
staticassertion, almost 5 years ago

I'd suggest reaching out to AWS about this. Explain the situation. AWS has a number of programs for startups that you may be able to apply for, including one that includes $100k worth of credits.

Also, if you can't afford to scale to new customers... stop? I'm sure it probably sucks, but does it suck more than having no runway? It seems like you'd be best served by slowing things down and spending some time with AWS on cost optimization.

There aren't a lot of details to go off of here, so I don't know what more advice to give.
ssrs, almost 5 years ago

We've managed to reduce our spend by almost 50-60%. Some pointers:

1. Comb through your bill. Inspect every charge and ask "Why do we need this?" for every line item.

2. If user latency is not a problem, choose the cheapest regions available and host systems there.

3. Identify low-usage hours (usually twilight hours) and shut systems off.

4. Transition one-off tasks (cron, scheduling, etc.) to Lambda. We were using entire servers for this one thing that would run once a day. Now we don't.

5. Centralize permissions to launch instances etc. within a few people. Make everyone go through these 'choke points'. You might see reduced instances. Often engineers launch instances to work on something and then 'forget' to shut them off.

6. Get AWS support involved. I'm pretty sure with the bills you are racking up you must have some AWS support. Get some of their architects to check out your architecture and advise.

7. Consider Savings Plans and Reserved Instances. Often you get massive cost savings.

8. Consider moving some of the intensive number crunching to AWS' data-crunching services. We moved a high-powered ELK stack for analyzing server logs to CloudWatch. A little more expensive in the short term, but we are now looking to optimize it.

In my experience, AWS has been very supportive of our efforts at reducing costs. Even after a 50-60% reduction, I still feel there is scope for another round of 50-60% reduction from the new baseline. All the best!
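Point 3, shutting systems off during low-usage hours, can be sketched as a small scheduling check. The quiet window below is hypothetical; in practice a cron job would evaluate this per instance and issue stop/start calls (e.g. via boto3's ec2 client):

```python
from datetime import time

# Hypothetical quiet window: nothing non-critical runs between 01:00 and 06:00.
QUIET_START = time(1, 0)
QUIET_END = time(6, 0)

def should_be_running(now, always_on=False):
    """Decide whether a non-critical instance should be up at local time `now`."""
    if always_on:
        return True
    in_quiet_window = QUIET_START <= now < QUIET_END
    return not in_quiet_window

# A scheduler would act on the result, e.g.:
#   boto3.client("ec2").stop_instances(InstanceIds=[...])
print(should_be_running(time(3, 30)))   # quiet hours -> False
print(should_be_running(time(14, 0)))   # business hours -> True
```

Production-critical instances get `always_on=True`; everything else (staging, dev boxes, batch workers) is a candidate for the quiet window.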
jayzalowitz, almost 5 years ago

Here's my deck on this (@quinnypig, elsewhere in this thread, is a great resource): https://docs.google.com/presentation/d/1sNtFugQp_Mcq62gf4F1n0aJU9IjHmHyMKYOPXYoWYRU/edit Last year I cut $75 million in spend, so you could say I have a track record here.

Are you sure you are using the right instance type for what you need to generate? Can you have your model generator kill (stop) its own instance when it finishes the model?

100%: if it doesn't need JIT, go spot and build models off a queue.

Put in for the Activate program. They can give you up to $100k of credits.
sokoloff, almost 5 years ago

Don't overlook the possibility of using your own physical hardware, running high-end commodity graphics cards (2080 Ti, Titan RTX), especially for model training. (I haven't found this to be overly effort- or time-intensive, and the payoff is enormous on a dollars basis.)

You didn't give enough details for someone to get really specific. I'm assuming from your text that the issue is inference cost, not training cost, in which case there's some great advice already posted, but more details might help.
calebkaiser, almost 5 years ago

I maintain an open source ML infra project where we've spent a ton of time on cost optimization for running GPU-intensive ML models, specifically on AWS: https://github.com/cortexlabs/cortex

If you've done zero optimization so far, there is likely some real low-hanging fruit:

1. If GPU instances are running up a huge EC2 bill, switch to spot instances (a g4dn.xlarge spot is $0.1578/hr in US West (Oregon) vs $0.526/hr on demand).

2. If inference costs are high, look into Inferentia ( https://docs.cortex.dev/deployments/inferentia ). For certain models, we've benchmarked over 4x improvements in efficiency. Additionally, autoscaling more conservatively and leveraging batch prediction wherever possible can make a real dent.

3. Finally, and likely the lowest-hanging fruit of all, talk to your AWS rep. If your situation is dire, there's a very good chance they'll throw some credits your way while you figure things out.

If you're interested in trying Cortex out, AI Dungeon wrote a piece on how they used it to bring their spend down ~90%. For context, they serve a 5 GB GPT-2 model to thousands of players every day: https://medium.com/@aidungeon/how-we-scaled-ai-dungeon-2-to-support-over-1-000-000-users-d207d5623de9
glenngillen, almost 5 years ago

Speak to your AWS account manager and/or someone on their startup team. Give them the detail on what you're running, what you want to do, and what/when you're hoping to reach the next milestone. There are usually a few different options available to them to try to help you out, including, but not limited to, working out how to reduce the ongoing cost of what you're trying to do. "Customer obsession" and all that. It's also just good business. It's not in anybody's interest to have companies running out of runway; they'd rather you were still in business and paying for compute 5 years from now.
lmeyerov, almost 5 years ago

Sounds familiar =\

- Get devs on GPU laptops.

- For always-on workloads, where doable, switch to an 8am-6pm policy, and use reserved instances. Call AWS for a discount.

- Use g4dn x spot. Check per workload, though; it assumes single precision vs double.

- Consider switching to fully on-demand if you haven't already, and hybrid via GCP's attachable GPUs.

- Make $ more visible to devs. Often individuals just don't get it; it's too easy to be sloppy.

More is probably doable, but it's increasingly situation-dependent.
icedchai, almost 5 years ago

Can you use spot instances? If so, you can pay a lot less for compute. Your app needs to tolerate being shut down and restarted, however.

Is there anything you can turn off at night? A lot of startups have staging / test systems that do not need to be running all the time.

Are you keeping a lot of "junk" around that you don't actually need? Look at S3 objects, EBS snapshots, etc. A few here and there don't cost much, but it does add up.

Are you using the correct EBS volume type? Maybe you're using provisioned IOPS where you don't need it.

S3: make sure your VPC has an S3 endpoint. This isn't the default. Otherwise, you're paying a lot more to transfer data to S3.
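The "junk" audit can be partially automated. A hypothetical sketch that flags EBS snapshots older than a cutoff; in practice the input dicts would come from boto3's `describe_snapshots`, which returns records shaped like these:

```python
from datetime import datetime, timedelta, timezone

def stale_snapshots(snapshots, max_age_days=90):
    """Return snapshots older than max_age_days, oldest first."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    old = [s for s in snapshots if s["StartTime"] < cutoff]
    return sorted(old, key=lambda s: s["StartTime"])

# Fake describe_snapshots output for illustration.
now = datetime.now(timezone.utc)
snaps = [
    {"SnapshotId": "snap-old", "StartTime": now - timedelta(days=400)},
    {"SnapshotId": "snap-new", "StartTime": now - timedelta(days=3)},
]
for s in stale_snapshots(snaps):
    # Candidates for deletion; review before actually deleting anything.
    print(s["SnapshotId"])
```

The same pattern works for unattached EBS volumes and old S3 objects (though for S3, lifecycle rules are usually the better tool).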
tarun_anand, almost 5 years ago

I have replied to some of the comments below. My advice is to get off AWS, or any public cloud, and avoid them like the plague.

They are too expensive for 95% of cases. If you are still not convinced, DM me.
quickthrower2, almost 5 years ago

While looking at the technical side, also look at the commercial side. Can you trace revenue sources to AWS costs? In other words, can you calculate your variable costs for each client/contract individually?

E.g., are there some clients losing you money that you can either let go or raise prices for?
aclelland, almost 5 years ago

If you can handle some interruption to your work, then spot instances are probably the biggest immediate change you can make.

Right now a g4dn.xlarge is $0.526/h on demand but only $0.1578/h as a spot instance.

You might also be eligible for a $10k grant from AWS: https://pages.awscloud.com/GLOBAL-other-LN-accelerated-computing-free-trial-2020-interest.html
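Using the prices quoted above, the saving compounds quickly; a back-of-the-envelope check (the fleet size of 10 is made up for illustration):

```python
ON_DEMAND = 0.526   # g4dn.xlarge, $/hour, on demand (as quoted above)
SPOT = 0.1578       # same instance type as spot

def monthly_cost(rate_per_hour, instances=10, hours=730):
    """Approximate monthly bill for a steady fleet (730 h is about one month)."""
    return rate_per_hour * instances * hours

saving = 1 - SPOT / ON_DEMAND
print(f"spot discount: {saving:.0%}")
print(f"10 instances, on demand: ${monthly_cost(ON_DEMAND):,.0f}/mo")
print(f"10 instances, spot:      ${monthly_cost(SPOT):,.0f}/mo")
```

At these rates the spot discount is 70%, turning roughly $3,840/month into roughly $1,152/month for the same ten instances, provided the workload tolerates preemption.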
chmod775, almost 5 years ago

If cost is an issue, get off AWS. Immediately. You're paying about 10x what the same hardware/bandwidth would cost you if you just bought dedicated servers.
alFReD-NSH, almost 5 years ago

If you have the time to fix things asap, you can follow this route:

- Use spot or reserved instances, or Savings Plans.

- Have a look at Compute Optimizer.

- Understand what AWS networking costs are and try to optimize them (cross-AZ traffic and internet egress can be costly).

- Go through the Trusted Advisor checks: https://aws.amazon.com/premiumsupport/technology/trusted-advisor/best-practice-checklist/ (you can enable the full set of Trusted Advisor checks by enabling Business or Enterprise support).

- Try one of these cost optimization tools: https://aws.amazon.com/products/management-tools/partner-solutions/#Resource_and_Cost_Optimization

- Contact AWS for a Well-Architected review.

If you don't have the time, then I suggest contacting AWS to introduce you to a consulting partner. They can come and actually fix whatever is needed.
whb07, almost 5 years ago

Do you train the model locally and push it to the cloud for inference? What exactly are we talking about here?

Couldn't you build a dual NVIDIA 20XX / 32-core / 64 GB machine for under $5k and then save money while training/developing faster?
joshuaellinger, almost 5 years ago

GPU servers and colo are pretty cheap these days: $1K/month rent per 20A of power. ROI on hardware is usually 3-4 months max (i.e., for the cost of the machine at AWS for 3-4 months, you can buy the same thing).

Lead time might be a problem for you, but you can probably do it in under a month if you take available stock at your vendor. I work with a company called PogoLinux ( http://pogolinux.com ) out of Seattle, and they sell boxes that have 4 GPUs in them.

That said, the other advice is right: you can probably get by with a much simpler model. The colo route would probably only be better if you can't change the models due to people constraints and the ML stuff doesn't have a lot of AWS dependencies. Sysadmins are a lot easier to find and hire than ML specialists.
reilly3000, almost 5 years ago

In terms of cost, I would recommend deeply interrogating the bill. Your data transfer cost is likely higher than you expected, and there are lots of ways to mitigate that. GPUs are crazy expensive in the cloud, and it really makes sense to host them locally. There is also usually some money to be found in looking at S3 tiers; Infrequent Access can save a lot if it's good for your use case. Finally, if EC2 is a big cost driver, spot pricing and savings plans are good places to start.

More generally, there has been a lot of recognition in the industry at large that AI-driven startups all face this challenge, where the cost of compute eats up most of the margin. There is no easy solution to that, other than making product-level decisions about how to add more value with less GPU time.
speedgoose, almost 5 years ago

AWS is super expensive. Switch to another cloud provider.

For example: Scaleway, OVH, or Hetzner.
parsimo2010, almost 5 years ago

I don't know how deep you've dug, but the very first thing you should be doing is using spot instances instead of on-demand instances (unless you absolutely can never wait to train a model). Spot instances are cheaper than on-demand instances, with the downside that the price can fluctuate, so you need to build in a precaution for shutting down if the price gets too high. If the price goes up, you either have to stop training until the price goes back down or suck it up and pay a higher price.

Luckily, it's pretty simple to handle interruptions for neural-network-like models that train over several iterations. Just save the model state periodically so you can shut the instance down whenever the price is too expensive and start training again when the price is lower.
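The periodic-checkpoint idea, sketched framework-agnostically (pickle stands in for a real framework's save/load; a PyTorch version would use torch.save on the model and optimizer state dicts, and the file path is hypothetical):

```python
import os
import pickle

CKPT = "train_state.pkl"   # hypothetical checkpoint path

def save_checkpoint(step, weights):
    with open(CKPT, "wb") as f:
        pickle.dump({"step": step, "weights": weights}, f)

def load_checkpoint():
    if not os.path.exists(CKPT):
        return {"step": 0, "weights": 0.0}   # fresh start
    with open(CKPT, "rb") as f:
        return pickle.load(f)

def train(total_steps, stop_after=None):
    """Run (or resume) training; `stop_after` simulates a spot preemption."""
    state = load_checkpoint()
    for step in range(state["step"], total_steps):
        state["weights"] += 0.1           # stand-in for a real gradient update
        state["step"] = step + 1
        if state["step"] % 5 == 0:        # checkpoint every 5 steps
            save_checkpoint(state["step"], state["weights"])
        if stop_after is not None and state["step"] >= stop_after:
            return state                  # "preempted" mid-run
    save_checkpoint(state["step"], state["weights"])
    return state

train(total_steps=20, stop_after=10)      # instance reclaimed at step 10
state = train(total_steps=20)             # a new spot instance resumes at step 10
print(state["step"])                      # -> 20
```

The checkpoint interval is a trade-off: checkpoint too rarely and a preemption wastes work; too often and I/O dominates. On AWS, the two-minute spot interruption notice also gives you a chance to write one final checkpoint before the instance disappears.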
Havoc, almost 5 years ago

If you're running GPU-heavy stuff all the time, then you're probably better off just buying some GPUs outright and doing that part on-site.

Especially if you can keep your own gear busy 24/7: run those 24/7, and have any excess GPU use above that fall back onto the cloud.
amenghra, almost 5 years ago

Talk to an AWS rep and also to different cloud vendors. I know startups which received large amounts of free compute in their early days and then went on to become successful companies. I bet it was a win-win for everyone involved.
nickjj, almost 5 years ago

If you're storing a lot of data: I talked to someone who went from $3,000 a month to $3 a month by saving older dumps of their database into an S3 bucket instead of keeping many, many old RDS snapshots from weeks / months ago.

Here's a direct timestamped link to the point in the podcast where that came up: https://runninginproduction.com/podcast/33-zego-lets-you-easily-buy-insurance-by-the-hour#55:46
bscanlan, almost 5 years ago

Segment's blog posts on cost optimisation have plenty of detail and tips on this topic:

https://segment.com/blog/the-million-dollar-eng-problem/
https://segment.com/blog/spotting-a-million-dollars-in-your-aws-account/
https://segment.com/blog/the-10m-engineering-problem/

Similarly, this Honeycomb writeup is also excellent: https://www.honeycomb.io/blog/treading-in-haunted-graveyards/

By the sounds of it, you need to take drastic action. It sounds like you will not be able to just optimise your AWS spend to get more runway, though you should definitely do some bill optimisation. You will need to optimise your product itself, and maybe even get rid of unprofitable customers.

If you are not sure exactly who or what is driving the AWS cost, take a look at Honeycomb to get the ability to dive deep into what is eating up resources.
pavelevst, almost 5 years ago

AWS is one of the most expensive hosting solutions. I assume many of us somehow start to think it's the best one; in my opinion they are all much the same. Moving elsewhere will require effort but can let you reduce cost to 10-20% of the current one. Some easy things you can do within AWS: resize VMs (this requires turning them off for a minute or so), change to a cheaper tier (e.g. t2 -> t3), or move VMs from EC2 to Lightsail.
godzillabrennus, almost 5 years ago

I help companies find bare metal options for training models. It's usually 10-20% of the cost of cloud.

Email me at lanec (at) hey (dot) com if you'd like to speak.

Last year I took a company spending $24k/month training visual AI and cut that down to $3,500/month with bare metal. I also helped them secure over $100k in cloud credits to cover the monthly costs until the transition could happen.

Training in the cloud is generally much more expensive than on bare metal.
blickentwapft, almost 5 years ago

Run your own machines.

You don't have to use cloud services.
sandGorgon, almost 5 years ago

Simple answer, but the implementation is trickier: you have to use spot instances, or as Google calls them, preemptible instances. These are up to 80% cheaper.

The caveat is that they can be killed at any time, so your infrastructure must be resumable.

Most likely you will need Kubernetes. It's the only framework that supports GPUs, integrates with spot instance providers, and works with ML platforms (using Kubeflow).
dautovri, almost 5 years ago

Open source tools:

- https://github.com/antonbabenko/terraform-cost-estimation
- https://github.com/cloud-custodian/cloud-custodian
- https://github.com/aws/amazon-ec2-instance-selector
- https://github.com/rdkls/aws_infra_map_neo4j

Commercial:

- https://www.cloudhealthtech.com
- http://densify.com/
- https://spot.io
- https://www.hpcdlab.com
ramraj07, almost 5 years ago

How about you just purchase some motherboards and GPUs and start running them in your office (assuming you're not bandwidth-limited or looking for millisecond response times)? I'm always tempted to do this when we have a fairly constant workload. Wasn't GPU instance pricing quite insane on AWS compared to actual GPU costs?
lazylizard, almost 5 years ago

This is not exactly it, I imagine, but maybe longer term you could consider this.

At my place, people test on their desktops and run production stuff in the data center.

Where are you located? These are prices in Singapore: http://www.fuwell.com.sg/uploads/misc/Fuwell11072020.pdf

You're looking for a CPU, board, 64 GB RAM, maybe 2 x 2080 Ti, small SSD and PSU (1000W?). You can leave these on IKEA shelves and skip the casings if need be. 3 x 2080 Ti makes the board expensive and the PSU hard to find.

If you want more reliability, get Asus or Supermicro, or even Sugon: 4 GPUs, 2U.

So that's like a few kW per machine, and you need to think about how much power you can draw per power socket, which is why the 2U stuff usually ends up in data centers.
JangoSteve, almost 5 years ago

Someone else mentioned it already in these comments, but I'll mention it again to make sure it's not missed. If you're a startup using AWS, apply for the AWS Activate program. All you need to do is apply, and they'll give you up to $100k in AWS credits, which last for up to 2 years and are automatically applied to your bill until they're used up.

https://aws.amazon.com/activate/

It's not a solution to the larger problem of your business model and the percentage of revenue going toward compute costs, but there are a lot of other great recommendations and suggestions here for that. This could buy you the time to actually implement them.
ucha, almost 5 years ago

Nvidia forces cloud providers to use their expensive professional line-up. Other providers that use consumer GPUs are way cheaper, 4x or more. If your models don't need a lot of memory or double precision, providers such as GPUEater or RenderRapidly can be worth looking at.
MaxBarraclough, almost 5 years ago

Related reading: Ask HN: How did you significantly reduce your AWS cost? (2017) [0]

The top comment is great. Two easy wins:

* Putting an S3 endpoint in your VPC gives any traffic to S3 its own internal route, so it's not billed like public traffic. (It's curious that this isn't the default.)

* Auto-shutdown your test servers overnight and on the weekends.

See also this thread from 2 days ago, Show HN: I built a service to help companies save on their AWS bills [1]

Those threads aren't specific to GPU instances, though.

[0] https://news.ycombinator.com/item?id=15587627

[1] https://news.ycombinator.com/item?id=23776894
jorgemf, almost 5 years ago

It is very unlikely that anyone is going to give you good advice with so little information about your cost structure. There are great people here who can provide invaluable insights about your costs, but they need more information.

"We use a lot of GPU-intensive models and 80% of revenue goes into AWS" doesn't mean your AWS cost is mostly GPU. It should mean that, but who knows. Tell us what your AWS infrastructure looks like, what instances you have, how much they cost you, etc. Because with only the information given, the best advice you can get is to not use AWS, nor GPU-intensive ML models.
red0pointalmost 5 years ago
For the ML models you can also switch to dedicated server providers, such as: <a href="https:&#x2F;&#x2F;www.dedispec.com&#x2F;gpu.php" rel="nofollow">https:&#x2F;&#x2F;www.dedispec.com&#x2F;gpu.php</a><p>For storage, there&#x27;s always Wasabi or B2 with S3-compatible interfaces. If the data itself is not changing much, so regular backups are possible, just use some dedicated storage servers with hard drives and install MinIO. Do not rely on S3 for outgoing data (much too expensive); use a caching layer on another provider (OVH, Hetzner, ...), or if it fits your workload, Wasabi (&quot;free&quot; egress).
vmurthyalmost 5 years ago
At a startup I worked at earlier, we tried two things that helped: 1. Reserved instances (you commit for a year and you can save 20%, charged monthly; AFAIK no upfront costs)<p>2. Like another reader suggested here, there are accelerators&#x2F;foundations which give away $10k for the 1st year towards cloud usage. We were in healthcare and had a big chip company pay about $10k in credits for a year of AWS. Depending on the domain you are in, there may be a few. If you let me know which domain you work in (healthcare, media, etc.), someone here might be able to point to the right resource.
GauntletWizardalmost 5 years ago
Without any idea of what your infrastructure looks like, I can&#x27;t give you anything actionable, but that might be enough advice in and of itself: go after the low-hanging fruit first. What are you spending on? Look at the top two or three services by spend and dig a little deeper.<p>Are you spending on bandwidth? See if there&#x27;s compression you can enable. EC2? Can you reduce instance sizes or autoscale down instances you&#x27;re not using overnight? ElastiCache or Elasticsearch? Tune your usage, store smaller keys, or expire things out.
curealmost 5 years ago
Start by looking at the breakdown of your costs in the cost analyzer. Look for the categories of your biggest spend. Is it storage? EC2? Something else? For storage, see if you can clean up things you don&#x27;t need anymore, and see if you can move infrequently used data into long-term, cheap storage (but beware retrieval costs!). For EC2, consider changing node types to cheaper ones. Newer instance classes can be much better value for the money. Make sure you use spot instances where you can. Focus on the biggest expense first.
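Finding the biggest categories can be automated against the Cost Explorer API. The sketch below assumes a result shaped like one `ResultsByTime` entry from boto3's `ce.get_cost_and_usage(..., GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}])`; the service names in the example data are illustrative.

```python
def top_services(ce_result, n=3):
    """Rank services by spend from a Cost Explorer-style result.

    `ce_result` mimics one ResultsByTime entry grouped by SERVICE:
    each group carries the service name and an UnblendedCost amount.
    """
    totals = {}
    for group in ce_result["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        totals[service] = totals.get(service, 0.0) + amount
    # Largest spend first; attack these line items in order.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

The same function works for per-tag breakdowns by grouping on a cost-allocation tag instead of the SERVICE dimension.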
kavehkhorramalmost 5 years ago
Disclosure: I am the founder of Usage.ai<p>The product my team works on, www.usage.ai, automates the process of finding AWS savings using ML (through reserved instances, savings plans, spots, and rightsizing). Every recommendation is shown on a sleek webpage (we spent a lot of resources on UX).<p>We haven&#x27;t fully explored the ML use case, but I&#x27;d love to figure out how we can help you drive down the costs associated with your GPU models. Would you have 15 minutes this week for a discussion?<p>If you&#x27;re interested, you can reach me at kaveh@usage.ai
atlbeeralmost 5 years ago
Using large data stored in S3?<p>Make sure you are fetching it via a S3 endpoint in your VPC instead of via public HTTP. You are paying for an (expensive) egress cost you don’t need to be paying for.
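Adding the gateway endpoint is a one-resource change. A minimal CloudFormation sketch, where `MyVpc` and `PrivateRouteTable` are placeholders for your own VPC and route table resources:

```yaml
Resources:
  S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcId: !Ref MyVpc
      ServiceName: !Sub "com.amazonaws.${AWS::Region}.s3"
      VpcEndpointType: Gateway
      RouteTableIds:
        - !Ref PrivateRouteTable
```

Gateway endpoints for S3 carry no hourly or data-processing charge; traffic that previously routed through a NAT gateway stops incurring NAT and egress costs.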
kichikalmost 5 years ago
Spot instances. Easy and saves a ton.
ISLalmost 5 years ago
If AWS cost is 80% of revenue, and the added cost per customer isn&#x27;t paying for itself, perhaps one could either charge more or pause customer acquisition?
KaiserProalmost 5 years ago
We had the same problem!<p>We managed to cut our costs by about two-thirds by doing two things:<p>1) Moving to Batch (this spins up machines to run Docker containers without much hassle; you can also share instances). 2) Using spot instances.<p>Spot instances integrate nicely into Batch, and depending on how you set it up, you can optimise for speed or cost. For example, a p2.xlarge is still $0.90 on-demand, but on spot it&#x27;s about $0.25-0.35.
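A quick sanity check on the spot math above. The prices come from the comment; the 5% re-run overhead for work lost to interruptions is an assumption you should tune to your own checkpoint granularity:

```python
def effective_spot_savings(on_demand, spot, redo_overhead=0.05):
    """Fractional savings of spot vs on-demand hourly rates,
    padding the spot cost by a redo factor for interrupted work."""
    effective_spot = spot * (1 + redo_overhead)
    return 1 - effective_spot / on_demand

# p2.xlarge-style numbers: $0.90 on-demand, roughly $0.30 on spot.
savings = effective_spot_savings(0.90, 0.30)
```

Even with the interruption padding, savings in the 60-70% range survive, which is why spot is usually the first lever to pull.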
tomcooksalmost 5 years ago
Dedicated server somewhere close to your office.
pvm3almost 5 years ago
This thread from yesterday might be useful <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=23776894" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=23776894</a><p><a href="https:&#x2F;&#x2F;github.com&#x2F;similarweb&#x2F;finala" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;similarweb&#x2F;finala</a> seems promising
rrrix1almost 5 years ago
I have a strong suspicion the OP is trolling, or at least his motives aren&#x27;t obvious.<p>Check his profile.<p>He has people, or knows of people, who can likely help with this. He&#x27;s the CEO of an accelerator and is not a newb in any sense.<p>OR... he&#x27;s using gamification to find someone to hire to actually help solve this problem. If that&#x27;s the case... Bravo, sir!
barraldalmost 5 years ago
Lots of people in here mentioning Reserved Instances, so it&#x27;s worth mentioning Reserved AI (<a href="https:&#x2F;&#x2F;www.reserved.ai&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.reserved.ai&#x2F;</a>).<p>We&#x27;re customers and have been very happy with them, it was super quick to get set up and saves us a big chunk.
us0ralmost 5 years ago
A quick Google search for GPU dedicated server is probably going to save you tens of thousands of dollars a year.
stuntalmost 5 years ago
There are quick wins with spot instances and also Fargate. It&#x27;s hard to say anything without knowing the types of workloads and compute that you have, but there is always opportunity to save there.<p>Other than that, you should also look at your architecture. Often there is opportunity to save there as well.
chris_armstrongalmost 5 years ago
[Disclosure - my company sells a cost optimisation product]<p>1. You are going to get a lot of advice to move to your own hardware - DON&#x27;T. Companies use cloud for the flexibility and lower operational overhead, not because it&#x27;s cheap. Consider whether your org is mature enough to run its own servers and has the 6 months it will take to get everything set up.<p>2. Talk to your AWS account manager. They will work their asses off to stop you churning to another provider or to your own hardware, because they know they are losing your revenue entirely for a minimum of 2 years.<p>3. <i>Switch it off.</i> If you&#x27;re not using it outside of business hours, you&#x27;re wasting money. This is the easiest cost saving you will make (my company, GorillaStack, provides a service that makes this easy to set up without scripting, and a 14-day trial).<p>4. If you have a baseline of servers that you will constantly need, reserved instances offer great savings. There is a secondary market for these, where you can get shorter periods cheap from other customers who don&#x27;t need them.<p>5. If you haven&#x27;t already, look at your bill and the breakdown of services. Cost optimisation consultants (they do exist) will start here, attacking the biggest line items first.<p>They are usually EC2 compute, EBS, transfer costs, etc. Prioritise based on size and ease of implementation.<p>You should make a habit of checking the bill at least every few days to keep on top of what is going on.<p>6. Delete unused resources - you need to be ruthless with developers who leave unused EC2 resources around after creation. The key isn&#x27;t to lock down your environment and stop developers from creating what they need, but to enforce a tagging policy on resources so you can track who owns what. There are crude scripts that scan your environment and delete untagged resources after a certain period.<p>7.
Once things are under control, use CloudWatch cost alarms to get notifications when spend crosses a predefined threshold. These can be connected to SNS for receiving emails (and there are simple deployable solutions for receiving these via Slack webhooks, for example).<p>Some further advice: &#x27;right-sizing&#x27; is often held up as an important cost-saving method, but can be much more trouble than it&#x27;s worth. If switching instance size on your workload is going to be a pain, requiring endless planning and regression testing, reconsider - you will waste more in developer time than you gain in cost savings over a few years.
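The "crude script" for untagged resources in point 6 can look like the sketch below. The required tag keys are a hypothetical policy, and the real version would page through `ec2.describe_instances` with boto3 rather than take a pre-flattened list:

```python
REQUIRED_TAGS = {"owner", "team"}  # hypothetical tagging policy

def untagged_instances(instances, required=REQUIRED_TAGS):
    """Return IDs of instances missing any required tag key.

    `instances`: dicts with 'InstanceId' and a 'Tags' dict,
    mirroring a flattened describe_instances response.
    """
    flagged = []
    for inst in instances:
        tags = set(inst.get("Tags", {}))
        if not required <= tags:  # some required key is missing
            flagged.append(inst["InstanceId"])
    return flagged
```

Notify the creators first and only delete after a grace period; the point is accountability, not surprise outages.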
user5994461almost 5 years ago
Make sure to tag every instance&#x2F;resource&#x2F;disk in AWS with its purpose, team, etc.<p>Then you can go into the AWS cost explorer and see the cost breakdown per tag.<p>Usually a few resources will stand out: 80% of the costs will be 20% of the resources. Find out what they are and cut.
zeckalphaalmost 5 years ago
Lots of comments with tips on “how”, but your last paragraph makes it sound like you are looking for a “who”.<p>I’ve heard good things about <a href="https:&#x2F;&#x2F;www.duckbillgroup.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.duckbillgroup.com&#x2F;</a>
goatherdersalmost 5 years ago
If you are growing and&#x2F;or have funding, the other cloud providers will throw credits at you.<p>If you aren&#x27;t growing or don&#x27;t have funding, then go to a less expensive host. There are TONS of high-quality hosts out there that are quite a bit less expensive.
rohanaedalmost 5 years ago
Please check this Show HN thread - <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=23776894" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=23776894</a><p>A developer has created an app for this same purpose
xendoalmost 5 years ago
Depending on the entropy of your input, caching may be a way out. Sometimes if you can&#x27;t cache the end result, you can cache some intermediate results.<p>I would assume that if you are big enough, you may be able to negotiate some pricing.
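Caching an intermediate rather than the final result can be as simple as memoizing the expensive stage. In this sketch, `expensive_features` is a stand-in for whatever preprocessing dominates your compute, and the call counter exists only to make the cache behavior visible:

```python
from functools import lru_cache

CALLS = {"features": 0}  # instrumentation only, to observe cache hits

@lru_cache(maxsize=10_000)
def expensive_features(text):
    """Stand-in for a costly intermediate step (tokenization,
    embedding lookup, etc.). Cached on its input string."""
    CALLS["features"] += 1
    return tuple(sorted(set(text.lower().split())))

def predict(text, threshold=2):
    """Cheap final stage, re-run every time; only the
    intermediate feature extraction is cached."""
    return len(expensive_features(text)) >= threshold
```

Repeated inputs skip the expensive stage entirely while the cheap final stage stays fresh, which is the pattern the comment describes.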
textientalmost 5 years ago
1Cloudhub, an AWS partner, has a very affordable product called Costimize for AWS to help save costs. I will be happy to help. Contact Sankar.nagarajanAT1cloudhub.com<p>Secondly, you can use AWS spot instances for GPUs, which cost less.
benjaminwoottonalmost 5 years ago
Reserved instances are usually the lowest hanging fruits depending on your usage profile and how much you can commit to and&#x2F;or pay for up front. Savings of 30%+ are very achievable.
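Whether a reservation pays off reduces to a utilization break-even, since reservations bill every hour whether the instance runs or not. The rates below are illustrative; real RI pricing varies by instance family, term, and payment option:

```python
def ri_break_even_utilization(on_demand_hourly, ri_effective_hourly):
    """Fraction of hours an instance must actually run for a
    reserved instance to beat paying on-demand."""
    return ri_effective_hourly / on_demand_hourly

# Example: a 1-year reservation at a 30% discount.
break_even = ri_break_even_utilization(1.00, 0.70)
# If the instance runs more than 70% of the time, the RI wins.
```

This is why RIs suit the always-on baseline and spot suits the bursty remainder.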
literallycanceralmost 5 years ago
AWS likely has a retention department that can give you discounts or credits to make you stay. Ask for credits and use the extra time to set up your own hardware.
billmanalmost 5 years ago
If your computational load is spiky, I would suggest looking at Fargate and the spot market. Also, for storage, I would suggest leveraging S3 whenever possible.
byko3yalmost 5 years ago
The answer is simple: don&#x27;t use AWS. You will never get out of this hole unless you move from AWS, because AWS is not scalable budget-wise.
postitalmost 5 years ago
Take a look at the cost explorer.<p>Low-hanging fruit: spot instances, if you can manage stateless sessions.<p>If you have multiple snapshots, those could cost money as well.
pachicoalmost 5 years ago
It would be helpful to know which services you use. Do you use ML services or instances with GPU? Where is most of the cost?
tonymetalmost 5 years ago
Post your bill. You often have unexpected charges for ingress or services you are unaware of.
tmwedalmost 5 years ago
An acquaintance of mine has a business that specializes in the problem you&#x27;re facing. Please feel free to reach out to them: <a href="https:&#x2F;&#x2F;www.taloflow.ai&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.taloflow.ai&#x2F;</a>
dvfjsdhgfvalmost 5 years ago
Unless GPUs are an absolute must, just use Hetzner and never look back at AWS.
unixheroalmost 5 years ago
Switch from RDS to Lightsail instances for trivial DB workloads.<p>The same could apply to EC2.
alzaeemalmost 5 years ago
If using deep learning models, consider using distilled and&#x2F;or quantized models to reduce the resources required for inference.
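Post-training int8 quantization, in its simplest symmetric form, maps float weights onto 256 integer levels with one shared scale. This is a toy sketch of the idea only; real pipelines would use your framework's quantization tooling, with per-channel scales and calibration:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: store one float scale plus
    one signed byte per weight (4x smaller than float32)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]

weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Worst-case rounding error is scale / 2 per weight.
```

Beyond the 4x memory saving, int8 arithmetic lets inference hardware (T4-class GPUs, AVX-512 VNNI CPUs) pack far more operations per cycle.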
gshdgalmost 5 years ago
Reserved instances? Instant 20-40% savings.
juskreyalmost 5 years ago
Real hardware with colocation
A21zalmost 5 years ago
Blog article about real-life AWS cost optimization strategy :<p><a href="https:&#x2F;&#x2F;medium.com&#x2F;teads-engineering&#x2F;real-life-aws-cost-optimization-strategy-at-teads-135268b0860f" rel="nofollow">https:&#x2F;&#x2F;medium.com&#x2F;teads-engineering&#x2F;real-life-aws-cost-opti...</a><p>TL;DR:<p>- Monitor your daily costs<p>- Cloudify your workloads<p>- Use reservations, Spot instances, saving plans<p>- Look around for unused infrastructure<p>- Optimize S3 consumption
lumostalmost 5 years ago
These are the biggest ways to lower cost that I&#x27;ve used in the past. With a high burn rate it&#x27;s important to focus on the things that can change the economics on a short timeline (think next week), as well as activities on a longer timeline (next year). You should have a plan in place for your board, and be able to discuss the cost reduction strategy for Cost of Goods Sold in any future financing rounds. Carefully consider the full TCO - buying colo hardware means opting out of ~3 years of future price reductions&#x2F;hardware improvements in the cloud, plus opportunity cost.<p>1) Call your provider and find out what options they have to cut your cost. This can take the form of discounts, credits, or increased reservations.<p>2) It&#x27;s not uncommon for ML teams to have excess capacity sitting around for forgotten R&amp;D activities. Make sure that your team is tearing down hardware; consider giving all scientists their own dedicated workstation for model development activities. You can smoke test the opportunity here by verifying that the GPUs are actually being utilized to ~40-80% average capacity.<p>3) Really dive into whether you need the parameters&#x2F;model architecture you have. The best model for your company will need to balance latency&#x2F;cost with accuracy. If you&#x27;re using a transformer where a CNN or even a logistic regression with smart feature extractors could do with 1% accuracy loss, then do your customers really need the transformer?<p>4) As others have suggested, drill down on the inference and training costs. Train less frequently, not at all, or sample your data. Generally the benefit of using more data in a model is logarithmic at best vs. the linear training time.<p>5) Buy your own hardware. Particularly for GPU inference, RTX cards can be purchased in servers for your own colo - but not in clouds.
The lead time would be a few months, but the payoff could occur within ~2-6 months in a colo.<p>6) Leaving this here as it used to affect analytics&#x2F;ad-tech and other &quot;big data&quot; companies: programming languages are not created equal in performance, and given equal implementations, a statically typed language will crunch data between 10 and 1000x faster and cheaper than a dynamically typed language. If your business is COGS-pressed, then your team will probably spend more time trying to optimize hardware deployments and squeezing perf out of your dynamic language than you gain in productivity. Drill down on your costs, check how much of it is raw data processing&#x2F;transaction scheduling&#x2F;GPU scheduling, and make sure that you&#x27;re on the right tech path for your customers.<p>Lastly, at 80% Cost of Goods Sold (COGS), it&#x27;s quite possible that your business is either low margin or the pricing structure isn&#x27;t well aligned. As this is a new startup, ask yourself if you expect to raise prices for future non-founding customers. If so, then it&#x27;s possible that your current customers are helping reduce your marketing expenditures, and you may be able to leverage the relationship to help &quot;sell&quot; to future customers.
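The utilization smoke test in point 2 can be scripted against `nvidia-smi`. The parsing is the portable part, so that is what's sketched here; the sample output is hypothetical, since the command itself needs a GPU host, and the 40% floor is an assumed threshold:

```python
def parse_gpu_utilization(nvidia_smi_csv):
    """Parse the output of:
      nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
    which prints one integer percentage per GPU per line."""
    return [int(line.strip()) for line in nvidia_smi_csv.splitlines() if line.strip()]

def looks_idle(utilizations, floor=40):
    """Flag hosts whose average GPU utilization is below `floor`%,
    a sign of forgotten R&D capacity worth tearing down."""
    return sum(utilizations) / len(utilizations) < floor

sample = "3\n0\n87\n12\n"  # hypothetical capture from a 4-GPU host
```

Run it periodically across the fleet (e.g. via `subprocess.check_output`) and any host that is consistently flagged is a cost-cutting candidate.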
mlthoughts2018almost 5 years ago
Don’t use GPUs at inference (serving) time unless you prove that you need to.<p>The only consistent case where I’ve found it’s needed (across a variety of NLP &amp; computer vision services that have latency requirements under 50 milliseconds) is for certain very deep RNNs, especially for long input sequence lengths and large vocabulary embeddings.<p>I’ve never found any need for it with deep, huge CNNs for image processing.<p>Also, if utilization becomes a problem after switching from GPU, consider a queue system. Create batch endpoints that accept small batches, like 8-64 instances, and put a queue system in front to mediate collating and uncollating batch calls from the stream of all incoming requests (this is good for GPU services too).
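The collating queue in the last paragraph can be sketched with stdlib pieces. `run_model` is a placeholder for your batch endpoint, and a production server would add a max-wait timeout so a lone request isn't stranded waiting for a full batch:

```python
import queue

def drain_batches(q, max_batch=8):
    """Pull pending requests off a queue and group them into
    batches of at most `max_batch` for a single model call."""
    batches, current = [], []
    while True:
        try:
            current.append(q.get_nowait())
        except queue.Empty:
            break
        if len(current) == max_batch:
            batches.append(current)
            current = []
    if current:  # partial final batch
        batches.append(current)
    return batches

def serve(q, run_model, max_batch=8):
    """Collate, run one model call per batch, then un-collate
    results back into per-request order."""
    results = []
    for batch in drain_batches(q, max_batch):
        results.extend(run_model(batch))  # one GPU/CPU call per batch
    return results
```

The win is amortization: one model invocation per batch instead of one per request, which keeps utilization high whether the backend is a GPU or a vectorized CPU path.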