This is more just "missed optimization opportunities in EC2" than a statement about mistakes in AWS as a whole.<p>If you want to talk systemic AWS mistakes you can make, we accidentally created an infinite event loop between two Lambdas. Racked up a several-hundred-thousand-dollar bill in a couple of hours. You can accidentally create this issue across lots of different AWS services if you don't verify you haven't created any loops between resources and don't configure scaling limitations where available. "Infinite" scaling is great until you do it when you didn't mean to.<p>That being said, I think AWS (can't speak for other big providers) does offer a lot of value compared to bare-metal and self-hosting. Their paradigms for things like VPCs, load balancing, and permissions management are something you end up recreating in almost every project anyway, so you might as well railroad that configuration process. I've seen companies that ran their own infrastructure make things like DB backups and upgrades so painful that it would be hard to give up a managed DB service like RDS for anything other than a personal project.<p>After so many years using AWS at work, I'd never consider anything besides Fargate or Lambda for compute solutions, except maybe Batch if you can't fit scheduled processes into Lambda's time/resource limitations. If you're just going to run VMs on EC2, you're better off with other providers that focus on simple VM hosting.
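One cheap defense against that kind of Lambda-to-Lambda loop is a hop counter carried in the event itself. This is a minimal sketch, not an AWS feature: the `hop_count` field and `MAX_HOPS` limit are things you'd add to your own event schema.

```python
# Hypothetical guard against accidental Lambda-to-Lambda event loops.
# Assumes every event your functions forward carries a "hop_count"
# field that each hop increments; this is your own convention, not
# anything AWS provides out of the box.

MAX_HOPS = 5  # deeper than this is almost certainly a cycle


def check_hops(event: dict) -> int:
    """Return the incremented hop count, or raise if the event has
    bounced between functions too many times."""
    hops = event.get("hop_count", 0) + 1
    if hops > MAX_HOPS:
        raise RuntimeError(f"possible event loop: {hops} hops")
    return hops


def handler(event, context=None):
    event["hop_count"] = check_hops(event)
    # ... do the real work, then forward `event` to the next resource ...
    return event
```

Setting a reserved concurrency limit on each function caps the blast radius too, but the counter lets you fail loudly instead of just throttling the loop.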
AWS is complexity-as-a-service. This is why, as a one-man company, I went baremetal[1]. One flat price, screaming fast performance, and massive scalability if you get a beefy enough machine[2]. I don't have time to fiddle with k8s, try to figure out AWS billing/performance tradeoffs, or deal with untraceable performance issues due to noisy neighbours and VM overhead. My disaster recovery plan is a simple DB dump script to S3, and I know I can get another baremetal server up and running in less than 20 minutes.<p>[1] with IBM Cloud 1 year free startup credits<p>[2] Let's Encrypt and StackOverflow run their entire databases on a single beefy baremetal machine. <a href="https://letsencrypt.org/2021/01/21/next-gen-database-servers.html" rel="nofollow">https://letsencrypt.org/2021/01/21/next-gen-database-servers...</a>
Biggest mistake I’ve made:<p>Lifting any non-trivial infrastructure into AWS verbatim is always more expensive than running it yourself. You need to rearchitect it carefully around the PaaS services to make a cost saving or even break even.<p>An extreme example of this is my cousin, who works for a small dev company doing LOB stuff. They moved their SQL box into AWS, and that single RDS instance now costs more to run than their entire legacy infra did per year.<p>I’d still rather use AWS though. The biggest gain is not the technology but not having to argue with several vendor sales teams or file a PO and wait for finance to approve it. All I do is click a button and the thing’s there.
I've made it a habit to absolutely avoid any and all AWS services for any side projects, unless it's on the employer's dime. I'd rather pay a bit more per month for a flat-fee Digital Ocean droplet. Maybe I'll end up paying a few dollars more than I would with the equivalent AWS setup, but I'll rest easy knowing I won't get a surprise bill thanks to the opaque and byzantine billing. I mean, there are consultancies whose entire premise is expertise on AWS billing, so the chance of AWS newbie-me running up many thousands because I forgot to switch off service A or had the wrong setting for service B is non-zero.<p>And the general advice is "don't worry, call their customer support and they'll refund you". Um, seriously? If I want to spend a morning on hold to deal with a huge unplanned bill I'll call my local tax office, thank you.<p>Which sucks as I learn best by building things in my spare time, but AWS makes that learning process a bit more stressful than I'd prefer.
I nearly built myself a very nice footgun not long ago.<p>The setup: MediaConvert (video transcoding), direct upload to an S3 bucket, the bucket fires an event to my application, my application builds the job and submits it to MediaConvert with the output bucket as the destination.<p>Straightforward enough, unless you happen to be copying a config while tired and put your input/output buckets as the same bucket...<p>Fortunately previous-me was paranoid enough to have put in an if check that dies if they're the same, but otherwise that could have cost a lot of money.
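That paranoia check is worth spelling out, since the failure mode (output write re-fires the upload event, which submits another job, forever) is so expensive. A sketch, with the function and bucket names purely illustrative; the `Settings` shape loosely mirrors a MediaConvert job but is trimmed way down:

```python
# Refuse to submit a transcode job whose output bucket is the same as
# its input bucket - otherwise writing the output re-triggers the S3
# upload event and the job submits itself in a loop.

def validate_buckets(input_bucket: str, output_bucket: str) -> None:
    if input_bucket == output_bucket:
        raise ValueError(
            "input and output bucket are identical - this would "
            "re-trigger the upload event and loop forever"
        )


def build_job(input_bucket: str, key: str, output_bucket: str) -> dict:
    """Build a (heavily simplified) MediaConvert-style job dict."""
    validate_buckets(input_bucket, output_bucket)
    return {
        "Settings": {
            "Inputs": [{"FileInput": f"s3://{input_bucket}/{key}"}],
            "OutputGroups": [{
                "OutputGroupSettings": {
                    "FileGroupSettings": {
                        "Destination": f"s3://{output_bucket}/"
                    }
                }
            }],
        }
    }
```

An S3 event filter on a key prefix (only fire on `uploads/`, write outputs to `transcoded/`) is another belt-and-braces option even within one bucket.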
Nothing for me compares to the time I purchased 2 reserved EC2 instances for about $5K on my personal account rather than the company's. I can still remember that sinking feeling as I realized what I'd done.<p>Amazon refunded it the next day.
In summary: either overprovisioning, or not realising every extra CPU cycle or I/O operation costs extra money.<p>This is, of course, the real way "the cloud" makes money. Carefully tuned, it can no doubt be cheaper than do-it-yourself; however, it is also quite easy to rack up a lot of cost.
My favorite billing mistake was forgetting to delete an unused elastic IP address and then realizing I was being charged $34 / month for 2 months just to have it exist while doing nothing.<p>Edit: It's exactly $33.62 and I was mistaken on what caused it. It came from having a NAT Gateway just idling which is $0.045 per hour x 747 hours = $33.62 on us-east-1.<p>I know it's not the biggest mistake ever, but these things creep up on you when you use CloudFormation and it continuously fails to delete resources so you're left having to manually trace through a bunch of resources. It's easy to leave things hanging.
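For the record, the arithmetic checks out; an idle NAT Gateway's hourly existence charge adds up fast (the $0.045/hour figure is the us-east-1 price quoted above, and varies by region):

```python
# An idle NAT Gateway bills per hour just for existing, before any
# data-processing charges. us-east-1 hourly rate as quoted above.

NAT_HOURLY_USD = 0.045


def idle_nat_cost(hours: float) -> float:
    """Dollars billed for the gateway merely existing for `hours`."""
    return NAT_HOURLY_USD * hours

# 747 hours comes to roughly $33.62, matching the bill above.
```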
A few easy ones as well:<p>1) Terminating instances that had ephemeral disks with stuff you needed, while thinking the EBS volumes would remain<p>2) Leaving NAT gateways lying around, or ELBs that do nothing and have no instances attached<p>3) Public S3 buckets - arguably the most common one that can lead to security incidents<p>4) Debugging security groups/Network ACLs and straight up breaking networking for something without knowing it. The reverse of that: you want to fix something quickly, open 0.0.0.0/0 to everyone, and never get around to tightening up the firewall later on.
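Item 4 is easy to audit for after the fact. A sketch of a check for world-open ingress rules; the dict shape loosely follows what boto3's `describe_security_groups` returns, but the function only walks plain dicts, so it runs against any dump of your groups:

```python
# Flag security-group ingress rules left open to the whole internet.
# The input dict shape mirrors an EC2 DescribeSecurityGroups entry
# (IpPermissions / IpRanges / CidrIp), but nothing here calls AWS.

def world_open_rules(security_group: dict):
    """Yield (from_port, to_port, protocol) for each ingress rule
    that allows 0.0.0.0/0."""
    for perm in security_group.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                yield (perm.get("FromPort"),
                       perm.get("ToPort"),
                       perm.get("IpProtocol"))
```

Run it over everything a `describe_security_groups` call returns and you get a quick list of the "I'll tighten it later" holes.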
One of the biggest mistakes I made is not exploring spot instances and reserved instances earlier.<p>I cut my bill by 70-80% after paying full price for years...<p>If you have an active web server or backend workers with fairly short jobs, spot instances will work for you.
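The short-jobs caveat works because spot gives you a two-minute warning: the instance-metadata endpoint `/latest/meta-data/spot/instance-action` starts returning JSON like `{"action": "terminate", "time": "..."}` when a reclaim is scheduled (and 404 otherwise). A sketch of the drain decision a worker loop could make; the HTTP polling is left out so this stays runnable anywhere:

```python
# Decide whether a spot worker should stop pulling new jobs, given the
# body of the spot/instance-action metadata response. The two-minute
# threshold matches the notice window AWS gives before reclamation.

import json
from datetime import datetime, timezone


def should_drain(instance_action_body: str, now: datetime) -> bool:
    """True if the scheduled reclaim is within the two-minute window,
    i.e. finish the current short job and take no new ones."""
    notice = json.loads(instance_action_body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    when = when.replace(tzinfo=timezone.utc)
    return (when - now).total_seconds() <= 120
```

A worker with jobs well under two minutes can almost always exit cleanly, which is why spot fits that workload so well.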
I view AWS as a study in doing everything the "bare hands" way. Here are some examples of the old sysadmin ways of doing things vs the modern "web" way:<p>* regions -> self-balancing algorithms like RAFT<p>* roles/permissions -> tokens<p>* IP address filtering -> tokens<p>* CPU clusters -> multicore/containerization/Actor model<p>* S3 -> IPFS or similar content-addressable filesystems<p>It's not just AWS having to deal with this stuff either:<p>* CORS -> Subresource Integrity (SRI)<p>* server languages (CGI) -> Server-Side Includes (SSI)<p>* Javascript -> functional reactive, declarative and data-driven components within static HTML<p>* async -> sandbox processes, fork/join, auto-parallelization (seen mostly in vector languages but extendable to higher-level functions)<p>* CSS -> a formal inheritance spec (analogous to knowing set theory vs working around SQL errata)<p>I could go on forever but I'll stop there. We are living at a very interesting time in the evolution of the web. I think that web dev has reached the point where desktop dev was in the mid-1990s and is ripe for disruption. No disruption will come from the big companies though, so this is your chance to do it from your parents' basement!
Ok, I'm going to admit to a mistake revolving around NAT gateways and Lambdas.
So, I basically wanted to connect a Lambda to a Postgres / RDS database; for that I had to put it into a private VPC, but the Lambdas still had to talk to the world (a lot), so I just put a NAT gateway in front of it, no biggie.
Well, end of the story: one day I produced 2000 Euro in NAT gateway costs, haha
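The sneaky part is that NAT gateways bill per GB of data processed on top of the hourly charge (around $0.045/GB in us-east-1; ignoring the EUR/USD difference for the sketch), so a chatty Lambda can move a bill-sized amount of traffic without anything looking broken:

```python
# Rough sanity check: how much traffic does it take to hit a given
# NAT-gateway data-processing bill? Per-GB rate is the approximate
# us-east-1 price; currency conversion is hand-waved away here.

NAT_PER_GB_USD = 0.045


def gb_needed_for_bill(bill_usd: float) -> float:
    """GB of processed traffic that produces `bill_usd` in charges."""
    return bill_usd / NAT_PER_GB_USD

# A ~2000 bill corresponds to roughly 44,000 GB through the gateway.
```

VPC endpoints for the AWS services the Lambda talks to would have routed a lot of that traffic around the gateway entirely.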
My biggest mistake: years ago I ended up pushing personal credentials to GitHub at night and woke up to a several-thousand-dollar bill in the morning.<p>I changed the credentials and cancelled all the running instances, only to find that I’d missed some.<p>It was resolved by the afternoon.
But what mistakes did he make?
Did he screw up the bill? Did he fail to keep services available? I only read facts about the ins and outs of AWS' billing and credits system.
Burst CPU and IOPS has bitten me a couple times over the years. In fact, it’s basically the sole cause of nearly all our downtime in recent history. That’s frustrating. I get that it’s a technical solution to the problem of resource utilization at scale, but they could’ve spent some time making it easier to observe — for example, rescale the CPU or IOPS graphs so that 100% is your max sustained budget, and anything over 100% eats into your quota.
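The rescaling described above is trivial to do yourself once you know your instance's documented baseline (e.g. roughly 10% for a t3.micro per AWS's burstable-instance docs; check the table for your type). A sketch of the transform you'd apply to the CloudWatch `CPUUtilization` series before plotting:

```python
# Replot raw CPUUtilization so that 100 means "at the sustained
# baseline" for a burstable (t2/t3) instance; anything above 100 is
# visibly spending CPU credits. The baseline percentage comes from
# AWS's docs for your instance type and is passed in, not looked up.

def rescale_to_baseline(cpu_utilization_pct: float,
                        baseline_pct: float) -> float:
    """Map raw utilization onto a scale where 100 = max sustainable."""
    return 100.0 * cpu_utilization_pct / baseline_pct
```

On that scale a graph sitting at 180 screams "credit burn" in a way that a raw 18% never does.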
Slightly OT: I love Forge but recently I've started using it for my non-PHP projects which feels... wrong. Are there any similar services that are more agnostic?
On billing: they will never do it, but on smaller accounts they could build trust by offering some sort of "prepaid" mode like cell phone services do at the low end.<p>That is, you deposit $X in your account, and AWS nukes your live services if you breach it. The worst that ever happens is you're out the $X you had already deposited.
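You can approximate the prepaid idea yourself with billing alarms plus your own kill switch. A sketch of just the decision logic; fetching month-to-date spend (e.g. from Cost Explorer) and the actual teardown are left abstract because they're account-specific, and the 80% warn threshold is an arbitrary choice:

```python
# DIY "prepaid mode": compare month-to-date spend against a fixed
# deposit and decide whether to warn or tear everything down. The
# spend figure would come from a billing API; here it's just a number.

def budget_action(month_to_date_usd: float,
                  deposit_usd: float,
                  warn_fraction: float = 0.8) -> str:
    """Return 'ok', 'warn', or 'nuke' for the current spend level."""
    if month_to_date_usd >= deposit_usd:
        return "nuke"
    if month_to_date_usd >= warn_fraction * deposit_usd:
        return "warn"
    return "ok"
```

The catch, and why it's only an approximation: AWS billing data lags by hours, so a runaway service can blow well past the "deposit" before your nuke ever fires.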
<i>"Technically they are a smidgen slower than Intel for certain workloads."</i><p>In my experience, after migrating several servers with quite varying workloads, they're <i>faster</i> than Intel - and more than a smidgen. Just as is the general case with current AMD Ryzen vs Intel.
[Disclosure] I'm Co-Founder and CEO of <a href="http://vantage.sh/" rel="nofollow">http://vantage.sh/</a>, a cloud cost platform for AWS. Previously I was a product manager at AWS and DigitalOcean.<p>Since the author and so many people are commenting about AWS costs (and in particular, choosing cheaper EC2 instances and EBS volumes), I thought I'd mention that Vantage has recommendations for exactly these things so you don't get tripped up / spend more than you have to.<p>If you have "antiquated" EC2 instances or EBS volumes, Vantage will give you a recommendation for which instance to switch to and how much money you'll save.<p>The first $2,500/month in AWS costs are also tracked for free, so people get a lot of value out of the free tier and can save significant parts of their bills when developing on AWS.
On a price-sensitive project I almost exclusively used spot instances at a <i>dramatically</i> reduced price compared to on-demand. It forced me to build high-availability elements into the design at the outset, though ultimately spot instances were shut down no more frequently than on-demand instances were (through maintenance and individual machine outages) in my experience.<p>Obviously mileage will vary, but going in I was under the impression that spot instances were on a knife's edge, when with a decent pricing strategy they're as robust as on-demand at a fraction of the cost.