
AWS: the good, the bad and the ugly

226 points by brkcmd over 12 years ago

17 comments

blantonl over 12 years ago

This is a great writeup and is completely on target for realistic deployments on AWS.

We're a big user of AWS (well, relatively speaking, but we run about $10K/month in costs through AWS), so I'd like to supplement this outstanding blog post:

* I cannot emphasize enough how awesome Amazon's cost cuts are. It is really nice to wake up in the morning and see that 40% of your costs are now going to drop 20% next month going forward (like the recent S3 cost cuts). In Louisiana, we call this lagniappe (a little something extra). We don't plan for it, nor budget for it, so it is a nice surprise every time it happens.

* We've also completely abandoned EBS in favor of ephemeral storage except in two places: some NFS and MySQL slaves that function as snapshot/backup hosts only.

* If the data you are storing isn't super critical, consider Amazon S3's reduced redundancy storage. When you approach the 30-50TB level, it makes a difference in costs.

* RDS is still just a dream for us, since we still don't have a comfort level with its performance.

* ElastiCache has definitely been a winner and allowed us to replace our dedicated memcached instances.

* We're doing some initial testing with Route 53 (Amazon's DNS service) and so far so good, with great flexibility and a nice API into DNS.

* We're scared to death of AWS SNS - we currently use SendGrid and a long-trusted existing server for email delivery. Twilio is our first choice for an upcoming SMS alerting project.

* If you are doing anything with streaming or high-bandwidth work, AWS bandwidth is VERY expensive. We've opted to go with unmetered ports on a cluster of bare metal boxes with 1000TB.com. That easily saves us thousands of dollars a month in bandwidth costs, and if we need overflow or have an outage there, we can spin up temporary instances on AWS to provide short-term coverage.
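A back-of-the-envelope sketch of the reduced-redundancy savings described above. The per-GB rates are illustrative placeholders (roughly 2012-era, with RRS about 20% cheaper), not Amazon's actual tiered prices:

```python
def monthly_s3_cost(tb_stored, price_per_gb):
    """Flat-rate approximation; real S3 pricing is tiered by volume."""
    return tb_stored * 1024 * price_per_gb

# Illustrative rates (assumptions, not quoted AWS prices):
STANDARD = 0.095   # $/GB-month, standard storage
REDUCED = 0.076    # $/GB-month, reduced redundancy storage

for tb in (30, 50):
    std = monthly_s3_cost(tb, STANDARD)
    rrs = monthly_s3_cost(tb, REDUCED)
    print(f"{tb} TB: standard ${std:,.0f}/mo, "
          f"RRS ${rrs:,.0f}/mo, saves ${std - rrs:,.0f}/mo")
```

At the 50 TB mark, even these rough numbers show savings approaching $1,000/month, which is why the storage class choice starts to matter at that scale.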
trotsky over 12 years ago

The elasticity, no capex, and API management are great. But bending over backwards to deal with their terrible SAN, saturated layer 2, and low reliability easily wastes at least as much engineer time as it saves in ops. If the OP's 100 boxes are similar to most of the other AWS deployments I'm familiar with, it's likely that'd be 10 or fewer actual machines to manage.

Don't get me wrong - the idea is great, and it's great for really low-end workloads where the gotchas don't matter, or really big systems where you'd be managing against most of those issues anyway.

But in the middle of the two there is a dead space that I bet a lot of shops are stuck in - double-digit instance numbers running a workload that 2-4 servers could handle with plenty of legroom.

If you're deciding on a cloud provider in 2012, I think it makes a lot of sense to shop around. There are lots of people doing the on-demand API deployment thing now with different trade-offs. I like Joyent a lot (local reliable I/O), or providers with both cloud and a colo area even if it's exorbitant - paying five hundred dollars a month for instances one SSD could replace sucks.
leoc over 12 years ago

> The failure mode of EBS on Ubuntu is extremely severe: because EBS volumes are network drives masquerading as block devices (http://joyent.com/blog/magical-block-store-when-abstractions-fail-us), they break abstractions in the Linux operating system. This has led to really terrible failure scenarios for us, where a failing EBS volume causes an entire box to lock up, leaving it inaccessible and affecting even operations that don't have any direct requirement of disk activity.

Assuming that I/O is reliable and fast-ish was a bad OS design decision in the 1970s, as the Unix guys themselves soon realised, though it was an understandable mistake at the time. Continuing to develop and use OSes with an I/O interface that assumes I/O is reliable and fast-ish, in 2012 - it's pretty bad, isn't it?
dogas over 12 years ago

We here at PipelineDeals also abandoned EBS-backed instances after their second outage.

Instead we rely on instances that use an instance-store root device. During the EBS outage, our instance-store servers did not have any issues, while our EBS-backed servers really struggled throughout the day, with crazy high loads.

http://devblog.pipelinedeals.com/pipelinedeals-dev-blog/2012/12/5/what-it-means-to-be-truly-geographically-redundant-on-aws.html
patrickgzill over 12 years ago

Have you done any calculations as to what it would cost to rent, say, 20 x $100/month dedicated servers spread across multiple datacenters, that can do virtualization with OpenVZ, Xen, or KVM (takes care of network, power, bandwidth, and hardware issues), vs. what you spend monthly with AWS?

Bluntly, it seems like you must have spent some dev or ops time learning all this and migrating away from EBS etc., even if you didn't hire someone.

Frankly, if your bills are greater than $3K per month with AWS, I question whether you are truly saving anything.

(I figured that mid-sized instances vs. dedicated servers are about 5:1 in terms of performance.)
malachismith over 12 years ago
Thank you so much for sharing this. It's about time people started pointing out that EBS is a disaster. If we all do, maybe Amazon will finally fix it.
jwilliams over 12 years ago

Nice read, but I wish they had included the location (and AZs) in use. I've used Oregon, California, and Virginia with different results.

The comment around Ubuntu is interesting and I wish there was more detail there.

We use mdadm to run RAID across multiple EBS volumes. mdadm is great, but has a kink: it will boot to a recovery console if the volume is "degraded" (i.e. any failure), even if the volume is still completely viable due to redundancy. This is obviously very bad, as you've got no way of accessing the console. It's an unfortunate way to completely hose an instance.

It's an easy one to miss, as you rarely test a boot process with a degraded volume. When it happens, though - it hurts a lot.

(If you'd like to check on this, make sure you have "BOOT_DEGRADED=yes" in /etc/initramfs-tools/conf.d/mdadm.)
IgorPartola over 12 years ago

> For these reasons, and our strong focus on uptime, we abandoned EBS entirely, starting about six months ago, at some considerable cost in operational complexity (mostly around how we do backups and restores). So far, it has been absolutely worth it in terms of observed external uptime.

So what do you use now for your persistent storage? This might be the most interesting part.
dirktheman over 12 years ago

Excellent post, and it reflects my experience with AWS, too. I really like the concept, and love S3 (and Glacier), but EBS is just too unreliable for us to rely on for our entire business. Nice read, thanks!
crcsmnky over 12 years ago
I'd be curious to hear about their backup/restore procedures with just ephemeral storage.
dschiptsov over 12 years ago

Yes, the ability to clone a new box you don't own in minutes is the only real advantage.

And for that very uncommon requirement you have to pay per-KB and per-hour rates for having no service or guarantee at all.

And no, you still need a sysadmin who understands how AWS works and what to do when AWS says "Oops, your '_____' isn't available."
daemon13 over 12 years ago

From alestic.com:

>> Both EBS boot and instance-store AMI ids are listed, but I recommend you start with EBS boot AMIs.

Why two opposite recommendations from alestic.com [an authority on AWS] and practitioners?

Not a flame - I am planning my AWS deployment strategy and need to make a decision between these two approaches.
FireBeyond over 12 years ago

Good article - it discusses things in a good way to get an overall quick look into how the ecosystem works.

I've yet to play with AWS at any significant level, but this is the kind of thing to bookmark.

Further to that, are there any recommendations for books/sites/articles that discuss more best practices?
Shorel over 12 years ago

To me the decision is simple: if you need fewer than n servers, AWS is better and cheaper.

OTOH, if you need n or more servers, rolling your own dedicated servers is better and cheaper.

The only issue is determining the current value of n.
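One way to estimate n is a simple break-even calculation: dedicated hardware carries a fixed monthly overhead (colo, spares, sysadmin time) but a lower per-server price. The dollar figures below are hypothetical placeholders - plug in your own quotes:

```python
import math

def breakeven_n(aws_per_server, dedi_per_server, dedi_fixed_overhead):
    """Smallest monthly server count at which dedicated beats AWS.

    Model: AWS costs n * aws_per_server; dedicated costs
    dedi_fixed_overhead plus n * dedi_per_server.
    Assumes aws_per_server > dedi_per_server.
    """
    per_server_gap = aws_per_server - dedi_per_server
    return math.floor(dedi_fixed_overhead / per_server_gap) + 1

# Hypothetical numbers: $250/mo per AWS instance, $100/mo per
# dedicated box, $2,000/mo fixed ops overhead for dedicated.
n = breakeven_n(250, 100, 2000)
print(n)  # 14: below 14 servers AWS is cheaper, at 14+ dedicated wins
```

The model is deliberately crude - it ignores the migration cost and elasticity the thread discusses - but it makes the "n" in this comment concrete enough to argue about.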
pjungwir over 12 years ago

I've run a few projects on AWS and agree with how much it simplifies your life, but, like the OP, the big sticking point for me is EBS. I wasn't aware that ELB relies on EBS. Good to know!

I've also looked into ephemeral storage, but ultimately I decided to just rent a dedicated machine from elsewhere. Building a B2B site, I'm not as worried about massively scaling on a dime. The project has still used transient EC2 instances for a few odd things, though. It's nice to have that option when you need it!
jiggy2011 over 12 years ago

How much does AWS save in terms of admin vs. a standard VPS?

I assume it runs a standard Linux distro, therefore patches, firewalls, dependencies, etc. are still an issue, surely?
novaleaf over 12 years ago

From reading the "bad" parts in the article - man, what a pain in the butt. I guess I'm glad to be on Google App Engine and not have to worry about this stuff.