Here's another one for you: distribute your S3 paths/names.

Because of the way S3 is designed, where files are stored on the physical infrastructure depends on the prefix of the key name. I'm not exactly sure how much of the key name is used, but for example if you prefixed all your images with images/....jpg it's highly likely they will all be stored on the same physical hardware.

I know of at least two companies for whom this has caused large problems; one of them is Netflix. Imagine all the videos in a single bucket with key names like "/video/breaking_bad_s1_e1.mp4" (a crude example, I know): all requests hit the same physical hardware, and under high load the hardware just can't keep up. This exact issue has apparently been the cause of more than one Netflix outage.

The solution is simple: ensure your files have a random prefix ({uuid}.breaking_bad_s1_e1.mp4) and they will be spread around the datacentre :)
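A minimal sketch of the idea in Python (the helper name and key format are mine, not anything S3 requires):

    import uuid

    def randomized_key(filename):
        # A random prefix means the leading characters of the key vary,
        # so objects spread across S3 partitions instead of piling onto one.
        return '{0}.{1}'.format(uuid.uuid4().hex, filename)

    print(randomized_key('breaking_bad_s1_e1.mp4'))
    # e.g. '0f3a9c2e4b....breaking_bad_s1_e1.mp4'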
Another tip: IAM roles for EC2 instances.

http://aws.typepad.com/aws/2012/06/iam-roles-for-ec2-instances-simplified-secure-access-to-aws-service-apis-from-ec2.html

Basically, the apps you run on EC2 often need to access other AWS services, so you need to get AWS credentials onto your EC2 instances somehow, which is a nontrivial problem if you are automatically spinning up servers. IAM roles solve this by providing each EC2 instance with a temporary set of credentials in the instance metadata that get automatically rotated. Libraries like boto know to transparently fetch, cache, and refresh the temporary credentials before making API calls.

When you create an IAM role, you give it access to only the things it needs, e.g., read/write access to a specific S3 bucket, rather than access to everything.
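In practice that means credential-free application code; a sketch with boto (the bucket name is hypothetical):

    import boto

    # No access keys in code or config files: on an instance with an
    # IAM role attached, boto fetches temporary credentials from the
    # instance metadata service and refreshes them before they expire.
    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-app-bucket')
    print(bucket.get_all_keys(max_keys=5))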
I would temper the suggestion to use Glacier with a warning: make sure you thoroughly understand the pricing structure. If you don't read the details about retrieval fees in the FAQ, it's easy to shoot yourself in the foot and run up a bill that's vastly higher than necessary. You get charged based on the *peak* retrieval rate, not just the total amount you retrieve. Details here: http://aws.amazon.com/glacier/faqs/

For example, suppose you have 1TB of data in Glacier, and you need to restore a 100GB backup file. If you have the luxury of spreading out the retrieval in small chunks over a week, you only pay about $4. But if you request it all at once, the charge is more like $175.

In the worst case, you might have a single multi-terabyte archive and ask for it to be retrieved in a single chunk. I've never been foolhardy enough to test this, but according to the docs, Amazon will happily bill you tens of thousands of dollars for that single HTTP request.
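To make the peak-rate math concrete, here's a back-of-the-envelope sketch. The formula (your peak hourly rate billed as if sustained for the whole month, less a free allowance of 5% of stored data per month prorated hourly) and the roughly 4-hour retrieval job window are my reading of the FAQ, so treat the numbers as approximate:

    def glacier_retrieval_fee(peak_gb_per_hour, stored_gb,
                              price_per_gb=0.01, hours_in_month=720):
        # Free allowance: 5% of stored data per month, prorated hourly.
        free_gb_per_hour = stored_gb * 0.05 / hours_in_month
        billable = max(peak_gb_per_hour - free_gb_per_hour, 0)
        # You pay for the peak rate as if sustained all month.
        return billable * hours_in_month * price_per_gb

    # 100GB requested in one go (job spans ~4 hours -> ~25 GB/hr peak):
    print(glacier_retrieval_fee(25, 1024))    # roughly $175
    # Same 100GB trickled out over a week (~0.6 GB/hr peak):
    print(glacier_retrieval_fee(0.6, 1024))   # a few dollars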
OP here. Really cool to see people enjoying the write-up.

It started off as a list of misc AWS topics that I found myself repeatedly explaining to people. It seemed like a good idea to write them down.

I'm planning on listing out more in follow-up posts.
This is all good stuff. AWS takes a while to grok, but once you do, it offers so many new possibilities.

The Aha! moment for me came when playing with SimianArmy, the wonderful Netflix OSS project, and in particular Chaos Monkey.

Rather than build redundancy into your system, build failure in and force failures early and often. This will surface architectural problems better than any whiteboard; a toy version of the idea is sketched below.

https://github.com/Netflix/SimianArmy

Also, check out boto and the AWS CLI.

http://aws.amazon.com/sdkforpython/

http://aws.amazon.com/cli/
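A toy take on the Chaos Monkey idea with boto (the region, the tag filter, and the total lack of opt-in/scheduling safeguards are all my simplifications; the real Chaos Monkey is far more careful):

    import random
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # Find running instances tagged as part of the web tier...
    reservations = conn.get_all_instances(
        filters={'tag:role': 'web', 'instance-state-name': 'running'})
    instances = [i for r in reservations for i in r.instances]

    # ...and terminate one at random. If the system is truly built
    # for failure, nobody should notice.
    if instances:
        victim = random.choice(instances)
        print('Terminating %s' % victim.id)
        conn.terminate_instances(instance_ids=[victim.id])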
I for one think that S3's server-side encryption is amazing for most if not all data. We back up hundreds of gigabytes of data every day to S3, and turning on server-side encryption means we don't have to worry about generating, rotating, and managing keys and encryption strategies. It also saves us time, since we don't have to compute the encryption or decryption ourselves. The best part is that the AWS AES-256 server-side encryption at rest suffices for compliance.

Of course, the data that we store, while confidential, isn't super-mission-critical-sensitive data. We trust AWS not to peek into it, but nothing will be lost if they do.
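Enabling it per object is a one-argument change with boto (the bucket and key names are made up):

    import boto
    from boto.s3.key import Key

    conn = boto.connect_s3()
    bucket = conn.get_bucket('nightly-backups')
    key = Key(bucket, 'db-dump.tar.gz')

    # encrypt_key=True asks S3 to encrypt the object at rest with
    # AES-256; key generation, rotation, and storage are all handled
    # server-side by AWS.
    key.set_contents_from_filename('/tmp/db-dump.tar.gz', encrypt_key=True)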
Cloud66 is a good cautionary tale about refusing the temptation to double-dip when it comes to your AWS tokens:

https://news.ycombinator.com/item?id=5685406

https://news.ycombinator.com/item?id=5669315
I know this is about AWS, but it might be helpful to mention that DreamObjects (http://www.dreamhost.com/cloud/dreamobjects/) is an API-compatible, cheaper alternative to S3.
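"API-compatible" means the usual S3 tooling works once you point it at a different endpoint; a sketch with boto (the host is DreamObjects' documented endpoint, the credential placeholders are yours to fill in):

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='...',       # DreamObjects keys, not AWS keys
        aws_secret_access_key='...',
        host='objects.dreamhost.com',
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    for bucket in conn.get_all_buckets():
        print(bucket.name)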
Question about the underlying SaaS product: I'm not understanding how a database client in the cloud can connect to a database server on my own machine.

Or am I misunderstanding what it does?
If you're running an application on more than one server, it's definitely worth checking out AWS OpsWorks. It's a huge time saver, and extremely useful for integrating and managing setup, configuration, and deployment across servers, databases, caches, etc., without any loss of control or customization.
So, in older Reddit threads, I read about how you need to build RAID-1 arrays out of EBS volumes for all your EC2 servers, as well as test your EBS storage, because individual volumes can perform really badly.

Is anybody doing this for their EC2 deployments and, more importantly, automating it?
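For what it's worth, the provisioning half is scriptable with boto; a sketch (the instance ID, size, zone, and device names are placeholders, and assembling the array with mdadm still happens on the instance itself, typically via user-data or config management):

    import time
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # Create and attach a pair of EBS volumes for a RAID-1 mirror.
    for device in ['/dev/sdf', '/dev/sdg']:
        vol = conn.create_volume(size=100, zone='us-east-1a')
        while vol.status != 'available':  # wait until the volume is ready
            time.sleep(5)
            vol.update()
        conn.attach_volume(vol.id, 'i-0123abcd', device)

    # Then, on the instance, something like:
    #   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/xvdf /dev/xvdg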