Persistent storage remains a complicated problem. Attaching volumes on the fly through Docker's volume abstraction works well enough for most cloud workloads, whether on-demand or spot, but it's still easy to run into problems.<p>This is driving rapid progress in clustered/distributed filesystems, and support is even built into the Linux kernel now with OrangeFS [1]. There are also commercial companies like Avere [2] who make filers that run on object storage with sophisticated caching to provide a fast networked but durable filesystem.<p>Kubernetes is also changing the game with container-native storage. This seems to be the most promising model for the future, as K8S can take care of orchestrating all the complexities of replicas and stateful containers while storage is just another container-based service using whatever volumes are available to the nodes underneath. Portworx [3] is the leading commercial option today, with Rook and OpenEBS [4] catching up quickly.<p>1. <a href="http://www.orangefs.org" rel="nofollow">http://www.orangefs.org</a><p>2. <a href="http://www.averesystems.com/products/products-overview" rel="nofollow">http://www.averesystems.com/products/products-overview</a><p>3. <a href="https://portworx.com" rel="nofollow">https://portworx.com</a><p>4. <a href="https://github.com/openebs/openebs" rel="nofollow">https://github.com/openebs/openebs</a>
OP is offering some very dangerous advice.<p>Twenty years ago, software was hosted on fragile single-node servers with fragile, physical hard disks. Programmers would read and write files directly from and to the disk, and learn the hard way that this left their systems susceptible to corruption in case things crashed in the middle of a write. So behold! People began to use relational databases which offered ACID guarantees and were designed from the ground up to solve that problem.<p>Now we have a resource (spot instances) whose unreliability is a <i>featured design constraint</i> and OP's advice is to just mount the block storage over the network and everything will be fine?<p>Here's hoping OP is taking frequent snapshots of their volumes because it sure sounds like data corruption is practically a statistical guarantee if you take OP's advice without considering exactly how state is being saved on that EBS volume.
Spot instances can now "stop" instead of "terminate" when you get priced out, persisting the attached EBS volumes:<p><a href="https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec2-spot-can-now-stop-and-start-your-spot-instances/" rel="nofollow">https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec...</a>
Even if you don't use spot instances, the technique of using separate EBS volumes to hold state is useful (and well-known). Ordinary on-demand instances can also be terminated prematurely due to hardware failure or other issues, so storing state on a non-root volume should be considered a best current practice for any instance type.
There's a mechanism in Linux for exactly this purpose: pivot_root. It's used in the standard boot process to switch from the initrd (initial ramdisk) environment to the real system root.<p>ec2-spotter classic uses this, but you can also make a pivoting AMI of your favourite Linux distribution.<p>One thing to watch out for is keeping the OS's automatic kernel updates working. AMIs are rarely updated, and you're going to have a "damn vulnerable Linux" if you don't pull the updates right after booting a new image.
When you use Kubernetes, you won't have to deal with this yourself. The cluster will move pods off nodes that are stopped because the spot price was exceeded. Ideally, place nodes at different bid prices: there will be a performance hit but no outage. With the new AWS stop/start feature [1], nodes will come back up when the spot price drops.<p>1) <a href="https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec2-spot-can-now-stop-and-start-your-spot-instances/" rel="nofollow">https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec...</a>
To make this even more streamlined, you'd tag the volumes, discover them with `aws ec2 describe-volumes`, and filter for unattached volumes carrying the magic tag.
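A minimal sketch of that filtering step, operating on the JSON shape that `aws ec2 describe-volumes` returns (the tag key `spot-state` and the sample volume IDs are made up for illustration):

```python
import json

def find_unattached_state_volumes(describe_volumes_json, tag_key="spot-state"):
    """Return IDs of volumes that carry the magic tag and have no attachments."""
    volumes = json.loads(describe_volumes_json)["Volumes"]
    return [
        v["VolumeId"]
        for v in volumes
        if not v.get("Attachments")
        and any(t["Key"] == tag_key for t in v.get("Tags", []))
    ]

# Sample payload shaped like the `aws ec2 describe-volumes` output:
sample = json.dumps({
    "Volumes": [
        {"VolumeId": "vol-0aaa", "Attachments": [],
         "Tags": [{"Key": "spot-state", "Value": "web"}]},
        {"VolumeId": "vol-0bbb", "Attachments": [{"InstanceId": "i-123"}],
         "Tags": [{"Key": "spot-state", "Value": "db"}]},
        {"VolumeId": "vol-0ccc", "Attachments": [], "Tags": []},
    ]
})
print(find_unattached_state_volumes(sample))  # → ['vol-0aaa']
```

In practice you'd also pass `--filters` to the CLI so AWS does most of the narrowing server-side, and keep only the "unattached" check locally.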
We normally run spots with Spotinst + Elastic Beanstalk. Our bill has looked great ever since.<p>This solution looks good, yet only applies to single-instance scenarios. I presume this kind of thinking might move toward EFS + chroot for an actually scalable solution that can't be run on Elastic Beanstalk.
So I was pleasantly surprised to discover that for the last several years, spot instances have provided a mechanism that give you 2 minutes notice prior to shutdown:<p><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html" rel="nofollow">http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-inte...</a><p>Learn something new everyday. :)<p><a href="https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termination-notices/" rel="nofollow">https://aws.amazon.com/blogs/aws/new-ec2-spot-instance-termi...</a>
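The notice shows up on the instance-metadata termination-time endpoint, which returns a timestamp once the instance is marked for reclamation (and nothing before that). A rough sketch of the polling loop, with the HTTP fetch injected as a callable so the logic is shown without a real metadata service; the stub responses are invented:

```python
import time

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def wait_for_termination_notice(fetch, poll_seconds=5, max_polls=None):
    """Poll the termination-time endpoint; return the termination timestamp
    once AWS marks the instance, or None if polling is exhausted.

    `fetch` is any callable taking a URL and returning the response body,
    or None when there is no notice yet (the endpoint 404s)."""
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch(TERMINATION_URL)
        if body:  # notice present: ~2 minutes to checkpoint and detach
            return body
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
    return None

# Stub fetcher: no notice on the first two polls, then a timestamp.
responses = iter([None, None, "2017-10-04T17:11:00Z"])
print(wait_for_termination_notice(lambda url: next(responses),
                                  poll_seconds=0, max_polls=5))
```

A real daemon would react to the returned timestamp by flushing state and unmounting the EBS volume cleanly before the instance disappears.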
The author goes to great lengths to come up with a way for the software that was running on a terminated spot instance to be relaunched using the same root filesystem on a new spot instance, but they never explain <i>why</i> they need to do <i>exactly</i> this. Maybe they already ran everything in Docker containers on CoreOS, so their solution isn't a big shift, but I strongly suspect they could find a simpler way to save and restore state if they got over this obsession with preserving the root filesystem their software sees.
If you don't care about reliability, why not just get a cheap and powerful VPS? Paying $90/month for that machine is madness. I pay $6/month for 6GB RAM, 4 cores, 50GB disk.
Well, one easy way when using Ubuntu-like distributions is to simply place your `/home` folder on a separate (persistent) EBS volume [1].<p>With a few on-boot scripts to attach-volumes / start-containers, it should be fairly easy to get going as well.<p>[1] <a href="https://engineering.semantics3.com/the-instance-is-dead-long-live-the-instance-8b159f25f70a" rel="nofollow">https://engineering.semantics3.com/the-instance-is-dead-long...</a>
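For the `/home`-on-EBS part, the fstab entry might look something like this sketch (the device name `/dev/xvdf` is an assumption and varies by instance type; `nofail` keeps a boot from hanging if the attach script hasn't run yet):

```
# /etc/fstab — mount the persistent EBS volume at /home.
# /dev/xvdf is an assumed device name; on newer NVMe-based
# instances the kernel assigns different names.
/dev/xvdf  /home  ext4  defaults,nofail  0  2
```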
I don't know why all the comments are saying this is a bad idea. For me, one of the things I use EC2 for is deep learning. I just use a spot GPU instance, attach an overlayroot volume, and launch a Jupyter notebook in it. Alternatives like Google Dataflow aren't useful to me because of the price and the process of installing packages. I can also think of many other use cases where a persistent volume helps with manual tasks.
Wouldn't it be simpler to have the smallest possible instance run an NFS server? This would also have an additional bonus of scalability.<p>Edit: or use AWS EFS
Is it just me, or should spot instances deal with work rather than storage? Your (stateful) units of work should live in a queue/DB on a non-spot instance.<p>Attaching and detaching volumes is a good idea, but I wouldn't use it to keep state.
We use k8s at work. I just create a PVC, and when a spot instance is terminated along with the container, a new container is created and mounts the PVC again automatically.
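For anyone unfamiliar with PVCs, a minimal sketch of the pattern (all names, the storage class, and the size are illustrative, not from the comment above):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-state          # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce        # one node at a time, like an EBS volume
  resources:
    requests:
      storage: 10Gi        # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx         # stand-in image
      volumeMounts:
        - name: state
          mountPath: /data
  volumes:
    - name: state
      persistentVolumeClaim:
        claimName: app-state
```

When the node is reclaimed, the replacement pod (typically managed by a Deployment or StatefulSet rather than a bare Pod) re-binds the same claim, so the data survives the spot interruption.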
It sounds wrong to try to keep state across two EC2 instances. If you find yourself in that situation, try a bit harder to push your state outside the EC2 instance (DynamoDB, S3, etc.).<p>You will get a <i>lot</i> of benefit out of it, though you may lose some performance, which is fine in 99% of cases.