Race-condition-free deployment with the "symlink replacement" trick

63 points by Deejahll, over 12 years ago

15 comments

viraptor, over 12 years ago

OK, one problem solved. What's left: schema changes, making sure ongoing process flows can automatically migrate from the previous version to the new one, making sure resources referenced from the previous version are still valid, and making sure nothing reads the code files via the link (it can change mid-request).

I got the strange feeling from that article that changing the code files was the hardest thing about an upgrade.

Firehed, over 12 years ago

This is a good system at face value, but it can present other problems. Any code that has a stat cache (I know PHP does, and I'd be surprised if other common languages don't) suddenly doesn't realize that your paths are going to a different place. Because /var/www as your "base" directory is symlinked to /var/www.a, www.a is cached, and when you swap your symlink to www.b to deploy the next version, anything relying on that stat cache (include directives, autoloaders, etc.) suddenly starts pulling in the wrong version of the file.

Solving this in a way that doesn't require restarting any services and doesn't introduce any more race conditions is nontrivial, although it tends to work pretty reliably so long as you have a front controller and it very rarely changes. Basically it comes down to a symlink change detector: after your deploy script runs, it hits a special (internal) URL which kills the stat cache. If people are interested I can post a more concrete example.
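
For readers who want something concrete in the meantime, here is a rough Python analogue of such a "symlink change detector". It is only a sketch under assumptions, not Firehed's PHP setup: the class name, the `on_change` callback, and the idea of calling `check()` from an internal URL handler are all hypothetical, and whatever cache needs clearing is left to the callback.

```python
import os
from typing import Callable


class SymlinkChangeDetector:
    """Remembers where a symlink pointed and reports when that changes.

    Hypothetical helper: `on_change` stands in for whatever clears the
    application's cached paths (the stat cache in the PHP case).
    """

    def __init__(self, link_path: str, on_change: Callable[[str], None]) -> None:
        self.link_path = link_path
        self.on_change = on_change
        # Resolve the link once so later checks have something to compare to.
        self.last_target = os.path.realpath(link_path)

    def check(self) -> bool:
        """Return True (and fire the callback) if the link now resolves elsewhere."""
        target = os.path.realpath(self.link_path)
        if target == self.last_target:
            return False
        self.last_target = target
        self.on_change(target)
        return True
```

The deploy script would trigger `check()` once after swapping the link, for example through the internal URL mentioned above.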

troels, over 12 years ago

I wonder what problem this is really solving. I mean, delete+create in a script will happen in quick succession, so the moment of inconsistency is really very short. If this is an actual problem for you, chances are that your setup is rather large and you have multiple nodes behind a load balancer. In that case, you have bigger issues, such as making sure the individual nodes are updated at the same time. Usually this would be solved by taking them out of rotation while updating, in which case the atomic symlink switch becomes moot.
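
For comparison with delete+create, the atomic switch being discussed can be sketched in Python roughly as follows. This is a minimal sketch, assuming a POSIX system where `os.replace` maps to rename(); the function name and the temporary link name are made up for illustration.

```python
import os


def switch_symlink(link_path: str, new_target: str) -> None:
    """Point link_path at new_target with no moment where the link is missing."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    # Hypothetical temp name; it must sit in the same directory (and so the
    # same filesystem) as the real link for the rename to be a plain rename().
    tmp_link = os.path.join(
        link_dir, ".%s.tmp.%d" % (os.path.basename(link_path), os.getpid())
    )
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)  # leftover from an interrupted earlier run
    os.symlink(new_target, tmp_link)
    # rename() replaces the old link itself; it does not follow it.
    os.replace(tmp_link, link_path)
```

With delete+create there is a short window in which the link does not exist at all; here a reader sees either the old target or the new one.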

praptak, over 12 years ago

Yeah, mv is atomic but I believe that a crash can still leave your data inconsistent due to write reordering, see:

http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/25120

I would do a fsync() before switching the symlink to the new dir.
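
A sketch of what "fsync() before switching" could look like in Python, assuming Linux-like behaviour where a directory can be opened read-only and passed to fsync(). The function name is hypothetical, and how much durability this actually buys depends on the filesystem.

```python
import os


def fsync_tree(root: str) -> int:
    """Flush every regular file and directory under root; return files synced."""
    synced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are flushed.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            fd = os.open(path, os.O_RDONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
            synced += 1
        # Flush the directory too, so its entries are on disk as well.
        dir_fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    return synced
```

The idea would be to call this on the new release directory and only then swap the symlink.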

mjs, over 12 years ago

This system ensures the server is always in a consistent state, but client race conditions are still possible if the "old" index.html references an asset that isn't available after the deployment has occurred. Is there any good way of dealing with this? (I just ignore it...)

Negitivefrags, over 12 years ago

We use this technique at Grinding Gear Games for our deployments, and here are a few assorted details about how we are set up.

The first is that we find it's a good idea to have your release directories on the server named after tags from your VCS. Each time we want to do a deploy we just make a tag, and the deployment script takes the name of the tag to deploy as its argument. It's very easy to see what version is deployed on a server by just looking at the target of the symlink.

The second is that you should use rsync with the --link-dest option. --link-dest allows you to specify a previous directory that rsync can use to create hard links from for files that haven't changed. For example, if you have a new version to deploy in a directory called "0.9.10/2" and on the remote server you have "0.9.10/1" currently deployed, you can "rsync 0.9.10/2 server:0.9.10/2 --link-dest 0.9.10/1". What this does is create a new dir tree in /2 with all the files that didn't change from /1 hard linked but with new copies for the files that did. This saves a lot of disk space and it means you can keep versions around on the server for as long as you feel the need to.

As our deployment is ~8GB this is quite important for us. It means that we actually have releases sitting on the server from quite a while back.

The third thing is setting something up so you can have simple versioning of your deployment scripts.

We have a script that drives this whole process called "./realmctl". Deployment is split into a 4 step process. You find scripts like this in each release dir:

./0.9.10/1/prepare (create/upload new release)
./0.9.10/1/stop (stop existing servers)
./0.9.10/1/deploy (change symlinks over to this release)
./0.9.10/1/start (start servers)

Each of the releases contains its own version of the scripts. That means if you issue a command like "./realmctl restart --release=0.9.10/2" then the script can find the stop script for the current version, then run the deploy and start scripts for the new version. In this way, if your deployment process changes between versions, you can still freely move around between versions without needing to worry about the version of your deployment scripts.

The last thing is that if you're writing something similar, it's really nice for your scripts to have some idea about the different parts of your infrastructure so that they can be controlled independently. It's really useful to be able to say something like "./realmctl restart all poe_webserver" (restart webserver processes on all servers) or "./realmctl stop ggg4 poe_instance" (stop the game instance servers on ggg4). Those kinds of commands are really useful during an emergency.
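
To make the per-release script idea concrete, here is a small Python sketch of how a restart could pick which scripts to run. It is not how realmctl is actually written (that isn't shown here); the function name and arguments are hypothetical, and it only builds the list of script paths without executing anything.

```python
import os
from typing import List


def restart_plan(current_link: str, releases_root: str, new_release: str) -> List[str]:
    """Return the scripts a restart would run, in order.

    The stop script comes from whatever release the symlink points at now;
    deploy and start come from the release being switched to.
    """
    current_release = os.path.realpath(current_link)
    new_dir = os.path.join(releases_root, new_release)
    return [
        os.path.join(current_release, "stop"),
        os.path.join(new_dir, "deploy"),
        os.path.join(new_dir, "start"),
    ]
```

Because each path is taken from the release it belongs to, an old release is always stopped by the script that shipped with it, even if the deployment process has changed since.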

sausagefeet, over 12 years ago

I'd prefer to just bring the machine out of rotation, redeploy, bring it back.

Gigablah, over 12 years ago

Rex [1] already supports deployment (and rollbacks) using symlink replacement.

[1] http://rexify.org/modules/application_deployment.html

daenney, over 12 years ago

Isn't this guy reinventing a simplified wheel? We already have tools like Capistrano and Fabric or Rex (which does quite a bit more than just application deployments).

rll, over 12 years ago

This seems a bit oversimplified. Sure, there are no race conditions if there are no interactions between files, but if there are then swapping the symlink mid-request on requests that are already in progress will cause all sorts of race conditions.
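
One possible way to narrow that window, sketched in Python under assumptions: resolve the symlink once when the request starts and build every later path from the resolved directory. The class and method names are hypothetical, and this only helps if the old release is still on disk while in-flight requests finish.

```python
import os


class RequestContext:
    """Pins one request to the release that was live when it started."""

    def __init__(self, current_link: str) -> None:
        # Resolve the link exactly once; a later swap does not affect us.
        self.release_dir = os.path.realpath(current_link)

    def path(self, relative: str) -> str:
        """Build a path inside the pinned release, never via the symlink."""
        return os.path.join(self.release_dir, relative)
```

A request that began on the old release keeps reading old files throughout, instead of mixing files from both versions.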

standel, over 12 years ago

mv relies on an operation "similar" to rename(), which POSIX specifies should be atomic.

So the assumption "On Unix, mv is atomic operation" is not strictly true: mv is an atomic operation only if your underlying FS is fully POSIX-compliant.

I think it's important to stress this because some distributed filesystems try to be POSIX-compliant but do not guarantee atomic renames, so this trick would not work well on them.

njharman, over 12 years ago

Any deployment tool (fab, capistrano, etc.) should do this.

My preferred layout is

./releases/<datetime or rev or whatever makes sense for you stamped>
./current symlink to ./releases/<foo>

Keeping releases in a directory by themselves makes it easy to list them, archive old ones, etc.
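
As an illustration of why that layout is convenient, here is a Python sketch that removes old releases while leaving alone whatever ./current points at. It assumes release names sort chronologically (for example datetime stamps); the function name and the `keep` parameter are made up for the example.

```python
import os
import shutil
from typing import List


def prune_releases(base: str, keep: int = 5) -> List[str]:
    """Delete all but the newest `keep` releases; return the names removed."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    releases_dir = os.path.join(base, "releases")
    current = os.path.realpath(os.path.join(base, "current"))
    # Assumption: lexical order of the names matches release order.
    names = sorted(os.listdir(releases_dir))
    removed = []
    for name in names[:-keep]:
        path = os.path.join(releases_dir, name)
        if os.path.realpath(path) == current:
            continue  # never delete the live release, e.g. after a rollback
        shutil.rmtree(path)
        removed.append(name)
    return removed
```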

ralph, over 12 years ago

Telling the server of the new root instead, which needs the relevant permissions as the author points out, would remove the symlink traversal on every access?

code_duck, over 12 years ago

If you are on a shared host and can't restart your webserver, you probably don't have the ability to set the document root, either.

j_baker, over 12 years ago

Is it just me, or does this seem like a lot of work just to avoid having assets be inconsistent for "some number of milliseconds"?
