Race-condition-free deployment with the "symlink replacement" trick

63 points by Deejahll over 12 years ago

15 comments

viraptor over 12 years ago
OK, one problem solved. What's left: schema changes, making sure in-flight process flows can migrate automatically from the previous version to the new one, making sure resources referenced from the previous version are still valid, and making sure nothing reads the code files through the link (its target can change mid-request).

The article gave me the strange feeling that changing the code files was the hardest part of an upgrade.
Firehed over 12 years ago
This is a good system at face value, but it can present other problems. Any code that has a stat cache (I know PHP does, and I'd be surprised if other common languages don't) doesn't realize that your paths now lead somewhere else. If /var/www, your "base" directory, is symlinked to /var/www.a, then www.a is what gets cached. When you swap the symlink to www.b to deploy the next version, anything relying on that stat cache (include directives, autoloaders, etc.) starts pulling in the wrong version of the file.

Solving this without restarting any services and without introducing more race conditions is nontrivial, although it tends to work pretty reliably as long as you have a front controller that very rarely changes. Basically it comes down to a symlink change detector: after your deploy script runs, it hits a special (internal) URL, which kills the stat cache. If people are interested I can post a more concrete example.
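
The symlink change detector idea can be sketched as follows. This is a minimal illustration in Python rather than PHP, so it has no real stat cache to clear; the class name `SymlinkChangeDetector` and the `on_change` hook are hypothetical stand-ins for whatever the internal URL handler would do.

```python
import os


class SymlinkChangeDetector:
    """Tracks where a symlink resolves to and reports when that changes."""

    def __init__(self, link_path, on_change):
        self.link_path = link_path
        # Hypothetical hook: in the PHP case this would clear the stat cache.
        self.on_change = on_change
        self.resolved = os.path.realpath(link_path)

    def check(self):
        """Return True (and fire the hook) if the link now resolves elsewhere."""
        current = os.path.realpath(self.link_path)
        if current == self.resolved:
            return False
        self.resolved = current
        self.on_change(current)
        return True
```

A deploy script would trigger `check()` once after swapping the link, for example through a request to an internal endpoint.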
troels over 12 years ago
I wonder what problem this is really solving. A delete followed by a create in a script happens almost back to back, so the moment of inconsistency is very short. If this is an actual problem for you, chances are that your setup is rather large and you have multiple nodes behind a load balancer. In that case you have bigger issues, such as making sure the individual nodes are updated at the same time. Usually this is solved by taking them out of rotation while updating, in which case the atomic symlink switch becomes moot.
praptak over 12 years ago
Yeah, mv is atomic, but I believe that a crash can still leave your data inconsistent due to write reordering; see:

http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/25120

I would do an fsync() before switching the symlink to the new dir.
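
One way to read the fsync() suggestion is to flush every file and directory of the new release before the link is switched. The sketch below assumes a POSIX system where a directory can be opened read-only and passed to `os.fsync`; the helper names `fsync_path` and `fsync_tree` are made up for illustration.

```python
import os


def fsync_path(path):
    """Open a file or directory read-only and flush it to disk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def fsync_tree(root):
    """fsync every regular file and directory under root.

    Returns the number of paths flushed. Symlinks and special files
    are skipped.
    """
    count = 0
    # Bottom-up, so each directory is flushed after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            fsync_path(full)
            count += 1
        fsync_path(dirpath)
        count += 1
    return count
```

Calling `fsync_tree` on the new release directory and only then switching the symlink matches the ordering suggested in the comment.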
mjs over 12 years ago
This system ensures the *server* is always in a consistent state, but client race conditions are still possible if the "old" index.html references an asset that isn't available after the deployment has occurred. Is there any good way of dealing with this? (I just ignore it...)
Negitivefrags over 12 years ago
We use this technique at Grinding Gear Games for our deployments. Here are a few assorted details about how we are set up.

The first is that we find it's a good idea to name your release directories on the server after tags from your VCS. Each time we want to do a deploy we just make a tag, and the deployment script takes the name of the tag to deploy as its argument. It's very easy to see what version is deployed on a server by just looking at the target of the symlink.

The second is that you should use rsync with the --link-dest option. --link-dest lets you specify a previous directory that rsync can hard link from for files that haven't changed. For example, if you have a new version to deploy in a directory called "0.9.10/2" and on the remote server you have "0.9.10/1" currently deployed, you can "rsync 0.9.10/2 server:0.9.10/2 --link-dest 0.9.10/1". What this does is create a new dir tree in /2 with all the files that didn't change from /1 hard linked, but with new copies of the files that did. This saves a lot of disk space, and it means you can keep versions around on the server for as long as you feel the need to.

As our deployment is ~8GB, this is quite important for us. It means that we actually have releases sitting on the server from quite a while back.

The third thing is setting something up so you can have simple versioning of your deployment scripts.

We have a script that drives this whole process called "./realmctl". Deployment is split into a 4-step process. You find a script for each step in each release dir, like this:

./0.9.10/1/prepare (create/upload new release)
./0.9.10/1/stop (stop existing servers)
./0.9.10/1/deploy (change symlinks over to this release)
./0.9.10/1/start (start servers)

Each of the releases contains its own version of the scripts. That means if you issue a command like "./realmctl restart --release=0.9.10/2", the script can find the stop script for the current version and then run the deploy and start scripts for the new version. This way, if your deployment process changes between versions, you can still move freely between versions without needing to worry about the version of your deployment scripts.

The last thing is that, if you're writing something similar, it's really nice for your scripts to have some idea of the different parts of your infrastructure so that they can be controlled independently. It's really useful to be able to say something like "./realmctl restart all poe_webserver" (restart webserver processes on all servers) or "./realmctl stop ggg4 poe_instance" (stop the game instance servers on ggg4). Those kinds of commands are really useful during an emergency.
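
The per-release script lookup described for "restart" can be sketched as below. This is not the actual realmctl code, which is not shown in the thread; the function name `restart_plan` and its parameters are hypothetical, and it only builds the list of script paths rather than running them.

```python
import posixpath


def restart_plan(root, current_release, new_release):
    """Return the script paths a restart would run, in order.

    The stop script comes from the currently deployed release, while
    deploy and start come from the release being switched to, so each
    release is driven by its own version of the scripts.
    """
    steps = [
        (current_release, "stop"),
        (new_release, "deploy"),
        (new_release, "start"),
    ]
    return [posixpath.join(root, release, script) for release, script in steps]


plan = restart_plan(".", "0.9.10/1", "0.9.10/2")
```

Here `plan` lists the stop script of 0.9.10/1 followed by the deploy and start scripts of 0.9.10/2.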
sausagefeet over 12 years ago
I'd prefer to just bring the machine out of rotation, redeploy, bring it back.
Gigablah over 12 years ago
Rex [1] already supports deployment (and rollbacks) using symlink replacement.

[1] http://rexify.org/modules/application_deployment.html
daenney over 12 years ago
Isn't this guy reinventing a simplified wheel? We already have tools like Capistrano, Fabric, or Rex (which does quite a bit more than just application deployments).
rll over 12 years ago
This seems a bit oversimplified. Sure, there are no race conditions if there are no interactions between files, but if there are then swapping the symlink mid-request on requests that are already in progress will cause all sorts of race conditions.
standel over 12 years ago
mv relies on an operation "similar" to rename(), which is defined by POSIX, and POSIX specifies that it should be atomic.

So the assumption "On Unix, mv is an atomic operation" is not strictly true. If your underlying FS is fully POSIX-compliant, mv will be an atomic operation.

I think it's important to stress this because there are some distributed filesystems that might even try to be POSIX-compliant but do not guarantee atomic renames, and therefore this trick would not work well on them.
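
For reference, the rename()-based swap under discussion can be sketched in Python, where `os.replace` calls rename() on POSIX systems. Whether the swap is actually atomic depends on the filesystem, as noted above. The function name `switch_symlink` and the ".tmp" suffix are illustrative assumptions.

```python
import os


def switch_symlink(link_path, new_target):
    """Point link_path at new_target by renaming a temporary link over it."""
    tmp_path = link_path + ".tmp"  # hypothetical temporary name
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    # rename() replaces the old link itself; it does not follow it.
    os.replace(tmp_path, link_path)
```

Readers of `link_path` see either the old target or the new one, provided the filesystem honours the atomicity that POSIX specifies for rename().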
njharman over 12 years ago
Any deployment tool (fab, capistrano, etc.) should do this.

My preferred layout is

./releases/<datetime or rev or whatever makes sense for you stamped>

./current symlink to ./releases/<foo>

Keeping releases in a directory by themselves makes it easy to list them, archive old ones, etc.
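
With that layout, listing and cleaning up releases takes only a few lines. The sketch below assumes release names sort chronologically (for example datetime stamps) and that "current" is a symlink into "releases"; the function names and the `keep` parameter are hypothetical.

```python
import os
import shutil


def list_releases(base):
    """Return release directory names under base/releases, sorted by name."""
    releases_dir = os.path.join(base, "releases")
    return sorted(
        name
        for name in os.listdir(releases_dir)
        if os.path.isdir(os.path.join(releases_dir, name))
    )


def current_release(base):
    """Return the name of the release that base/current points at."""
    target = os.readlink(os.path.join(base, "current"))
    return os.path.basename(os.path.normpath(target))


def prune_releases(base, keep=3):
    """Delete all but the newest `keep` releases, never the current one."""
    releases = list_releases(base)
    current = current_release(base)
    old = releases[: max(len(releases) - keep, 0)]
    removed = []
    for name in old:
        if name == current:
            continue
        shutil.rmtree(os.path.join(base, "releases", name))
        removed.append(name)
    return removed
```

Skipping the release that "current" points at keeps a rollback to an older version from being deleted out from under the server.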
ralph over 12 years ago
Wouldn't telling the server about the new root instead (which, as the author points out, needs the relevant permissions) remove the symlink traversal on every access?
code_duck over 12 years ago
If you are on a shared host and can't restart your webserver, you probably don't have the ability to set the document root, either.
j_baker over 12 years ago
Is it just me, or does this seem like a lot of work just to avoid having assets be inconsistent for "some number of milliseconds"?