Migrations and Future Proofing

74 points by w01fe over 10 years ago

7 comments

spoondan over 10 years ago
This is a really good write-up.

In consulting and mentoring on this topic, I've found a lot of engineers push back against how "dirty" it is to have multiple copies of the data around in different formats. It feels wrong to not have a single, authoritative data format at any given instant. If the idea is to change the column type, why not just `ALTER TABLE ... ALTER COLUMN` instead of `ALTER TABLE ... ADD`?

But if you think about it, excepting trivial cases, once you're migrating data, there are parallel realities at least for the duration of the migration and deployment. It's not a question of whether you create divergence by versioning/staging (in some fashion) your data. It's a question of whether you manage the divergence and convergence of the parallel realities that already exist as part of a migration. If you don't, you either incur downtime or risk data corruption.

One big win here is that, by being disciplined about your code and data changes, you can cleanly separate *deployment* from *release*. You can deploy a feature but have it disabled or only enabled for a subset of users. Releasing a feature means enabling its feature flag, not orchestrating a set of migrations, replications, and deployments.
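To make the "ADD instead of ALTER" point concrete, here is a minimal sketch of the expand-and-backfill step in Python with SQLite; the events table and column names are invented for illustration, not taken from the article:

```python
import sqlite3

conn = sqlite3.connect("app.db")

# Expand: add a new column next to the old one instead of altering it in place,
# so code reading the old format keeps working while the new code rolls out.
conn.execute("ALTER TABLE events ADD COLUMN occurred_at_utc TEXT")

# Backfill the new column; application code dual-writes both columns until
# every reader has moved over, which is when the old column can go away.
conn.execute(
    "UPDATE events SET occurred_at_utc = datetime(occurred_at, 'utc') "
    "WHERE occurred_at_utc IS NULL"
)
conn.commit()

# Contract step, much later and as its own release:
# ALTER TABLE events DROP COLUMN occurred_at
```

The release of the new behavior is then a feature-flag flip on top of an already-deployed schema, which is the deployment/release separation the comment describes.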
nostrademons over 10 years ago
When I was at Google this was the single worst problem we had in engineering, at least in terms of engineer-hours consumed. We came up with a bunch of solutions, a few of which (like protobufs) are open-sourced and many of which are just in the heads of the engineers who did them, but there's unfortunately no general solution to the problem. Sometimes I dream about a programming language that has thought through all these issues and includes "evolvability" as a first-class design constraint, but oftentimes these problems show up in multi-process situations where you may be using multiple programming languages.
tcopeland over 10 years ago
    When you introduce a new API endpoint or format for data at rest, think hard

Yup. I've added columns where I've used a datetime where a date would have sufficed and then regretted it later once tons of data was already in the table. Or added a varchar(255) and only later realized that that wasn't big enough. Sometimes the wrongness of a type only becomes clear down the road.

    If you're designing an experimental server-side feature, see if you can store the data off to the side (e.g., in a different location, rather than together with currently critical data) so you can just delete it if the experiment fails rather than being saddled with this data forever without a huge migration project.

Yup, sometimes an extra join or lazy-load is well worth the isolation.
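A rough sketch of what "store the data off to the side" can look like, assuming a hypothetical users table (Python/SQLite used purely for illustration):

```python
import sqlite3

conn = sqlite3.connect("app.db")

# Experimental data lives in its own table, keyed to the core table by id.
# If the experiment fails, a single DROP TABLE removes it without any
# migration of the critical users table.
conn.execute("""
    CREATE TABLE IF NOT EXISTS user_experiment_prefs (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        beta_layout TEXT
    )
""")

# The cost is one extra join (or a lazy load) when the experiment is relevant.
rows = conn.execute("""
    SELECT u.id, u.name, p.beta_layout
    FROM users u
    LEFT JOIN user_experiment_prefs p ON p.user_id = u.id
""").fetchall()
```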
Comment #8363816 not loaded
shykes over 10 years ago
When we introduced pluggable storage drivers in Docker 0.7, we wanted all existing data to work as usual (full compatibility of data at rest), but we also wanted to migrate the layout of the legacy storage system (based on AUFS) so that it would be "just another driver" instead of a perpetual special case. At the same time, we didn't have the luxury of a full-stop mandatory migration, because if anything went wrong, the upgrade would fail and the user would be stuck in a hairy half-migrated situation. Keep in mind we are not talking about a relational database, but directories used to mount the root filesystems of live containers. That means that some of those directories may be mounted and therefore unmovable. So we had to accommodate partial migration failure, and the possibility of a partially migrated install.

So we shipped a migration routine which ran at startup *every time* and gave up (gracefully and atomically) at the slightest sign of trouble. Over time, we reasoned, each install would converge towards full migration, and the huge majority of containers would be migrated within seconds of the upgrade. The rest would be much easier to deal with if anybody had any trouble.

Of course we had the luxury of a data structure which allowed this.
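This is not Docker's actual code (which is written in Go), but a hedged Python sketch of the same "run at every startup, converge over time, bail out at the first sign of trouble" pattern; the paths and the is_busy check are invented:

```python
import os
import shutil

LEGACY_DIR = "/var/lib/app/legacy"   # hypothetical pre-driver layout
DRIVER_DIR = "/var/lib/app/driver"   # hypothetical per-driver layout

def is_busy(path: str) -> bool:
    # Stand-in for "this directory is mounted / in use, leave it alone".
    return os.path.ismount(path)

def migrate_on_startup() -> None:
    # Runs on every startup and is idempotent: whatever was already moved
    # is skipped, so each run converges a little closer to fully migrated.
    if not os.path.isdir(LEGACY_DIR):
        return
    os.makedirs(DRIVER_DIR, exist_ok=True)
    for name in os.listdir(LEGACY_DIR):
        src = os.path.join(LEGACY_DIR, name)
        dst = os.path.join(DRIVER_DIR, name)
        if os.path.exists(dst) or is_busy(src):
            continue  # leave this entry for a later startup
        try:
            shutil.move(src, dst)
        except OSError:
            return  # give up gracefully at the first sign of trouble
```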
jamessantiago over 10 years ago
At least for the server side, Entity Framework partially solves this with code-first migrations. Using the code-first model, you define your data types in code and have Entity Framework generate the appropriate SQL. If you change your data structure down the road, you can autogenerate a migration that changes the database from one version to another. If you deploy a version that is a few migrations ahead, it will execute the proper migrations one after the other.

For the client side it's usually a good idea to specify a versioning relationship between server and client. With AWS, for example, you request the API version you want to use: http://docs.aws.amazon.com/AmazonSimpleDB/latest/DeveloperGuide/APIUsage.html
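A small illustration of that client-side version pinning, loosely modeled on a SimpleDB-style Version query parameter; the parameter values are illustrative and the request is unsigned, so it only shows where the version pin lives:

```python
from urllib.parse import urlencode

# The client pins the API version it was written against; the server keeps
# serving old versions while newer ones roll out alongside them.
params = urlencode({
    "Action": "GetAttributes",
    "DomainName": "example-domain",
    "ItemName": "item-1",
    "Version": "2009-04-15",  # illustrative version string
})

url = "https://sdb.amazonaws.com/?" + params
print(url)
```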
tieTYT over 10 years ago
How does Erlang deal with these problems? It often touts minimal downtime and the ability to run updates to your code while it's running.

I *think* that means you can have Process V1 and Process V2 running on the same server simultaneously. If they read from the same database, won't you run into issues?
Comment #8364995 not loaded

Comment #8363952 not loaded
w01fe over 10 years ago
Author here, would love feedback on this, and also happy to answer any questions.
Comment #8396254 not loaded