Doing migrations with no downtime is not an easy topic.<p>> These defer drops will happen at least 30 minutes after the particular migration referenced ran (in the next migration cycle), giving us peace of mind that the new application code is in place.<p>That's pretty much what I've seen before: you do a series of deployments + migration scripts.<p>Usually (a rough sketch follows at the end of this comment):<p>- Migrate with backward-compatible modifications to the schema<p>- Release application logic that takes advantage of both the migrated and the to-be-deprecated tables<p>- Possibly an extra migration to sync the new / old tables<p>- Release application logic that stops using the to-be-deprecated tables<p>- Migrate with destructive modifications to remove the old table<p>Usually that's a huge, complicated process which could be replaced with one migration and a few minutes of downtime. Sadly, some companies can't afford that, so they make such changes with weeks of planning.<p>I usually vote for some planned downtime outside working hours.
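To make those steps concrete, here's a minimal sketch for folding two columns into one. The table and column names are invented and the statements assume PostgreSQL; the two application releases (steps 2 and 4) happen between these migrations:

```sql
-- Step 1: backward-compatible expand; the running release keeps working.
ALTER TABLE users ADD COLUMN full_name text;

-- Step 3: sync rows that the old code wrote while the new code rolled out.
UPDATE users
SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL;

-- Step 5: destructive contract, only once nothing reads the old columns.
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
```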
At GitLab we do the following:<p>We have two migration directories: db/migrate and db/post_migrate. If a
migration adds something (e.g. a table or a column) or migrates data that
existing code can deal with (e.g. populating a new table) then it goes in
db/migrate. If a migration removes or updates something that first requires a
code deploy, it goes in db/post_migrate. We combine this with a variety of
helpers in various places to allow for zero downtime upgrades (if you're using
PostgreSQL). For example, to remove a column we take these steps:<p>1. Deploy a new version of the code that ignores the column we will drop (this
is a matter of adding `ignore_column :column_name` in the right class).<p>2. Run a post-deployment migration that removes said column.<p>3. In the next release we can remove the `ignore_column` line (we require that
users upgrade at most one minor version at a time for online upgrades).<p>The migrations in db/post_migrate are executed by default, but you can opt out by
setting an environment variable. In the case of GitLab.com this results in the
following deployment procedure:<p>1. Deploy code with this variable set (so we don't run the post-deployment
migrations).<p>2. Re-run `rake db:migrate` on a particular host without setting this
environment variable.<p>For big data migrations we use Sidekiq to run them in the background. This
removes the need for a deployment procedure taking hours, though it comes with
some additional complexity.<p>More information about this can be found at the following places:<p>1. <a href="https://docs.gitlab.com/ee/update/README.html#upgrading-without-downtime" rel="nofollow">https://docs.gitlab.com/ee/update/README.html#upgrading-with...</a><p>2. <a href="https://docs.gitlab.com/ee/development/what_requires_downtime.html#doc-nav" rel="nofollow">https://docs.gitlab.com/ee/development/what_requires_downtim...</a><p>3. <a href="https://docs.gitlab.com/ee/development/background_migrations.html" rel="nofollow">https://docs.gitlab.com/ee/development/background_migrations...</a>
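As a rough illustration of the db/migrate vs. db/post_migrate split described above (the tables, columns, and statements here are invented for the example, not GitLab's actual migrations), the two directories end up holding changes along these lines:

```sql
-- db/migrate: additive, safe to apply while the previous release is still serving.
ALTER TABLE issues ADD COLUMN closed_at timestamptz;

-- db/post_migrate: destructive, per step 2 above; safe only because the code
-- deployed in step 1 already carries `ignore_column` for this column.
ALTER TABLE issues DROP COLUMN legacy_closed_flag;
```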
Researcher/author of a tool [0,1] also attempting to tackle this problem here.<p>Unfortunately zero-downtime schema changes are even more complex than suggested here. Although the expand-contract method as described in the post is a good approach to tackling this problem, the mere act of altering a database table that is in active use is a dangerous one. I've already found that some trivial operations such as adding a new column to an existing table can block database clients from reading from that table through full table locks for the duration of the schema operation [2].<p>In many cases it's safer to create a new table, copy data over from the old table to the new table, and switch clients over. However this introduces a whole new set of problems: keeping data in sync between tables, "fixing" foreign key constraints, etc.<p>If there are others researching/building tooling for this problem, I'd love to hear from you.<p>[0] <a href="http://github.com/quantumdb/quantumdb" rel="nofollow">http://github.com/quantumdb/quantumdb</a><p>[1] <a href="https://speakerdeck.com/michaeldejong/icse-17-zero-downtime-sql-database-schema-evolution-for-continuous-deployment-1" rel="nofollow">https://speakerdeck.com/michaeldejong/icse-17-zero-downtime-...</a><p>[2] <a href="http://blog.minicom.nl/blog/2015/04/03/revisiting-profiling-ddl-statements-mysqls-return/" rel="nofollow">http://blog.minicom.nl/blog/2015/04/03/revisiting-profiling-...</a>
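For anyone curious what the copy-and-switch approach looks like at its simplest, here is a heavily simplified PostgreSQL sketch with invented table/column names; it ignores deletes, foreign keys, and batched backfills, which is exactly where dedicated tooling earns its keep:

```sql
-- New table with the desired schema, cloned from the old one.
CREATE TABLE users_new (LIKE users INCLUDING ALL);
ALTER TABLE users_new ALTER COLUMN email TYPE varchar(500);

-- Mirror ongoing writes into the new table while backfilling.
CREATE FUNCTION users_sync() RETURNS trigger AS $$
BEGIN
  -- A real sync would copy every column; only email is updated here for brevity.
  INSERT INTO users_new VALUES (NEW.*)
  ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_trg AFTER INSERT OR UPDATE ON users
  FOR EACH ROW EXECUTE PROCEDURE users_sync();

-- Backfill existing rows, then swap names in one short transaction.
INSERT INTO users_new SELECT * FROM users ON CONFLICT (id) DO NOTHING;

BEGIN;
  ALTER TABLE users RENAME TO users_old;
  ALTER TABLE users_new RENAME TO users;
  DROP TRIGGER users_sync_trg ON users_old;
COMMIT;
```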
We're using FlywayDB [0] and many of the ideas in Martin Fowler's _Evolutionary DB_ article [1]. When database changes are breaking, we use a pre-deploy and post-deploy "pair" to create an intermediate DB state that both application versions will run against.<p>[0] <a href="https://flywaydb.org/documentation/migrations" rel="nofollow">https://flywaydb.org/documentation/migrations</a><p>[1] <a href="https://www.martinfowler.com/articles/evodb.html" rel="nofollow">https://www.martinfowler.com/articles/evodb.html</a>
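A hypothetical example of such a pair as Flyway versioned migrations (file names and columns are invented; Flyway simply applies versioned migrations in order, so which ones run before vs. after the code deploy depends on when you invoke migrate):

```sql
-- V42__add_display_name.sql  (pre-deploy: additive, both app versions keep working)
ALTER TABLE accounts ADD COLUMN display_name varchar(255);

-- V43__drop_nickname.sql  (post-deploy: applied once no old app version remains)
ALTER TABLE accounts DROP COLUMN nickname;
```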
There is also anchor modeling [1], with quite a few publications [2] and even an online modelling tool [3] to play around with. It is essentially a method that yields a highly normalized, bitemporal database model, so that every prior database state, schema and data alike, is a subset of the current database state.<p>I have not personally been able to use it yet, but I really like the idea behind it. It is probably not the ideal choice in general: its append-only nature means the consumed storage space only ever grows, and its high degree of normalization may or may not hurt performance when implemented on top of a relational database, depending on the nature of your queries.<p>But if you do not have to deal with excessive amounts of data, if you expect a lot of schema evolution, or if you need traceability of all changes for auditing purposes, then anchor modeling might be worth a shot.<p>[1] <a href="https://en.wikipedia.org/wiki/Anchor_modeling" rel="nofollow">https://en.wikipedia.org/wiki/Anchor_modeling</a><p>[2] <a href="http://www.anchormodeling.com/?page_id=360" rel="nofollow">http://www.anchormodeling.com/?page_id=360</a><p>[3] <a href="http://www.anchormodeling.com/modeler/latest/" rel="nofollow">http://www.anchormodeling.com/modeler/latest/</a>
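Very roughly, and not the DDL the modeling tool actually generates (the mnemonic-style names below are made up to mimic the published examples), the resulting schema is one identity-only anchor table per entity plus one append-only table per attribute:

```sql
-- Anchor: nothing but the entity's surrogate identity.
CREATE TABLE AC_Actor (
  AC_ID integer PRIMARY KEY
);

-- Historized attribute: append-only, keyed by anchor + change time.
CREATE TABLE AC_NAM_Actor_Name (
  AC_ID     integer NOT NULL REFERENCES AC_Actor (AC_ID),
  AC_NAM    varchar(100) NOT NULL,
  ChangedAt timestamp NOT NULL,
  PRIMARY KEY (AC_ID, ChangedAt)
);

-- Schema evolution means adding new anchor/attribute/tie tables; existing
-- tables are never altered, so every old schema stays a subset of the new one.
```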
This looks like a helpful approach to a tricky topic! I'm curious how the post-deploy migrations work when you run all your migrations at once, from the beginning, e.g. in CircleCI or when onboarding a new developer.<p>Also, do you do anything to protect yourself against migrations written at time <i>t_0</i> and then run much later at time <i>t_n</i>? I've seen a lot of problems there when migrations use application code (e.g. ActiveRecord models) and then that code changes. (My solution is to never call model code from migrations, but there are other ways.) This isn't really specific to your article, I guess, but does your approach make managing that harder? Easier?<p>Does having that details table help at all when you have migrations on a long-lived branch that is merged after other migrations have been added? Or is the solution there just to rebase the branch and test before merging it in?<p>EDIT: I think this blog post by the Google SRE folks is great reading for people thinking about migrations and deployments:<p><a href="https://cloudplatform.googleblog.com/2017/03/reliable-releases-and-rollbacks-CRE-life-lessons.html" rel="nofollow">https://cloudplatform.googleblog.com/2017/03/reliable-releas...</a><p>I've never been comfortable with rolling back migrations in production, but their plan changed my mind that it can be done safely. Is your approach compatible with theirs?
This is cool, but I've started reducing my need for migrations by serializing most of the structured pieces of a record into a text string (JSON, for example), and only having separate columns for things that need to be joined (usually an id column) or selected on/indexed. I find this approach means I need migrations far less often. If you find that you really do need to index on a new field, adding a column is the easiest kind of migration to do.<p>Of course, sometimes you still do need to migrate, which is where strategies like this one come in.
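For instance, with invented names and using a jsonb column rather than plain text, such a table might look like this:

```sql
CREATE TABLE orders (
  id          bigserial PRIMARY KEY,
  customer_id bigint NOT NULL,       -- kept as a real column for joins
  status      varchar(32) NOT NULL,  -- kept as a real column to index/filter on
  payload     jsonb NOT NULL         -- everything else is serialized here
);

CREATE INDEX idx_orders_status ON orders (status);

-- Adding a field usually just means writing it into payload; only when it
-- needs an index or a join does it get promoted to its own column.
```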
>Migrate database to new schema... spin up new instance<p>Sadly this is more of a pain in Django, since it does not write default values to the database. To be able to safely run the migration before deploying the new code, one must manually rewrite the migration's SQL. [1]<p>[1]: <a href="http://pankrat.github.io/2015/django-migrations-without-downtimes/" rel="nofollow">http://pankrat.github.io/2015/django-migrations-without-down...</a>
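For a hypothetical boolean field with a default, the difference looks roughly like this (table and column names invented, PostgreSQL syntax):

```sql
-- Roughly what Django generates: it keeps the default only long enough to
-- backfill, then drops it, so rows inserted by still-running old code
-- (which doesn't know the column yet) would violate NOT NULL.
ALTER TABLE app_article ADD COLUMN is_public boolean DEFAULT false NOT NULL;
ALTER TABLE app_article ALTER COLUMN is_public DROP DEFAULT;

-- Hand-rewritten version: keep the database-level default until the old
-- application version is fully gone, then drop it in a later migration.
ALTER TABLE app_article ADD COLUMN is_public boolean DEFAULT false NOT NULL;
```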
Sigh, I wish there were some magic sauce for this. But for now I'll have to sit through 14 HOURS of downtime while my team upgrades our SonarQube DB to 6.7.
Just put your database schema in a VCS and get on with life: <a href="http://www.liquibase.org/" rel="nofollow">http://www.liquibase.org/</a>
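For example, a Liquibase changelog can be plain SQL kept in the repository next to the code (the author/id and column below are invented; XML, YAML, and JSON changelogs are also supported):

```sql
--liquibase formatted sql

--changeset alice:add-users-last-login
ALTER TABLE users ADD COLUMN last_login timestamp;
--rollback ALTER TABLE users DROP COLUMN last_login;
```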