Doing migrations with no downtime is not an easy topic.<p>> These defer drops will happen at least 30 minutes after the particular migration referenced ran (in the next migration cycle), giving us peace of mind that the new application code is in place.<p>That's pretty much what I've seen before: you do a series of deployments + migration scripts.<p>Usually (a rough sketch follows at the end of this comment):<p>- Migrate with backward-compatible modifications to the schema<p>- Release application logic that takes advantage of both the migrated and the to-be-deprecated tables<p>- Possibly an extra migration to sync the new / old tables<p>- Release application logic that stops using the to-be-deprecated tables<p>- Migrate with destructive modifications to remove the old table<p>Usually that's a huge, complicated process which could be replaced with one migration and a few minutes of downtime. Sadly, some companies can't afford that, so they make such changes with weeks of planning.<p>I usually vote for some planned downtime outside working hours.
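To make those steps concrete, here's a minimal sketch for folding two columns into one. The table and column names are invented and the statements assume PostgreSQL; the two application releases (steps 2 and 4) happen between these migrations:

```sql
-- Step 1: backward-compatible expand; the running release keeps working.
ALTER TABLE users ADD COLUMN full_name text;

-- Step 3: sync rows that the old code wrote while the new code rolled out.
UPDATE users
SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL;

-- Step 5: destructive contract, only once nothing reads the old columns.
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
```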
At GitLab we do the following:<p>We have two migration directories: db/migrate and db/post_migrate. If a
migration adds something (e.g. a table or a column) or migrates data that
existing code can deal with (e.g. populating a new table) then it goes in
db/migrate. If a migration removes or updates something that first requires a
code deploy, it goes in db/post_migrate. We combine this with a variety of
helpers in various places to allow for zero downtime upgrades (if you're using
PostgreSQL). For example, to remove a column we take these steps:<p>1. Deploy a new version of the code that ignores the column we will drop (this
is a matter of adding `ignore_column :column_name` in the right class).<p>2. Run a post-deployment migration that removes said column.<p>3. In the next release we can remove the `ignore_column` line (we require that
users upgrade at most one minor version at a time for online upgrades).<p>The migrations in db/post_migrate are executed by default, but you can opt out by
setting an environment variable. In the case of GitLab.com this results in the
following deployment procedure:<p>1. Deploy code with this variable set (so we don't run the post-deployment
migrations).<p>2. Re-run `rake db:migrate` on a particular host without setting this
environment variable.<p>For big data migrations we use Sidekiq to run them in the background. This
removes the need for a deployment procedure taking hours, though it comes with
some additional complexity.<p>More information about this can be found at the following places:<p>1. <a href="https://docs.gitlab.com/ee/update/README.html#upgrading-without-downtime" rel="nofollow">https://docs.gitlab.com/ee/update/README.html#upgrading-with...</a><p>2. <a href="https://docs.gitlab.com/ee/development/what_requires_downtime.html#doc-nav" rel="nofollow">https://docs.gitlab.com/ee/development/what_requires_downtim...</a><p>3. <a href="https://docs.gitlab.com/ee/development/background_migrations.html" rel="nofollow">https://docs.gitlab.com/ee/development/background_migrations...</a>
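As a rough illustration of the db/migrate vs. db/post_migrate split described above (the tables, columns, and statements here are invented for the example, not GitLab's actual migrations), the two directories end up holding changes along these lines:

```sql
-- db/migrate: additive, safe to apply while the previous release is still serving.
ALTER TABLE issues ADD COLUMN closed_at timestamptz;

-- db/post_migrate: destructive, per step 2 above; safe only because the code
-- deployed in step 1 already carries `ignore_column` for this column.
ALTER TABLE issues DROP COLUMN legacy_closed_flag;
```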
Researcher/author of a tool [0,1] also attempting to tackle this problem here.<p>Unfortunately zero-downtime schema changes are even more complex than suggested here. Although the expand-contract method as described in the post is a good approach to tackling this problem, the mere act of altering a database table that is in active use is a dangerous one. I've already found that some trivial operations such as adding a new column to an existing table can block database clients from reading from that table through full table locks for the duration of the schema operation [2].<p>In many cases it's safer to create a new table, copy data over from the old table to the new table, and switch clients over. However this introduces a whole new set of problems: keeping data in sync between tables, "fixing" foreign key constraints, etc.<p>If there are others researching/building tooling for this problem, I'd love to hear from you.<p>[0] <a href="http://github.com/quantumdb/quantumdb" rel="nofollow">http://github.com/quantumdb/quantumdb</a><p>[1] <a href="https://speakerdeck.com/michaeldejong/icse-17-zero-downtime-sql-database-schema-evolution-for-continuous-deployment-1" rel="nofollow">https://speakerdeck.com/michaeldejong/icse-17-zero-downtime-...</a><p>[2] <a href="http://blog.minicom.nl/blog/2015/04/03/revisiting-profiling-ddl-statements-mysqls-return/" rel="nofollow">http://blog.minicom.nl/blog/2015/04/03/revisiting-profiling-...</a>
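For anyone curious what the copy-and-switch approach looks like at its simplest, here is a heavily simplified PostgreSQL sketch with invented table/column names; it ignores deletes, foreign keys, and batched backfills, which is exactly where dedicated tooling earns its keep:

```sql
-- New table with the desired schema, cloned from the old one.
CREATE TABLE users_new (LIKE users INCLUDING ALL);
ALTER TABLE users_new ALTER COLUMN email TYPE varchar(500);

-- Mirror ongoing writes into the new table while backfilling.
CREATE FUNCTION users_sync() RETURNS trigger AS $$
BEGIN
  -- A real sync would copy every column; only email is updated here for brevity.
  INSERT INTO users_new VALUES (NEW.*)
  ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_sync_trg AFTER INSERT OR UPDATE ON users
  FOR EACH ROW EXECUTE PROCEDURE users_sync();

-- Backfill existing rows, then swap names in one short transaction.
INSERT INTO users_new SELECT * FROM users ON CONFLICT (id) DO NOTHING;

BEGIN;
  ALTER TABLE users RENAME TO users_old;
  ALTER TABLE users_new RENAME TO users;
  DROP TRIGGER users_sync_trg ON users_old;
COMMIT;
```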
We're using FlywayDB [0] and many of the ideas in Martin Fowler's _Evolutionary DB_ article [1]. When database changes are breaking, we use a pre-deploy and post-deploy "pair" to create an intermediate DB state that both application versions will run against.<p>[0] <a href="https://flywaydb.org/documentation/migrations" rel="nofollow">https://flywaydb.org/documentation/migrations</a><p>[1] <a href="https://www.martinfowler.com/articles/evodb.html" rel="nofollow">https://www.martinfowler.com/articles/evodb.html</a>
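A hypothetical example of such a pair as Flyway versioned migrations (file names and columns are invented; Flyway simply applies versioned migrations in order, so which ones run before vs. after the code deploy depends on when you invoke migrate):

```sql
-- V42__add_display_name.sql  (pre-deploy: additive, both app versions keep working)
ALTER TABLE accounts ADD COLUMN display_name varchar(255);

-- V43__drop_nickname.sql  (post-deploy: applied once no old app version remains)
ALTER TABLE accounts DROP COLUMN nickname;
```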
There is also anchor modeling [1], with quite a few publications [2] and even an online modelling tool [3] to play around with. It is essentially a method that yields a highly normalized, bitemporal database model, so that every prior database state, schema and data alike, is a subset of the current database state.<p>I have not personally been able to use it yet, but I really like the idea behind it. It is probably not the ideal choice in general: its append-only nature means the consumed storage space only ever grows, and its high degree of normalization may or may not hurt performance when implemented on top of a relational database, depending on the nature of your queries.<p>But if you do not have to deal with excessive amounts of data, if you expect a lot of schema evolution, or if you need traceability of all changes for auditing purposes, then anchor modeling might be worth a shot.<p>[1] <a href="https://en.wikipedia.org/wiki/Anchor_modeling" rel="nofollow">https://en.wikipedia.org/wiki/Anchor_modeling</a><p>[2] <a href="http://www.anchormodeling.com/?page_id=360" rel="nofollow">http://www.anchormodeling.com/?page_id=360</a><p>[3] <a href="http://www.anchormodeling.com/modeler/latest/" rel="nofollow">http://www.anchormodeling.com/modeler/latest/</a>
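Very roughly, and not the DDL the modeling tool actually generates (the mnemonic-style names below are made up to mimic the published examples), the resulting schema is one identity-only anchor table per entity plus one append-only table per attribute:

```sql
-- Anchor: nothing but the entity's surrogate identity.
CREATE TABLE AC_Actor (
  AC_ID integer PRIMARY KEY
);

-- Historized attribute: append-only, keyed by anchor + change time.
CREATE TABLE AC_NAM_Actor_Name (
  AC_ID     integer NOT NULL REFERENCES AC_Actor (AC_ID),
  AC_NAM    varchar(100) NOT NULL,
  ChangedAt timestamp NOT NULL,
  PRIMARY KEY (AC_ID, ChangedAt)
);

-- Schema evolution means adding new anchor/attribute/tie tables; existing
-- tables are never altered, so every old schema stays a subset of the new one.
```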
This looks like a helpful approach to a tricky topic! I'm curious how the post-deploy migrations work when you run all your migrations at once, from the beginning, e.g. in CircleCI or when onboarding a new developer.<p>Also, do you do anything to protect yourself against migrations written at time <i>t_0</i> and then run much later at time <i>t_n</i>? I've seen a lot of problems there when migrations use application code (e.g. ActiveRecord models) and then that code changes. (My solution is to never call model code from migrations, but there are other ways.) This isn't really specific to your article, I guess, but does your approach make managing that harder? Easier?<p>Does having that details table help at all when you have migrations on a long-lived branch that is merged after other migrations have been added? Or is the solution there just to rebase the branch and test before merging it in?<p>EDIT: I think this blog post by the Google SRE folks is great reading for people thinking about migrations and deployments:<p><a href="https://cloudplatform.googleblog.com/2017/03/reliable-releases-and-rollbacks-CRE-life-lessons.html" rel="nofollow">https://cloudplatform.googleblog.com/2017/03/reliable-releas...</a><p>I've never been comfortable with rolling back migrations in production, but their plan changed my mind that it can be done safely. Is your approach compatible with theirs?
This is cool, but I've started reducing my need for migrations by serializing most of the structured pieces of a record into a text string (JSON, for example), and only having separate columns for things that need to be joined (usually an id column) or selected on/indexed. I find this approach means I need migrations far less often. If you find that you really do need to index on a new field, adding a column is the easiest kind of migration to do.<p>Of course, sometimes you still do need to migrate, which is where strategies like this one come in.
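For instance, with invented names and using a jsonb column rather than plain text, such a table might look like this:

```sql
CREATE TABLE orders (
  id          bigserial PRIMARY KEY,
  customer_id bigint NOT NULL,       -- kept as a real column for joins
  status      varchar(32) NOT NULL,  -- kept as a real column to index/filter on
  payload     jsonb NOT NULL         -- everything else is serialized here
);

CREATE INDEX idx_orders_status ON orders (status);

-- Adding a field usually just means writing it into payload; only when it
-- needs an index or a join does it get promoted to its own column.
```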
>Migrate database to new schema... spin up new instance<p>Sadly this is more of a pain in Django, since it does not write default values to the database. To be able to safely run the migration before deploying the new code, one must manually rewrite the migration's SQL. [1]<p>[1]: <a href="http://pankrat.github.io/2015/django-migrations-without-downtimes/" rel="nofollow">http://pankrat.github.io/2015/django-migrations-without-down...</a>
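For a hypothetical boolean field with a default, the difference looks roughly like this (table and column names invented, PostgreSQL syntax):

```sql
-- Roughly what Django generates: it keeps the default only long enough to
-- backfill, then drops it, so rows inserted by still-running old code
-- (which doesn't know the column yet) would violate NOT NULL.
ALTER TABLE app_article ADD COLUMN is_public boolean DEFAULT false NOT NULL;
ALTER TABLE app_article ALTER COLUMN is_public DROP DEFAULT;

-- Hand-rewritten version: keep the database-level default until the old
-- application version is fully gone, then drop it in a later migration.
ALTER TABLE app_article ADD COLUMN is_public boolean DEFAULT false NOT NULL;
```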
Sigh, I wish there were some magic sauce for this. But for now I'll have to sit through 14 HOURS of downtime while my team upgrades our SonarQube DB to 6.7.
Just put your database schema in a VCS and get on with life: <a href="http://www.liquibase.org/" rel="nofollow">http://www.liquibase.org/</a>
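For example, a Liquibase changelog can be plain SQL kept in the repository next to the code (the author/id and column below are invented; XML, YAML, and JSON changelogs are also supported):

```sql
--liquibase formatted sql

--changeset alice:add-users-last-login
ALTER TABLE users ADD COLUMN last_login timestamp;
--rollback ALTER TABLE users DROP COLUMN last_login;
```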