All of this sounds nice in theory, but in practice it falls apart. If you are only concerned with empty databases during development, there is not much to be gained by keeping deltas of different database schema versions - one evolving database create script will do the job. Just grab the version belonging to your application version and you are good to go. Once you have a non-empty database and have to provide data migration scripts, everything becomes a one-way street in all but the most trivial cases, because updates are destructive.<p>My favorite toy example is firstname and lastname. If you decide to merge them into a single name column (without a separator between firstname and lastname), you cannot automatically go back, because there is ambiguity whenever either firstname or lastname is missing, or when at least one of them consists of two or more words. If you do the opposite - split the name into firstname and lastname - and somehow manage to come up with a satisfying heuristic for performing the split, it is easy to go back by just merging firstname and lastname again. But then going forward again will not return you to the initial state in all the cases where your heuristic fails.<p>Currently I think the way to go may be a highly normalized database schema - 5th or 6th normal form - combined with only non-destructive schema evolution. I am not yet sure how this will turn out in practice, both with respect to the limitations it imposes on how the schema can evolve and with respect to how bad the abandoned schema artifacts and the data they contain may become. Still, it looks promising, and after some more research, thinking and prototyping I will most likely give it a try in an upcoming project.
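A minimal SQL sketch of that toy example (table and column names are made up, and the || concatenation is PostgreSQL/Oracle style - MySQL would use CONCAT): the forward migration is mechanical, but no reverse script can recover the original columns.<p><pre><code>  -- Forward: merge the two columns into one (destructive).
  ALTER TABLE person ADD COLUMN name VARCHAR(255);
  UPDATE person
     SET name = TRIM(COALESCE(firstname, '') || ' ' || COALESCE(lastname, ''));
  ALTER TABLE person DROP COLUMN firstname;
  ALTER TABLE person DROP COLUMN lastname;

  -- "Backward": there is no unambiguous split.
  -- 'Anna Maria Schmidt' could be ('Anna', 'Maria Schmidt')
  -- or ('Anna Maria', 'Schmidt').</code></pre>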
Schema rollback doesn't really work, because once you have data in the database, DROP TABLE is not the inverse of CREATE TABLE - going back and forth deletes the data.<p>I've never been in a situation where losing production data was acceptable, so I've stopped bothering with rollback statements that nobody ever tests and nobody will ever run in production. Schema changes are forward-only.
Have to admit I use simpler tools such as DbDeploy, Flyway or Play!'s evolutions. They do the trick for smaller databases, but are difficult to retrofit onto an existing database, as the OP mentions.<p>A great tip I once received was to split destructive migrations across releases, as sketched below. I.e. if a column is to be removed, first release code that doesn't use that column; if all is okay, rename the column in the next release's db migration; and if still okay, actually delete the column in a later release. If all of this was done in one release and something went wrong, your rollback would be difficult.<p>Another defensive action, e.g. when deleting rows of data, is to copy the affected data to a temporary table before deleting it from the original table(s), and only delete the temporary table in a later release, once it is proven you did not need to roll back.<p>And if for some reason you don't have any automated database versioning, at least create a versions table to manually keep track of changes and dates for some sort of sane auditing.
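A rough sketch of that phased removal (table and column names are hypothetical; RENAME COLUMN syntax as in PostgreSQL/Oracle, other engines differ slightly):<p><pre><code>  -- Release N: application code stops reading and writing the column;
  -- no schema change yet.

  -- Release N+1: rename instead of dropping, so a rollback only has to
  -- rename it back.
  ALTER TABLE customer RENAME COLUMN fax_number TO fax_number_deprecated;

  -- Release N+2: once it is clear no rollback was needed, remove it.
  ALTER TABLE customer DROP COLUMN fax_number_deprecated;</code></pre>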
This is an area I have experimented with in large enterprise environments, usually with Oracle databases.<p>In one job I created a process where we used Rails-style migrations with some Ruby code I put together. As well as schema migrations, it handled applying stored procedures to the database.<p>Since then I created a little application called dbgeni (database generic installer) and have been using it successfully to create dev and test environments in another job. It changes the schema with migrations and applies changed stored procedures too.<p>One difference between dbgeni and other tools is that dbgeni uses plain SQL files and applies them using the database CLI - i.e. sqlplus for Oracle, the mysql command for MySQL and isql for Sybase. This means that if you take the tool away, the SQL scripts are still useful. That allows me to use the tool for applying changes to dev and test environments, but since the production DBAs don't want to use it, I can just bundle up the same scripts and pass them over.<p>The project is on github <a href="https://github.com/sodonnel/dbgeni" rel="nofollow">https://github.com/sodonnel/dbgeni</a> and it installs as a Ruby gem. I have a supporting website <a href="http://dbgeni.appsintheopen.com" rel="nofollow">http://dbgeni.appsintheopen.com</a> with more details.
Everyone who does DB updates should understand idempotence and how to undo their changes. I like the approach the author describes, but calling it version control is inaccurate. This is change management, and it's a completely different discipline with its own body of knowledge.
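For example, idempotent-style DDL (PostgreSQL syntax here; table and column names are made up) is safe to re-run if a deployment is retried:<p><pre><code>  -- Safe to apply twice: the guards turn the statements into no-ops
  -- the second time around.
  CREATE TABLE IF NOT EXISTS audit_log (
      id         BIGSERIAL PRIMARY KEY,
      changed_at TIMESTAMP NOT NULL DEFAULT now()
  );
  ALTER TABLE audit_log ADD COLUMN IF NOT EXISTS changed_by TEXT;</code></pre>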
Every article about db version control should mention sqitch. It's a great tool that works similarly to, and in conjunction with, git.<p><a href="http://sqitch.org/" rel="nofollow">http://sqitch.org/</a>
Here you go. It's easy with a bit of Category Theory.
Just implement something along the lines of David Spivak's
"On The Relational Foundations Of Functorial Data Migration".<p><pre><code> http://arxiv.org/abs/1212.5303
</code></pre>
Original motivation, from v1 of the paper:<p><pre><code> I asked whether the difficulty in changing data types to
allow for longer integers was simply a storage issue or
something more fundamental. They said that it was more
fundamental, and in fact that my question showed how
little I knew about the horrors of using databases in
the real world!</code></pre>
> Every script must start with a integer which must be the next available integer.<p>Based on experience, you should not do 01_script_a.py, 02_script_b.py, ... but rather 010_, 020_, 030_, ... That way you can squeeze an initially unaccounted-for script into the correct location without renaming the others.
Microsoft's MVC database migration system has worked well for me.<p>The package manager handles generating the "code behind" the database changes (which is committed to the VCS), and you can revert and update as needed.
I prefer to think of db version control and db migrations as two separate things.<p>Version control is achieved by keeping the SQL scripts to create the database in your VCS along with the code that depends on it.<p>Migration scripts are created later as a pre-deployment step. I prefer not to store these scripts in the VCS because in my experience they are typically run once and never used again. There are several tools that can generate the migration scripts by comparing two schemas, e.g. dev-->prod.<p>I wrote an open source command line tool that creates VCS-friendly create scripts from a SQL Server database. It's on github at <a href="https://github.com/sethreno/schemazen#schema-zen---script-and-create-sql-server-objects-quickly" rel="nofollow">https://github.com/sethreno/schemazen#schema-zen---script-an...</a><p>Also, here are a few tools I've used to generate migration scripts:
* <a href="http://opendbiff.codeplex.com/" rel="nofollow">http://opendbiff.codeplex.com/</a>
* <a href="http://www.sqldbtools.com/Tools.aspx?ProductId=1" rel="nofollow">http://www.sqldbtools.com/Tools.aspx?ProductId=1</a>
* <a href="http://www.red-gate.com/products/sql-development/sql-compare/" rel="nofollow">http://www.red-gate.com/products/sql-development/sql-compare...</a>
As others have said, and as the author mentioned with the 3Tb example, it is the data that is the tricky part. Or rather, the tricky part is <i>state</i>. Code (and schema) versioning is fundamentally easy because it is internally stateless. Once you add external state to the mix, all bets are off. I really wish there were a silver bullet, but there is not. If you can solve versioned persistence of large amounts of state, you will solve a lot of problems in computer science.
Does anyone know what the production-worthy mechanism for database versioning is in Django, Node and Go?<p>It always feels strange to me (coming from Rails) that other frameworks treat versioning as optional - I mean, Django is only now getting migrations as part of the default toolkit rather than via South?<p>Does that mean there is another way to collaborate and work on database changes?
Very interested to see the link to Liquibase[0] in that article. Must check it out. Does anyone have experience with that sort of tool that they'd like to share, either good or bad?<p>[0]: <a href="http://www.liquibase.org/" rel="nofollow">http://www.liquibase.org/</a>
I've been struggling with how to get an Oracle database's contents under Git control. Our developers use an older rapid development environment in which the front-end tool stores its form layouts, triggers, and source code all in the database. It doesn't provide an option to develop from regular text files. So there are a couple of approaches I could take:<p>1) Use triggers so that every time a Git command is run, it either imports or exports the given database table. But this takes simple, fast operations and makes them expensive.<p>2) Write a FUSE plugin that can expose a database schema as a filesystem object, and have a git checkout hook mount it in the appropriate place.<p>Has anyone already done either of these?
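For option 1, a rough sketch of the export half, assuming the tool's code objects are visible as ordinary schema objects (the object-type filter and spool file name are placeholders; DBMS_METADATA is a standard Oracle package):<p><pre><code>  -- Run from sqlplus by a Git hook; spools the current DDL to a text
  -- file that can then be committed.
  SET LONG 1000000 PAGESIZE 0 LINESIZE 200 TRIMSPOOL ON
  SPOOL schema_export.sql
  SELECT DBMS_METADATA.GET_DDL(object_type, object_name)
    FROM user_objects
   WHERE object_type IN ('TABLE', 'TRIGGER', 'PROCEDURE');
  SPOOL OFF</code></pre>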
I've used the manual migration script approaches before (Active Record, DBDeploy), which are really reliable provided you use them religiously for all changes. However, it's a lot of work, and it's easy to accidentally make a change directly to the local dev DB and then either forget to commit a change script or commit a change that isn't quite what you did.<p>At the other end, the automated comparison/migration tools are much more convenient. However, they can't handle 100% of cases, for example treating a rename as a drop and a create (see the sketch below), or splitting data.<p>I've always ended up with a hybrid approach in the past.
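A minimal illustration of the rename problem (table and column names are made up; RENAME COLUMN syntax varies slightly by engine):<p><pre><code>  -- What a naive schema-diff tool tends to emit for a renamed column
  -- (the data in the old column is lost):
  ALTER TABLE users ADD COLUMN family_name VARCHAR(255);
  ALTER TABLE users DROP COLUMN surname;

  -- The hand-written migration that actually preserves the data:
  ALTER TABLE users RENAME COLUMN surname TO family_name;</code></pre>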
This is something EVERYONE struggles with at some point, and everyone has a different solution that works.<p>It's still alpha (I'm introducing breaking changes soon), but I have been working on Mite (<a href="https://github.com/jdc0589/mite-node" rel="nofollow">https://github.com/jdc0589/mite-node</a>) for a few weeks now; it's a re-implementation of an internal tool we used in the past. The documentation is incomplete/out of date, though; it will be getting some love soon.<p>Mite's goal is similar to sqitch's, but it aims to be a little simpler: very little configuration, and migrations are plain old SQL.
Another interesting solution is to use a functional database like Datomic. Since the database looks like a giant time series, you get implicit versioning built into the database.
Didn't like the article. It has the typical handwaving about "using an ORM" that will resolve all your complexities. Object-relational mapping is a problem completely orthogonal to database migrations.<p>I like flywaydb. For "rolling back" you need to use your regular disaster recovery, as migrations modify data as well as the schema. Use transaction logging.
I wrote about this idea a few years ago, from a non-ORM perspective: <a href="http://robertelwell.info/blog/future-proof-db-changelog/" rel="nofollow">http://robertelwell.info/blog/future-proof-db-changelog/</a>
The Doctrine2 project solved this a while ago, using much better timestamp-based versioning. I actually miss its schema diff and migration tools when I go back to languages like C# or C++...