All of this sounds nice in theory, but in practice it falls apart. If you are only concerned with empty databases during development, there is not much to be gained by keeping deltas of different database schema versions - one evolving database create script will do the job. Just grab the version belonging to your application version and you are good to go. Once you have a non-empty database and have to provide data migration scripts, everything becomes a one-way street in all but the most trivial cases, because updates are destructive.<p>My favorite toy example is firstname and lastname. If you decide to merge them into a single name column (without a separator between firstname and lastname), you cannot automatically go back, because there is ambiguity whenever either firstname or lastname is missing, or when at least one of them consists of two or more words. If you do the opposite - split the name into firstname and lastname - and somehow manage to come up with a satisfying heuristic for performing the split, it is easy to go back by just merging firstname and lastname again. But then going forward again will not return you to the initial state in all the cases where your heuristic fails.<p>Currently I think the way to go may be a highly normalized database schema - 5th or 6th normal form - combined with only non-destructive schema evolution. I am not yet sure how this will turn out in practice, both with respect to the limitations it imposes on how the schema can evolve and with respect to how bad the abandoned schema artifacts and the data they contain may become. Still, it looks promising, and after some more research, thinking and prototyping I will most likely give it a try in an upcoming project.
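A minimal SQL sketch of that toy example (table and column names are made up, and the || concatenation is PostgreSQL/Oracle style - MySQL would use CONCAT): the forward migration is mechanical, but no reverse script can recover the original columns.<p><pre><code>  -- Forward: merge the two columns into one (destructive).
  ALTER TABLE person ADD COLUMN name VARCHAR(255);
  UPDATE person
     SET name = TRIM(COALESCE(firstname, '') || ' ' || COALESCE(lastname, ''));
  ALTER TABLE person DROP COLUMN firstname;
  ALTER TABLE person DROP COLUMN lastname;

  -- "Backward": there is no unambiguous split.
  -- 'Anna Maria Schmidt' could be ('Anna', 'Maria Schmidt')
  -- or ('Anna Maria', 'Schmidt').</code></pre>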
Schema rollback doesn't really work, because once you have data in the database, DROP TABLE is not the inverse of CREATE TABLE - going back and forth deletes the data.<p>I've never been in a situation where losing production data was acceptable, so I've stopped bothering with rollback statements that nobody ever tests and nobody will ever run in production. Schema changes are forward-only.
Have to admit I use simpler tools such as DbDeploy, Flyway or Play!'s evolutions. They do the trick for smaller databases, but are difficult to retrofit onto an existing database, as the OP mentions.<p>A great tip I once received was to split destructive migrations across releases, as sketched below. I.e. if a column is to be removed, first release code that doesn't use that column; if all is okay, rename the column in the next release's db migration; and if still okay, actually delete the column in a later release. If all of this was done in one release and something went wrong, your rollback would be difficult.<p>Another defensive action, e.g. when deleting rows of data, is to copy the affected data to a temporary table before deleting it from the original table(s), and only delete the temporary table in a later release, once it is proven you did not need to roll back.<p>And if for some reason you don't have any automated database versioning, at least create a versions table to manually keep track of changes and dates for some sort of sane auditing.
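A rough sketch of that phased removal (table and column names are hypothetical; RENAME COLUMN syntax as in PostgreSQL/Oracle, other engines differ slightly):<p><pre><code>  -- Release N: application code stops reading and writing the column;
  -- no schema change yet.

  -- Release N+1: rename instead of dropping, so a rollback only has to
  -- rename it back.
  ALTER TABLE customer RENAME COLUMN fax_number TO fax_number_deprecated;

  -- Release N+2: once it is clear no rollback was needed, remove it.
  ALTER TABLE customer DROP COLUMN fax_number_deprecated;</code></pre>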
This is an area I have experimented with in large enterprise environments, usually with Oracle databases.<p>In one job I created a process where we used Rails-style migrations with some Ruby code I put together. As well as schema migrations, it handled applying stored procedures to the database.<p>Since then I created a little application called dbgeni (database generic installer) and have been using it successfully to create dev and test environments in another job. It changes the schema with migrations and applies changed stored procedures too.<p>One difference between dbgeni and other tools is that dbgeni uses plain SQL files and applies them using the database CLI - i.e. sqlplus for Oracle, the mysql command for MySQL and isql for Sybase. This means that if you take the tool away, the SQL scripts are still useful. That allows me to use the tool for applying changes to dev and test environments, but since the production DBAs don't want to use it, I can just bundle up the same scripts and pass them over.<p>The project is on github <a href="https://github.com/sodonnel/dbgeni" rel="nofollow">https://github.com/sodonnel/dbgeni</a> and it installs as a Ruby gem. I have a supporting website <a href="http://dbgeni.appsintheopen.com" rel="nofollow">http://dbgeni.appsintheopen.com</a> with more details.
Everyone who does DB updates should understand idempotence and how to undo their changes. I like the approach the author describes, but calling it version control is inaccurate. This is change management, and it's a completely different discipline with its own body of knowledge.
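For example, idempotent-style DDL (PostgreSQL syntax here; table and column names are made up) is safe to re-run if a deployment is retried:<p><pre><code>  -- Safe to apply twice: the guards turn the statements into no-ops
  -- the second time around.
  CREATE TABLE IF NOT EXISTS audit_log (
      id         BIGSERIAL PRIMARY KEY,
      changed_at TIMESTAMP NOT NULL DEFAULT now()
  );
  ALTER TABLE audit_log ADD COLUMN IF NOT EXISTS changed_by TEXT;</code></pre>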
Every article about db version control should mention sqitch. It's a great tool that works similarly to, and in conjunction with, git.<p><a href="http://sqitch.org/" rel="nofollow">http://sqitch.org/</a>
Here you go. It's easy with a bit of Category Theory.
Just implement something along the lines of David Spivak's
"On The Relational Foundations Of Functorial Data Migration".<p><pre><code> http://arxiv.org/abs/1212.5303
</code></pre>
Original motivation, from v1 of the paper:<p><pre><code> I asked whether the difficulty in changing data types to
allow for longer integers was simply a storage issue or
something more fundamental. They said that it was more
fundamental, and in fact that my question showed how
little I knew about the horrors of using databases in
the real world!</code></pre>
> Every script must start with a integer which must be the next available integer.<p>Based on experience, you should not do 01_script_a.py, 02_script_b.py, ... but rather 010_, 020_, 030_, ... That way you can squeeze an initially unaccounted-for script into the correct location without renaming the others.
Microsoft's MVC database migration system has worked well for me.<p>The package manager handles generating the "code behind" the database changes (which is committed to the VCS), and you can revert and update as needed.
I prefer to think of db version control and db migrations as two separate things.<p>Version control is achieved by keeping the SQL scripts to create the database in your VCS along with the code that depends on it.<p>Migration scripts are created later as a pre-deployment step. I prefer not to store these scripts in the VCS because in my experience they are typically run once and never used again. There are several tools that can generate the migration scripts by comparing two schemas, e.g. dev-->prod.<p>I wrote an open source command line tool that creates VCS-friendly create scripts from a SQL Server database. It's on github at <a href="https://github.com/sethreno/schemazen#schema-zen---script-and-create-sql-server-objects-quickly" rel="nofollow">https://github.com/sethreno/schemazen#schema-zen---script-an...</a><p>Also, here are a few tools I've used to generate migration scripts:
* <a href="http://opendbiff.codeplex.com/" rel="nofollow">http://opendbiff.codeplex.com/</a>
* <a href="http://www.sqldbtools.com/Tools.aspx?ProductId=1" rel="nofollow">http://www.sqldbtools.com/Tools.aspx?ProductId=1</a>
* <a href="http://www.red-gate.com/products/sql-development/sql-compare/" rel="nofollow">http://www.red-gate.com/products/sql-development/sql-compare...</a>
As others have said, and as the author mentioned with the 3Tb example, it is the data that is the tricky part. Or rather, the tricky part is <i>state</i>. Code (and schema) versioning is fundamentally easy because it is internally stateless. Once you add external state to the mix, all bets are off. I really wish there were a silver bullet, but there is not. If you can solve versioned persistence of large amounts of state, you will solve a lot of problems in computer science.
Does anyone know what the production-worthy mechanism for database versioning is in Django, Node and Go?<p>It always feels strange to me (coming from Rails) that other frameworks treat versioning as optional - I mean, Django is only now getting migrations as part of the default toolkit rather than via South?<p>Does that mean there is another way to collaborate and work on database changes?
Very interested to see the link to Liquibase[0] in that article. Must check it out. Does anyone have experience with that sort of tool that they'd like to share, either good or bad?<p>[0]: <a href="http://www.liquibase.org/" rel="nofollow">http://www.liquibase.org/</a>
I've been struggling with how to get an Oracle database's contents under Git control. Our developers use an older rapid development environment in which the front-end tool stores its form layouts, triggers, and source code all in the database. It doesn't provide an option to develop from regular text files. So there are a couple of approaches I could take:<p>1) Use triggers so that every time a Git command is run, it either imports or exports the given database table. But this takes simple, fast operations and makes them expensive.<p>2) Write a FUSE plugin that can expose a database schema as a filesystem object, and have a git checkout hook mount it in the appropriate place.<p>Has anyone already done either of these?
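For option 1, a rough sketch of the export half, assuming the tool's code objects are visible as ordinary schema objects (the object-type filter and spool file name are placeholders; DBMS_METADATA is a standard Oracle package):<p><pre><code>  -- Run from sqlplus by a Git hook; spools the current DDL to a text
  -- file that can then be committed.
  SET LONG 1000000 PAGESIZE 0 LINESIZE 200 TRIMSPOOL ON
  SPOOL schema_export.sql
  SELECT DBMS_METADATA.GET_DDL(object_type, object_name)
    FROM user_objects
   WHERE object_type IN ('TABLE', 'TRIGGER', 'PROCEDURE');
  SPOOL OFF</code></pre>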
I've used the manual migration script approaches before (Active Record, DBDeploy), which are really reliable provided you use them religiously for all changes. However, it's a lot of work, and it's easy to accidentally make a change directly to the local dev DB and then either forget to commit a change script or commit a change that isn't quite what you did.<p>At the other end, the automated comparison/migration tools are much more convenient. However, they can't handle 100% of cases, for example treating a rename as a drop and a create (see the sketch below), or splitting data.<p>I've always ended up with a hybrid approach in the past.
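A minimal illustration of the rename problem (table and column names are made up; RENAME COLUMN syntax varies slightly by engine):<p><pre><code>  -- What a naive schema-diff tool tends to emit for a renamed column
  -- (the data in the old column is lost):
  ALTER TABLE users ADD COLUMN family_name VARCHAR(255);
  ALTER TABLE users DROP COLUMN surname;

  -- The hand-written migration that actually preserves the data:
  ALTER TABLE users RENAME COLUMN surname TO family_name;</code></pre>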
This is something EVERYONE struggles with at some point, and everyone has a different solution that works.<p>It's still alpha (I'm introducing breaking changes soon), but I have been working on Mite (<a href="https://github.com/jdc0589/mite-node" rel="nofollow">https://github.com/jdc0589/mite-node</a>) for a few weeks now; it's a re-implementation of an internal tool we used in the past. The documentation is incomplete/out of date, though; it will be getting some love soon.<p>Mite's goal is similar to sqitch's, but it aims to be a little simpler: very little configuration, and migrations are plain old SQL.
Another interesting solution is to use a functional database like Datomic. Since the database looks like a giant time series, you get implicit versioning built into the database.
Didn't like the article. It has the typical handwaving about "using an ORM" that will resolve all your complexities. Object-relational mapping is a problem completely orthogonal to database migrations.<p>I like flywaydb. For "rolling back" you need to use your regular disaster recovery, as migrations modify data as well as the schema. Use transaction logging.
I wrote about this idea a few years ago, from a non-ORM perspective: <a href="http://robertelwell.info/blog/future-proof-db-changelog/" rel="nofollow">http://robertelwell.info/blog/future-proof-db-changelog/</a>
The Doctrine2 project solved this a while ago, using much better timestamp-based versioning. I actually miss its schema diff and migration tools when I go back to languages like C# or C++...