Database Change Control
Monday, February 4th, 2008K. Scott Allen recently published a five part series on the importance of version control in creating and maintaining the database behind your product. This starts with something pretty important and fundamental, three rules for database work. Rules #2 and #3 are vital. No argument there. Rule #1 I’ll come to later. Jeff Atwood has also written on the subject before, and highlighted his previous post and a few further comments in support of what K. Scott Allen had written in an apparently unscheduled post on his blog. Looking at the trackbacks and comments on the posts, this appears to have generated a lot of interest, and I feel compelled to critique the posts.
Firstly, you must get your database under version control if you ever plan on releasing more than one version. You must have an authoritative source of schema and procedures etc, you must do versioned releases. This is not in contention. But other points in K’s series and Jeff’s unquestioning support for this are.
Secondly, some background, I’m a Development Manager for a Microsoft Technology Stack web based application which is maintained and released as a product. We have a lot of tables, a lot of stored procedures and other database entities. And a lot of developers.
Taking the approaches outlined by K in his posts on Change Scripts and Views, Stored Procedures and the Like on face value, they are hopeless. They’ll get you into trouble. Fast.
K’s process is one script for the database schema and one script per stored procedure in the master definition. He then references Phil Haack’s Bullet Proof Sql Change Scripts post for an idea of how to provide better change programs than his simplistic ones.
Phil’s stuff is good in principle, making sure that your SQL Change Scripts can execute many times, but, they are laborious. We’ve rolled all the checks you could possibly want into a set of functions dbo.ColumnExists(TableName, ColumnName). dbo.IndexExists(IndexName, TableName) etc.
But with a system like ours with thousands of stored procedures (literally) and doing a re-deploy of all sps is painful. So we script only the updates and roll them out. But that will be a lot of updates in a release designed to take any version of our oldest supported version up to our newest major release (iApplication XP Panoramic Web 2.0 Version). Executing each of these individually and tracking the results is a hassle.
The next step is to roll these scripts together into one to make a quick deployment, just get the installer/dba to execute “Update_1.1.0.1.sql” on the target database and there you go. *
Only that’s when it gets really problematic. SQL Query Optimiser pulls the rug out from under your feet. What it will do is change the order of execution to optimise it. It’ll create new tables, add rows and script stored procedures at times to suit it’s own optimisation desires. And then everything will fail. Columns won’t exist, so stored procedures won’t compile. Views will fail. Commands to add indexes will explode. There will be bits of schema all over the place.
So then you have to wrap every alter table command, every DROP usp_ and CREATE usp_ in sp_executeSQL. Which means you have to escape every ‘ character correctly. So then you need to write a tool to generate the change scripts cleanly.
Then, you find that if you are executing hundreds of changes in batch in a single change program, you’ll find that it’s hard to stop a change logging itself when only part has fallen over, so then you have to wrap each item in a check for @@ERROR to see if anything went wrong. And wrap the whole lot in a transaction and roll that back if any errors happened at all.
Right, so now we have something approaching bullet proof. Now we need to make sure our team of developers do it right every time. We write a process document and run a training session. We explain how each development starts with getting the latest SP/Table definition from Source Control making your changes, generating your change program. That your change program must be tested and shown to execute cleanly leaving the full trace etc, and must contain the right assigned version number.
Whoever writes this change script will test it thoroughly and against a variety of test data, then commit the change script into source control. The schema change is officially published. The schema change will start to appear in developer workspaces as they update from source control, and on test machines as new builds are pushed into QA and beyond.
That just doesn’t work. You will despair as even your best guru programmers, the ones who really care about their craft, make mistakes and take short-cuts because the process is onerous and is seen as a tax. So then you need to have a nightly build process that restores a clean database, executes every change against it, checks the results and the database. That runs a parse on the change program beyond the SQL Execution to make sure the rules are adhered to and Change_1_0_1_0_2.sql actually records itself as 1.0.1.0.2 and not 2.0.1.3.4 which is it’s number on a different branch etc etc. K’s text is incredibly hand-wavy.
1. Never use a shared database server for development work.
The convenience of a shared database is tempting. All developers point their workstations to a single database server where they can test and make schema changes. The shared server functions as an authoritative source for the database schema, and schema changes appear immediately to all team members. The shared database also serves as a central repository for test data.
Like many conveniences in software development, a shared database is a tar pit waiting to fossilize a project. Developers overwrite each other’s changes. The changes I make on the server break the code on your development machine. Remote development is slow and difficult.
Avoid using a shared database at all costs, as they ultimately waste time and help produce bugs.
So having a single instance per developer on their local machine is a panacea for all your “shared database” problems. All those bugs created by people over-writing each other’s schema changes etc.
What about when a developer neglects to update their local instance and produces a fix based on an out-dated schema or stored procedure? Same thing. How do you ensure all your developers are keeping their local instance sufficiently up to date? Auto-update their instances with each commit as it’s stabilised? What if that wipes out their changes?
I’m not saying a shared database solves these issues, or that you won’t run into the issues he mentions on your shared database. But it’s only one point to control. We periodically re-stabilise our test and development databases where developers are patching their work for peer testing. We’re working on improving this process all the time. Some development requires an isolated environment to avoid breaking everyone’s ability to work, and they do have isolated local environments, but only for the length of that development.
Database version management is an incredibly hard problem to resolve. And although getting people started with it as K and Jeff have done is good, it’s not enough. You have to go further. And if you’ve gone further than us, please tell me where we go next!
* - (Using isql with the right voodoo to supress all the line numbers and pointless messages just leaving us the completion state, piping the output to a text file which we can parse to ensure that everything completed AOK and the database is not left in an inconsistent state, prior to us then needing to validate that the database IS actually in a decent state and the scripts haven’t falsely reported success…)
Popularity: 46% [?]

