Archive for the 'Methodology' Category

Database Change Control

Monday, February 4th, 2008

K. Scott Allen recently published a five part series on the importance of version control in creating and maintaining the database behind your product. This starts with something pretty important and fundamental, three rules for database work. Rules #2 and #3 are vital. No argument there. Rule #1 I’ll come to later. Jeff Atwood has also written on the subject before, and highlighted his previous post and a few further comments in support of what K. Scott Allen had written in an apparently unscheduled post on his blog. Looking at the trackbacks and comments on the posts, this appears to have generated a lot of interest, and I feel compelled to critique the posts.

Firstly, you must get your database under version control if you ever plan on releasing more than one version. You must have an authoritative source of schema and procedures etc, you must do versioned releases. This is not in contention. But other points in K’s series and Jeff’s unquestioning support for this are.

Secondly, some background, I’m a Development Manager for a Microsoft Technology Stack web based application which is maintained and released as a product. We have a lot of tables, a lot of stored procedures and other database entities. And a lot of developers.

Taking the approaches outlined by K in his posts on Change Scripts and Views, Stored Procedures and the Like on face value, they are hopeless. They’ll get you into trouble. Fast.

K’s process is one script for the database schema and one script per stored procedure in the master definition. He then references Phil Haack’s Bullet Proof Sql Change Scripts post for an idea of how to provide better change programs than his simplistic ones.

Phil’s stuff is good in principle, making sure that your SQL Change Scripts can execute many times, but, they are laborious. We’ve rolled all the checks you could possibly want into a set of functions dbo.ColumnExists(TableName, ColumnName). dbo.IndexExists(IndexName, TableName) etc.

But with a system like ours with thousands of stored procedures (literally) and doing a re-deploy of all sps is painful. So we script only the updates and roll them out. But that will be a lot of updates in a release designed to take any version of our oldest supported version up to our newest major release (iApplication XP Panoramic Web 2.0 Version). Executing each of these individually and tracking the results is a hassle.

The next step is to roll these scripts together into one to make a quick deployment, just get the installer/dba to execute “Update_1.1.0.1.sql” on the target database and there you go. *

Only that’s when it gets really problematic. SQL Query Optimiser pulls the rug out from under your feet. What it will do is change the order of execution to optimise it. It’ll create new tables, add rows and script stored procedures at times to suit it’s own optimisation desires. And then everything will fail. Columns won’t exist, so stored procedures won’t compile. Views will fail. Commands to add indexes will explode. There will be bits of schema all over the place.

So then you have to wrap every alter table command, every DROP usp_ and CREATE usp_ in sp_executeSQL. Which means you have to escape every ‘ character correctly. So then you need to write a tool to generate the change scripts cleanly.

Then, you find that if you are executing hundreds of changes in batch in a single change program, you’ll find that it’s hard to stop a change logging itself when only part has fallen over, so then you have to wrap each item in a check for @@ERROR to see if anything went wrong. And wrap the whole lot in a transaction and roll that back if any errors happened at all.

Right, so now we have something approaching bullet proof. Now we need to make sure our team of developers do it right every time. We write a process document and run a training session. We explain how each development starts with getting the latest SP/Table definition from Source Control making your changes, generating your change program. That your change program must be tested and shown to execute cleanly leaving the full trace etc, and must contain the right assigned version number.

Whoever writes this change script will test it thoroughly and against a variety of test data, then commit the change script into source control. The schema change is officially published. The schema change will start to appear in developer workspaces as they update from source control, and on test machines as new builds are pushed into QA and beyond.

That just doesn’t work. You will despair as even your best guru programmers, the ones who really care about their craft, make mistakes and take short-cuts because the process is onerous and is seen as a tax. So then you need to have a nightly build process that restores a clean database, executes every change against it, checks the results and the database. That runs a parse on the change program beyond the SQL Execution to make sure the rules are adhered to and Change_1_0_1_0_2.sql actually records itself as 1.0.1.0.2 and not 2.0.1.3.4 which is it’s number on a different branch etc etc. K’s text is incredibly hand-wavy.

1. Never use a shared database server for development work.

The convenience of a shared database is tempting. All developers point their workstations to a single database server where they can test and make schema changes. The shared server functions as an authoritative source for the database schema, and schema changes appear immediately to all team members. The shared database also serves as a central repository for test data.

Like many conveniences in software development, a shared database is a tar pit waiting to fossilize a project. Developers overwrite each other’s changes. The changes I make on the server break the code on your development machine. Remote development is slow and difficult.

Avoid using a shared database at all costs, as they ultimately waste time and help produce bugs.

So having a single instance per developer on their local machine is a panacea for all your “shared database” problems. All those bugs created by people over-writing each other’s schema changes etc.

What about when a developer neglects to update their local instance and produces a fix based on an out-dated schema or stored procedure? Same thing. How do you ensure all your developers are keeping their local instance sufficiently up to date? Auto-update their instances with each commit as it’s stabilised? What if that wipes out their changes?

I’m not saying a shared database solves these issues, or that you won’t run into the issues he mentions on your shared database. But it’s only one point to control. We periodically re-stabilise our test and development databases where developers are patching their work for peer testing. We’re working on improving this process all the time. Some development requires an isolated environment to avoid breaking everyone’s ability to work, and they do have isolated local environments, but only for the length of that development.

Database version management is an incredibly hard problem to resolve. And although getting people started with it as K and Jeff have done is good, it’s not enough. You have to go further. And if you’ve gone further than us, please tell me where we go next!

* - (Using isql with the right voodoo to supress all the line numbers and pointless messages just leaving us the completion state, piping the output to a text file which we can parse to ensure that everything completed AOK and the database is not left in an inconsistent state, prior to us then needing to validate that the database IS actually in a decent state and the scripts haven’t falsely reported success…)

Popularity: 22% [?]

Software Design Approach

Saturday, April 14th, 2007

So I’ve decided on a platform for development. PHP/Apache/MySQL. Next we need to think a little bit about how we’re going to build this thing.

Right now, downstairs, I have a builder working on a new extension to my living room and a carpenter fitting a new kitchen. I don’t want to trivialise the problems and issues they face, but, the building industry is thousands of years old. Materials and techniques do change. But not that much. Lighter stronger materials. But, a brick is a brick. And it takes a certain amount of time to build a wall out of them. And the calculations required to ensure it meets requirements are well known and understood.

There is always the uncertainty of weather, and suprises like (random attempt to appear to know something about building…) dryrot to deal with. But generally, there aren’t that many suprises and new challenges in building. You do have some aspects of interpretting the customer’s desires, but, blueprints, 3d modelling of what’s being planned in the kitchen sort all that stuff out with a pretty exact level of detail.

Software specification, design and development on the other hand is another thing entirely.

Software is not a science, it’s not even really engineering. Software has been likened to a collaborative sport. I quite like this article I read recently where one parallel drawn is that software development is like rock climbing. This rang very true for me, as I’m a software developer and ex-rock climber.

Techniques and methods are changing all the time, we still haven’t managed to reach the point where we have such a set of tools, techniques and materials that we can get a bunch of people to go through an apprenticship in purely practical things and follow a basic recipie to turn out a repeatable delivery on time of a finished product, like we can with building (troublesome builders who run over, turn up late etc asside).

Software development (commercial, enterprise grade) takes highly skilled and educated people to deliver the mess we currently deliver. We’re far behind the building industry, which is delivered by (sorry, but it is) the less bright, less able, less educated people in society. Becuase they can! Becuase those of us with degrees in software engineering are crap at it and are too busy being crap at writing software. But crap in that certain special way.

Anyway, opinionated rant asside, things are getting better. We’re starting to establish some common, re-useable techniques to get software delivered better. There are a whole bunch of generic approaches to development, each one with a myriad of different interpretations.

I’m going to talk about Object Oriented software development, because, in my opinion that is the best approach.

The basic concepts of Object Oriented software design and development are covered well in a number of places. My general preference as the font of all knowledge is usually Wikipedia, and as it’s a software topic, they give Object Orientation a pretty good coverage.

Basically though, in object oriented software, everything is should be an “object”. And by that we mean the literal term for an object, in the sense of the fact a car is an object. That object should wrap up all the attributes of that object, such as it’s colour, and all the things you can do to it, such as start the engine.

In procedural programming, you would have a whole bunch of unrelated variables, such as $carColour and routines such as function startCarEngine() that lived somewhere in that spagetti of code you call a system. Object Orientation keeps things clean and tidy. You have your Car class, and you have a variable $myCar that represents a specific car. You can find out what colour that car is ($myCar->Colour) and you can start it ($myCar->StartEngine()) without worrying about the mess of variables.

But that’s just the start. What if you have different types of car. Old fashioned cars needed a crank handle to start them. But they still had wheels, tyres and so forth. In the procedural world, we’d have effectively duplicated code for old fashioned car starting and new modern car starting. The difference to the user is turning the key, or cranking the handle. But internally, much of the implementation may be the same (pump fuel, fire spark plugs etc).

That’s where object orientation gets useful. Objects can be built of other objects. This is called inheritance. Basically, you define a basic car class that defines all the features common to all cars. Then from that you inherit all that code, and extend it or replace it to provide an implementation of a type of car. Such as a Crank Handle Car or a Modern Car. From that you can inherit down and down until you have a class that defines a Ford Mondeo 1.8 Deisel, because it has details that are different to the Ford Mondeo 1.8i Petrol.

Indeed, your base car class, could have inherited common properties from the Vehical class. In some implementations of object orientation you can inherit multiple things. So it may have inherited “Internal Combustion”, “Wheeled Transport” and several other base classes.

You code once, and use or extend that code in many other places.

This is important, because it means you fix a root bug once and fix it everywhere.

OO is important for Multiblog, because Multiblog needs to post to many different web services. But, each web service will have certain things in common. At the very least, I’ll need the ability to make an HTTP POST operation to a URL. If I write my blog service implementations as classes, I can derive common functionality from a BlogService base class.

Furthermore, with Object Orientation there is a rich collection of Design Patterns. These are as close as software engineering gets to fixed, repeatable recipies for development. They give us ways to do things.

The typical object oriented patterns make it really simple to layout and engineer your software in such a way it will be easy to change, expand and manage your development. We’ll see that in action later on (I hope!)

Popularity: 13% [?]