In my last post I painted a pretty Utopian picture of an imaginary delivery process, but it’s not that far from reality. I’ve experienced a process pretty close to this on several different engagements.
How did we get here?
The first thing that went well is that your developers relied on a source and build management system to get them on the same page. Your process has captured all of the code and configuration, and associated them strongly with a particular deployment. There’s a one-to-one mapping between what’s on the server right now and some group of files that everybody knows exactly how to get to. When you say “Install build 432” people immediately know what commands to run to do that, with no hand-holding and no waiting in line. Five people can install the code exactly as fast as two people can, which means you can dogpile on the problem.
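To make that one-to-one mapping concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Build record, the BUILDS registry, the revision and path values, and the step strings are all hypothetical placeholders, not the output of any real build tool. The point is only that a build number resolves to exactly one commit, one configuration, and one artifact, so anyone gets the same answer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Build:
    """One immutable record per build (all fields are illustrative)."""
    number: int
    commit: str    # source revision the build was made from
    config: str    # configuration revision that ships with it
    artifact: str  # where the packaged build is stored


# Hypothetical registry; in practice the build system would own this data.
BUILDS: Dict[int, Build] = {
    431: Build(431, "a1b2c3d", "cfg-17", "artifacts/app-431.tar.gz"),
    432: Build(432, "e4f5a6b", "cfg-18", "artifacts/app-432.tar.gz"),
}


def install_plan(number: int) -> List[str]:
    """Return the steps to install a build, identical for every caller.

    Raises KeyError for an unknown build number instead of guessing.
    The step strings are placeholders, not real commands.
    """
    build = BUILDS[number]
    return [
        f"fetch {build.artifact}",
        f"apply config {build.config}",
        f"deploy commit {build.commit}",
    ]
```

Calling install_plan(432) gives the same three steps no matter who runs it or when, which is what lets five people work in parallel without comparing notes first.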
The second thing that happened is that you had a few developers with enough of a mental model of your system that they could begin forming theories about what’s wrong – even if they didn’t write the code. This came from collaboration on the initial coding, transparency, and probably from code reviews.
The third thing that happened is that your developers were able to make a very small change to the code to effect a fix. In order to do this they needed to be confident in their ability to refactor code. This is both a matter of code organization and practice. Lots and lots of practice. So much practice in fact that the fathers of Extreme Programming used to refer to this as ‘relentless refactoring’. Do it until it’s boring, then keep doing it.
The fourth thing that happened is that the build process gave you a pretty good idea of how effective the change was. You had a set of verification tools that always says ‘NO’ when something is wrong, and only says ‘YES’ when quite a lot of things are going right. Managers want confidence intervals on risky, last-minute code pushes, and confidence can be measured in test automation, eyeballs, and discipline. I feel good, the tests look good, and 3 people checked my work. Ship it.
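One way to picture a verification step that says ‘NO’ whenever anything is wrong is a gate that only returns ‘YES’ when every check passes and enough people have reviewed the change. The sketch below is an assumption-laden illustration: the gate function, its parameter names, and the default of three approvals (borrowed from the “3 people checked my work” line) are hypothetical, not part of any particular build tool.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


def gate(
    checks: Dict[str, Check],
    approvals: int,
    required_approvals: int = 3,  # assumed threshold, for illustration
) -> Tuple[bool, List[str]]:
    """Return (ship_it, reasons_for_no).

    The answer is YES only when every check passes, at least one check
    exists, and enough reviewers have signed off.
    """
    failures: List[str] = []
    if not checks:
        # No evidence is not the same as good evidence.
        failures.append("no checks configured")
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a crashing check counts as a NO
        if not ok:
            failures.append(name)
    if approvals < required_approvals:
        failures.append("reviews")
    return (not failures, failures)
```

The design choice worth noticing is that every uncertain case falls toward ‘NO’: an empty check list, a check that raises, or a missing reviewer all block the release, and the returned list says why.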
The fifth thing that happened is a restatement of the first: Once a new version exists, everyone needs to be able to get it, test it, and potentially deploy it. The possibility for human error has mostly been removed from this process. Tooling handles most of it, and repetition handles most of the rest.
The biggest thing that happened is also the most valuable. Nobody panicked. Since everyone already knew what to do, people did their part as best they could and with a sense of urgency. At the end of the day people might be a little tired, but nobody was spent or exhausted. No one person had to be the hero, and the load was shared. Tomorrow, real work will get done because people will have recovered.
The final thing that happens is that a plan is put in place to try to avoid repeating the problem that just occurred. No need to have this sort of emergency again if we can help it. The process is adjusted and estimates change to reflect any extra work. This extra work will be cheaper than having your whole team distracted by a fire drill while they are busy trying to perform delicate surgery on the code for your next release.
Why So Serious?
This sounds like a lot of discipline, but it’s no different from what any emergency response team does. People drill and drill so that even when they’re not expecting an emergency, even when they’re not at their best, even when several valuable team members are missing, they can react in a responsible and useful manner.
Perfect Storms Aren’t Rare
The funny thing about releases is that they usually precede times of flux in the project. After a release, people start doing things they’ve put off for a while: upgrading tools, planning major features, restructuring the project, going to the optometrist, visiting family, going on vacation, switching to a new project, or leaving the company.
The person who knows the piece of code might not be around. They might be elsewhere, or they may be gone. The tools you use today might be different from the old ones. It’s not enough to be able to upgrade; you have to be able to downgrade too. So everything has to be written down accurately somewhere, and several other people need to be familiar with your work.
It’s all about preparation.