Since we’re working backward through the development process, we begin at the end: software that’s already deployed. So let’s start there.
When production software is going well, and I mean really well, something cool happens: nothing much at all.
When a release goes off without a hitch, you can immediately turn your attention to the next one. People might relax a bit for a week, but then it’s right back into the action. They are focused on the new features and committed to them, not looking over their shoulders wondering when the next emergency will hit. With the major exception of any big technology upgrades that may be coming, their productivity more closely resembles the middle of a release than the typical start-of-milestone head-scratching. People are being respectful and generous to each other, and good things are already happening.
Something Bad Happens
Even when things go a little sideways, your senior people know the drill. If the bug is severe, it may be all hands on deck, but the plan always looks pretty similar, so the conversation is mercifully short. Now is the time for action, not recrimination. Your tools and version control are in good enough shape that several people can get the production version of the entire environment running on test rigs (or their own machines) within 15 minutes. A few minutes after that, a set of repro steps is available, and while one or two people start debugging the current version, someone else starts installing older versions to see how old the bug is.
If the fix is trivial, it’s shared with a handful of devs, and a few QA people verify that it introduces no new regressions. At the same time, test automation is running, checking the most tedious and difficult-to-replicate scenarios. At some point enough thumbs up are gathered, and the code is ready to go to production.
If, however, the fix eludes your best people, at some point the person inspecting older versions will chime in with the last version before the bug was introduced. Everyone knows what functionality has been introduced since that build, so they can discuss whether rolling back would make things better or worse. The Ops people know (roughly) how to back out the changes and get that version onto the servers if it comes to that. Meanwhile, your troubleshooters are looking at all of the commits that have happened since the last good build, trying to narrow down possibilities and form new theories. Once the light clicks on, you can go back to the happy path.
Everyone knows more or less how long a redeploy takes, so once the announcement goes out that the fix (or workaround) is going live, everyone sees the light at the end of the tunnel. It may take five minutes or half an hour, but there are no more surprises, because your build and deploy environment is repeatable. A couple of people babysit the deployment while everyone else gets back to work on the next release.
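“Repeatable” can be as simple as this: build one immutable artifact, fingerprint it, and make every deploy verify and unpack that exact artifact rather than rebuilding. A hypothetical sketch — every path, version number, and filename here is made up:

```shell
set -e
work=$(mktemp -d)
mkdir -p "$work/src" && echo 'app v1.2.3' > "$work/src/app.txt"

# Build once: a single artifact plus its fingerprint.
tar -C "$work" -czf "$work/release-1.2.3.tgz" src
sum=$(sha256sum "$work/release-1.2.3.tgz" | cut -d' ' -f1)

# Every deploy does the same steps: verify the fingerprint, unpack fresh.
echo "$sum  $work/release-1.2.3.tgz" | sha256sum -c - >/dev/null
mkdir -p "$work/deploy"
tar -C "$work/deploy" -xzf "$work/release-1.2.3.tgz"
deployed=$(cat "$work/deploy/src/app.txt")
echo "deployed: $deployed"   # prints: deployed: app v1.2.3
```

Because the deploy step never rebuilds anything, it takes roughly the same time every run — which is what lets the team predict “5 minutes or half an hour” with confidence.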
This may sound fantastical to some, or like an ordinary Tuesday to others. Next up, we’ll cover the qualities that lead a team to this situation.