How Does DevOps Handle Change Management?
Tools today are wonderful. They ease your task load, letting you get things done with only a few clicks. I remember reading a potential horror story about a developer promoting a database change to a production environment. He was trying to add a column to a table, and a tool generated the script with a “drop table” statement at the beginning. The developer didn’t notice it because he had only tested the script locally. Luckily for him, the team had a DBA who always reviewed developers’ scripts. When the DBA discovered that the script was going to recreate the table just to add a new column, he reported it back, and no data was lost that day. Phew!
That’s change management (CM), the process by which you can understand why a change is needed and how to minimize risk as much as possible. My first encounter with CM was horrible. I was used to deploying new changes rapidly, without a formal announcement, because the team was small. The problem was that I never had free time: when something was failing, I was the only one who knew what was causing the problem. And that meant I was the one who had to connect remotely and fix it.
I always thought that CM was only for big companies, not startups. After all, startups have to stay light and lean! But the idea behind CM isn’t to slow down company initiatives. Instead, one of its main purposes is to reduce risk when releasing new changes, which is something everyone, including startups, can use. A key point of CM is to provide documentation. CM is also about having a human review, analyze, and approve changes, that is, make judgments. After all, people are better able to make decisions than computers. CM isn’t meant to prevent every failure. But what if we automate the things that a computer can do? We get feedback frequently and rapidly, leaving us humans with fewer, and more interesting, decisions.
With the speed that DevOps brings, it’s only logical to think that more problems will arise and our systems will become riskier. But DevOps isn’t meant to supplant CM. Its purpose is to reduce the need for human intervention in the steps that can be automated. So CM and DevOps can live together. When we merge speed with less risk, the results are incredible. So how does DevOps handle change management?
Treating Changes Differently
Not all changes have the same impact. You can’t measure how risky a change is by counting the lines of code modified. It’s so easy for little things to create big problems, like just changing a font color in a web page. ITIL, for example, defines three types of changes that we can use as a base for classification and then apply DevOps principles to each one. Those changes are standard, normal, and emergency.
- A standard change is a low-risk change that’s already well known, and it follows a strict procedure. Everyone loves these types of changes.
- A normal change is, well, normal. You don’t know how risky these changes are; they’re either new or they’re those changes everyone is afraid to make because there’s always a problem with them. For these types of changes, you usually request approval from a change advisory board (CAB). The CAB analyzes the change before it goes live, trying to deduce the impact on the system. Usually, IT Ops would like to avoid these types of changes.
- Emergency changes are those that need immediate attention. You can’t wait for them to go through the formal approval process. Everyone would like to avoid these, but sometimes they’re inevitable.
Having this knowledge as a base, we’ll need to practice making those changes. In the beginning, you won’t have many standard changes. Most of them will be normal changes. But the end goal is to transform normal changes into standard changes by having automation in place.
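The three change types above can be sketched as a small routing rule. This is only an illustration: the `ChangeType` enum, the `approval_path` function, and the route descriptions are hypothetical names, and reviewing an emergency change after it ships is an assumption, not something ITIL or any particular tool prescribes here.

```python
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    EMERGENCY = "emergency"


def approval_path(change_type: ChangeType) -> str:
    """Return a (hypothetical) approval route for a change type."""
    if change_type is ChangeType.STANDARD:
        # Low risk and well known: the pipeline can proceed on its own.
        return "pre-approved"
    if change_type is ChangeType.EMERGENCY:
        # Cannot wait for the formal process (assumed: reviewed afterwards).
        return "expedited, review after deployment"
    # Unknown risk: the CAB analyzes the change before it goes live.
    return "CAB review"


for change_type in ChangeType:
    print(f"{change_type.value}: {approval_path(change_type)}")
```

Turning a normal change into a standard one then amounts to changing which branch it takes, once automation has earned that trust.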
DevOps will start treating your changes differently, with the purpose of improving lead time and reducing blockers by introducing automation all the way down. For example, the CAB could reject a change because documentation is missing, and a much-needed fix could be delayed as a result. Next time, you can avoid that by automatically filling in the documentation from the user stories, emails, or whatever you use to document requirements.
Integrating With Existing Tools and Processes
You don’t need to stop doing what you’re currently doing and drop all your existing tools. Hack them. Find a way to make blockers stop being blockers. The problem most of the time is lack of documentation and poor communication. You don’t always remember what steps are needed unless you do them all the time. Humans are awful at repetitive tasks. Seek to have an improved process in which, after the developer pushes a change in the code, all the machinery is triggered and starts creating requests for change (RFCs) and JIRA tickets, sending emails requesting approval with links or summaries, etc. This will leave humans free to do what they’re best at: applying judgment.
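As a rough sketch of that machinery, the snippet below pre-fills an RFC from a user story and the commit messages in a push, then reports which fields are still empty. The `RfcDraft` class, its field names, and the shape of the `story` dictionary are all assumptions made for illustration; a real setup would map them to whatever your ticketing tool expects.

```python
from dataclasses import dataclass, field


@dataclass
class RfcDraft:
    """A hypothetical request-for-change record; field names are illustrative."""
    title: str
    reason: str
    commits: list[str] = field(default_factory=list)
    rollback_plan: str = ""

    def missing_fields(self) -> list[str]:
        """List empty fields so gaps surface before a reviewer sees the RFC."""
        required = {
            "title": self.title,
            "reason": self.reason,
            "commits": self.commits,
            "rollback_plan": self.rollback_plan,
        }
        return [name for name, value in required.items() if not value]


def draft_rfc(story: dict[str, str], commit_messages: list[str]) -> RfcDraft:
    """Pre-fill an RFC from a user story and the commits in a push."""
    return RfcDraft(
        title=story.get("title", ""),
        reason=story.get("description", ""),
        commits=list(commit_messages),
        rollback_plan=story.get("rollback_plan", ""),
    )


story = {"title": "Add phone column to customers", "description": "Support SMS alerts"}
rfc = draft_rfc(story, ["Add migration for customers.phone"])
print(rfc.missing_fields())  # ['rollback_plan']
```

Catching the missing rollback plan at push time is the point: the developer hears about it in minutes, instead of the CAB rejecting the change days later.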
The main goal is to prove we have everything we need to make the change operate as expected when published, especially when the business hits the “publish” button. The CAB will appreciate the effort we put into giving them enough information and context for a change. And this new automated process will give them all the proof they want to see before releasing a change to the users.
Leaving Tracks All the Way Down
When you have automation in place, it becomes a trivial task to incorporate audit trails into the pipeline. This brings big benefits. Anyone can find out how much time it took a recent change to go live, why it was needed, who approved it, and whether all the checkboxes were ticked in previous steps. For example, the next time an auditor requests evidence that a change followed your process, it’ll just be a matter of following the trail backward. All the information will be available. But along with these big benefits come big challenges. That’s especially true when you need to skip the process and make manual changes, as in an emergency change.
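A minimal sketch of such a trail, assuming each pipeline step appends one event, might look like this. The `AuditTrail` class, the step names, and the `CHG-1` identifier are hypothetical, and the log lives in memory only; a real pipeline would persist it somewhere tamper-resistant.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEvent:
    change_id: str
    step: str      # e.g. "pushed", "approved", "deployed" (illustrative names)
    actor: str
    at: datetime


class AuditTrail:
    """Append-only, in-memory log of what happened to each change."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, change_id: str, step: str, actor: str, at: datetime) -> None:
        self._events.append(AuditEvent(change_id, step, actor, at))

    def history(self, change_id: str) -> list[AuditEvent]:
        """All events for one change, oldest first."""
        events = [e for e in self._events if e.change_id == change_id]
        return sorted(events, key=lambda e: e.at)

    def lead_time(self, change_id: str) -> timedelta:
        """Time between the first and the last recorded event of a change."""
        events = self.history(change_id)
        if not events:
            raise KeyError(change_id)
        return events[-1].at - events[0].at

    def approvers(self, change_id: str) -> list[str]:
        return [e.actor for e in self.history(change_id) if e.step == "approved"]


trail = AuditTrail()
start = datetime(2024, 1, 8, 9, 0)
trail.record("CHG-1", "pushed", "dev", start)
trail.record("CHG-1", "approved", "cab", start + timedelta(hours=2))
trail.record("CHG-1", "deployed", "pipeline", start + timedelta(hours=3))
print(trail.lead_time("CHG-1"))   # 3:00:00
print(trail.approvers("CHG-1"))   # ['cab']
```

An emergency change made by hand would leave no events here, which is exactly the gap the paragraph above warns about.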
Avoiding Manual Changes
When you understand this, you’ll gain so much confidence with deployments that you’ll never want to do manual changes again. In a perfect and immutable world, let’s say you eliminate the temptation to implement manual changes by locking down the doors and keeping everyone from SSH’ing into servers. There may still be times when you need to break the rules, but you’ll want to make that the exception, not the norm, and automate it. Having problems and need to see logs? Go to the log management tool. Running out of space? OK, fix it. But next time, add an alert and trigger an automated process to increase storage. See? It’s doable.
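The storage example could be sketched roughly as follows. `shutil.disk_usage` is part of the Python standard library; everything else (the 85% threshold, the `check_storage` function, and the `expand` callback that stands in for whatever actually grows the volume) is an assumption for illustration.

```python
import shutil
from typing import Callable


def disk_used_fraction(path: str = ".") -> float:
    """Fraction of the filesystem holding `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def needs_more_storage(used_fraction: float, threshold: float = 0.85) -> bool:
    """True once usage reaches the alert threshold (an assumed default)."""
    return used_fraction >= threshold


def check_storage(path: str, threshold: float, expand: Callable[[str], None]) -> bool:
    """Call `expand` (a hypothetical hook) when the threshold is reached."""
    if needs_more_storage(disk_used_fraction(path), threshold):
        expand(path)
        return True
    return False


def request_expansion(path: str) -> None:
    # Placeholder: a real hook would call your cloud or storage tooling.
    print(f"requesting more storage for {path}")


print(f"used: {disk_used_fraction('.'):.0%}")
check_storage(".", threshold=0.85, expand=request_expansion)
```

Run on a schedule, a check like this replaces the SSH session: the fix happens through the same automated, auditable path as any other change.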
Moving From Dramatic to Non-Dramatic Changes
Now we’ll have enough information to prove that a pipeline is trustworthy. There’s proof that you’ve had fewer incidents with automation. The CAB will start seeing those types of changes as less risky, and less and less human intervention will be needed, even to approve changes. Why? Because a solid deployment pipeline has to include all types of tests, and those tests keep evolving along with the system. Some companies will still want to keep the approval gate for business purposes (e.g., a change will go live on the announced date, not before).
The idea is to turn normal changes into standard changes and make emergency changes need less and less human intervention. This will take time, sure. But be patient and seek to have a green-light pipeline most of the time. Make sure that if it goes red, it’s because a test failed and not because there was something wrong with the automation process.
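One way to make that promotion explicit is to base it on the pipeline’s track record. The sketch below is an assumption, not an ITIL rule: the `DeploymentRecord` fields, the `change_kind` labels, and the requirement of ten clean runs are all hypothetical and would be agreed on with your CAB.

```python
from dataclasses import dataclass


@dataclass
class DeploymentRecord:
    change_kind: str        # hypothetical label, e.g. "add-nullable-column"
    succeeded: bool
    caused_incident: bool


def can_promote_to_standard(
    records: list[DeploymentRecord],
    change_kind: str,
    required_clean_runs: int = 10,
) -> bool:
    """True if the most recent runs of this kind of change were all clean."""
    relevant = [r for r in records if r.change_kind == change_kind]
    recent = relevant[-required_clean_runs:]
    if len(recent) < required_clean_runs:
        return False  # Not enough history yet to prove anything.
    return all(r.succeeded and not r.caused_incident for r in recent)


history = [DeploymentRecord("add-nullable-column", True, False) for _ in range(10)]
history.append(DeploymentRecord("change-font-color", True, True))

print(can_promote_to_standard(history, "add-nullable-column"))  # True
print(can_promote_to_standard(history, "change-font-color"))    # False
```

The records themselves can come straight from the audit trail, so the evidence the CAB needs is a by-product of deploying.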
Use Your Weapons Wisely
You’ll transform all the things that have been blockers into something easier to live with. (The “aha” moment for me was when I read The Phoenix Project, and I can’t recommend a better book for understanding why CM is needed and how DevOps helped a company escape the hole they had dug themselves into.)
Meanwhile, relax. DevOps is not trying to remove CM. It’s trying to make it more powerful. You can still use your current processes and tools, but you can do it in a more strategic way. Focus on trying to have more standard changes. Start with one, the least problematic one, and prove the theory. ITIL practices have been in IT for quite a long time, and that’s for a reason. Embrace change management, but do it smartly by having an expected and predictable delivery pipeline.