How to Perform DevOps Health Checks
The DevOps movement has grown tremendously over the last few years. Continuous integration (CI) and continuous delivery (CD) are key terms when discussing DevOps practices. Container management tooling, such as Kubernetes, has improved continuous delivery processes, making it easier to deploy containers quickly through your delivery pipeline.
However, all this growth in tooling and technologies calls for regular DevOps Health Checks. This post explains why they matter and what to check. More sophisticated monitoring tools that enable distributed tracing are now available. These tools are especially useful in microservice architectures, where each service produces its own logs and needs its own monitoring. For example, you might want to monitor CPU usage or memory allocation for each service.
In short, organizations want to know how they are performing:
- Did we reduce the time needed from finalizing a feature to deploying it to the production environment?
- Do we have less downtime for the most critical services?
- What’s the average time needed to resolve a problem?
- Did we shift to proactive monitoring to prevent problems before they even occur?
- How’s the DevOps team doing culture-wise?
Regular DevOps Health Checks answer all of these questions by helping your organization evaluate the quality of its DevOps implementation.
Let’s take a look at some guidelines for when you perform a DevOps Health Check.
DevOps Health Checks – 5 Evaluation Criteria
Here’s a list of five evaluation criteria to determine the current state of your DevOps implementation.
#1. Team Cohesion
Evaluate the cohesion of your DevOps team members. Do they regularly work together and share knowledge? Sharing knowledge is important for creating an open work environment where people can ask questions if they don’t know something.
Make sure your DevOps team has a clear vision and defined high-level goals. In other words, validate the DevOps strategy for the upcoming months. It’s important to have a clear direction and goals to work toward.
It’s also worth checking whether your team has the right mix of skills. Are you missing any key skills, such as a test engineer who can optimize the testing steps in your CI pipeline?
Next, let’s evaluate your tools and processes.
#2. Tools and Processes
What tools does your DevOps team use, and do they support the processes you have already defined? You might define ambitious processes without having the tools that deliver the data those processes depend on. For example, you want to measure a metric like service availability, but you don’t have any monitoring tools in place that can provide that data.
It’s also important to evaluate whether your tools are still relevant. Better monitoring tools may exist that your organization could benefit from. Don’t be afraid to innovate or experiment with new tools.
#3. Bottlenecks
Identify any bottlenecks that slow down the DevOps or development team. Most importantly, look at your CI pipeline to see if anything can be improved.
For example, you might find out that you aren’t caching dependencies for your CI pipeline. This means every new build downloads the same dependencies again. That adds an avoidable delay to each pipeline run, and you can remove it by caching your project’s dependencies.
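One common way to decide when a dependency cache can be reused is to derive the cache key from the contents of the lockfile. The sketch below illustrates that idea only; the function name `dependency_cache_key`, the `deps` prefix, and the sample lockfile contents are hypothetical, and most CI systems offer their own built-in caching that you would configure instead.

```python
import hashlib


def dependency_cache_key(lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Return a cache key that changes only when the lockfile content changes.

    Hypothetical helper for illustration; not part of any CI tool's API.
    """
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    # A shortened digest keeps the key readable while staying unique enough here.
    return f"{prefix}-{digest[:16]}"


# Made-up lockfile contents for three builds.
build_1 = b"requests==2.31.0\n"
build_2 = b"requests==2.31.0\n"  # unchanged dependencies
build_3 = b"requests==2.32.0\n"  # one dependency was upgraded

# Same lockfile, same key: the cached dependencies can be restored.
print(dependency_cache_key(build_1) == dependency_cache_key(build_2))  # True

# Changed lockfile, new key: the dependencies are downloaded once and cached again.
print(dependency_cache_key(build_1) == dependency_cache_key(build_3))  # False
```

Keying on the lockfile rather than on the commit means the cache stays valid across builds until a dependency actually changes.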
#4. Service Levels
Service levels set the standards for evaluating the DevOps team’s work. For example, service availability is one of the key metrics to include in your service level agreement. A common target is 99.9% availability. Evaluate whether your DevOps team has met this strict target.
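To make an availability target concrete, it helps to translate it into a downtime budget. The following is a minimal sketch, assuming a 30-day measurement period; the function names and the 60-minute outage figure are illustrative and not taken from any particular monitoring tool.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def measured_availability_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Availability actually achieved, given the observed downtime."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# A 99.9% target over 30 days leaves roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))  # 43.2

# Assumed example: 60 minutes of downtime this month misses the 99.9% target.
print(measured_availability_pct(60) >= 99.9)  # False
```

Seen this way, a single hour-long outage in a month is already enough to miss a 99.9% target.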
#5. Mean Time to Repair (MTTR)
Lastly, the MTTR metric is key for understanding how quickly a DevOps team can resolve incidents. To resolve a problem quickly, observability is important. Does a team have the right logs and data to understand a problem? How quickly can they pinpoint the exact problem?
All of this feeds into MTTR, which is why it’s such an important metric to evaluate.
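As a rough illustration, MTTR can be computed as the average time between detecting an incident and resolving it. The sketch below assumes each incident is recorded as a pair of timestamps (detected, resolved); the function name and the sample incidents are made up for the example, and your incident tracker may define the start of the repair window differently.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average duration between detection and resolution.

    `incidents` is a list of (detected_at, resolved_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so the durations add up as timedeltas
    )
    return total / len(incidents)


# Made-up incidents lasting 30, 90, and 60 minutes.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 30)),
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 15, 30)),
    (datetime(2024, 1, 25, 22, 0), datetime(2024, 1, 25, 23, 0)),
]

print(mean_time_to_repair(incidents))  # 1:00:00
```

Tracking this value from one health check to the next shows whether improvements in observability are actually shortening incident resolution.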
Conclusion: Why a DevOps Health Check Matters
As you can see, DevOps Health Checks matter because they evaluate both the processes and the people that enable DevOps. This post gave you five evaluation criteria, but you can probably think of more. Team culture is important, and so is clarity about the processes the team is involved with.
Broken processes lead to lost employees. You want your team members to feel productive and be satisfied with the work they do. A regular DevOps Health Check helps to validate your current DevOps implementation, but more importantly, it evaluates how your DevOps engineers are doing.
Ask them about what irritates them or what’s not clear to them. Losing a DevOps engineer is far worse than not meeting a specific target like the number of errors you’ve handled in a month.
In short, focus on both processes and employees.