DevOps Metrics that Actually Matter
Recently, I have been working with several teams to help them with metrics. Metrics are one of those things that can help or hurt an organization, depending on how they are applied. Within an Agile/DevOps context, metrics are critical to quantifying changes in team and organizational performance. Without measuring how things are today, it is impossible to know whether the changes we worked so hard to make actually made a difference to the things that matter. The same goes for any team that aspires to reach “high-performance” status. We need metrics to understand where we stand, how much we have changed as a team, and how much more work we need to do to attain our objective.
The challenge with selecting metrics is usually not figuring out what metrics make sense, but how to apply them. With help from today’s highly sophisticated tools, it is usually not too difficult to begin collecting metrics. What is done with this data is where teams tend to struggle. Let’s take a look at a few popular metrics that are often used for DevOps initiatives and explore a few ways to apply them in a meaningful way.
Metric #1 – Lead Time
Within a DevOps domain, arguably the most critical metric is how quickly value is delivered to the customer, which is usually captured by Lead Time or Cycle Time. Many organizations fail to measure their Lead Time accurately because they do not fully understand their value stream, or because they apply an inaccurate value stream that ends prematurely rather than at delivery to the customer. Lead Time is an important measure of a team’s ability to deliver products or services, which correlates directly with customer satisfaction. Tracking trends in Lead Time enables teams to find and reduce wasteful delays and optimize the process flow.
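To make this concrete, here is a minimal sketch of how Lead Time could be computed across the full value stream, from the customer request to the customer delivery. The field names (`requested_at`, `delivered_at`) and the sample work items are hypothetical; a real team would pull these timestamps from its own ticketing and deployment tools.

```python
from datetime import datetime
from statistics import median

# Hypothetical work items; the field names are assumptions for illustration.
work_items = [
    {"id": "A", "requested_at": datetime(2024, 1, 1, 9, 0),
     "delivered_at": datetime(2024, 1, 4, 9, 0)},
    {"id": "B", "requested_at": datetime(2024, 1, 2, 9, 0),
     "delivered_at": datetime(2024, 1, 7, 9, 0)},
    {"id": "C", "requested_at": datetime(2024, 1, 3, 9, 0),
     "delivered_at": datetime(2024, 1, 5, 9, 0)},
]


def lead_time_days(item):
    """Elapsed days from customer request to customer delivery."""
    delta = item["delivered_at"] - item["requested_at"]
    return delta.total_seconds() / 86400  # seconds per day


lead_times = [lead_time_days(item) for item in work_items]
median_lead_time = median(lead_times)
print(f"Median lead time: {median_lead_time:.1f} days")
```

I use the median here rather than the mean so that a single unusually slow item does not dominate the trend, but either can work as long as it is tracked consistently.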
Metric #2 – Deployment Frequency
How often a team (or organization) is able to deliver value is also a direct measure of the team’s ability to respond rapidly to changing conditions, which is a key trait of a high-performing Agile/DevOps team. While the ideal state for many modern technology companies is to deploy hundreds of software builds per day, similar to Amazon, most companies can achieve successful outcomes without reaching this level of maturity.
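One simple way to track Deployment Frequency is to count production deployments per week. The sketch below assumes you can export a list of deployment dates from your pipeline; the dates shown are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical production deployment dates (illustrative data only).
deployments = [
    date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 4),
    date(2024, 1, 9),
    date(2024, 1, 16), date(2024, 1, 17),
]


def deployments_per_week(dates):
    """Count deployments per ISO (year, week) pair."""
    counts = Counter()
    for d in dates:
        iso = d.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


weekly = deployments_per_week(deployments)
for (year, week), count in sorted(weekly.items()):
    print(f"{year}-W{week:02d}: {count} deployment(s)")
```

Note that weeks with no deployments do not appear in the result at all, so a gap in the output is itself a signal worth looking at.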
Metric #3 – Change Failure Rate
While deploying frequently is usually a positive thing, quality is still important; if most of the deployments are unsuccessful, high frequency will not make your customer happy. Hence, the failure rate of software deployments, meaning the share of deployments that fail, is usually an eye-opening metric to track closely, as it can offer critical insights into the team’s performance. A downward trend in the failure rate is the goal.
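As a rough sketch, Change Failure Rate can be computed as failed deployments divided by total deployments for a given period. How a "failed" deployment is defined (rollback, hotfix, incident) is a team decision; the boolean outcomes below are hypothetical sample data.

```python
def change_failure_rate(outcomes):
    """Fraction of deployments that failed.

    `outcomes` is a list of booleans, True for a successful deployment.
    """
    if not outcomes:
        raise ValueError("no deployments recorded for this period")
    failures = sum(1 for succeeded in outcomes if not succeeded)
    return failures / len(outcomes)


# Hypothetical outcomes for two consecutive months.
january = [True, True, False, True, True, False, True, True]   # 2 of 8 failed
february = [True] * 9 + [False]                                # 1 of 10 failed

jan_rate = change_failure_rate(january)
feb_rate = change_failure_rate(february)
print(f"January: {jan_rate:.0%}, February: {feb_rate:.0%}")
```

Comparing the rate period over period, as in this example, is what shows whether the trend is moving downward.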
Metric #4 – Mean Time To Restore
Deployments will fail sooner or later, no matter how mature or experienced your team is. This is not the end of the world, especially if you have plans in place to resolve deployment failures. Automated deployments may benefit from automated rollback processes that shorten the time required to restore service to customers. The ability to fall back and minimize the negative impact on customers will give the team a higher degree of confidence to deploy more frequently.
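Mean Time To Restore is simply the average time between a failure and the restoration of service. The following sketch assumes each incident is recorded as a pair of timestamps (failure detected, service restored); the incidents listed are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incidents as (failed_at, restored_at) pairs.
incidents = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),    # 30 min
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 15, 30)),  # 90 min
    (datetime(2024, 1, 20, 8, 0), datetime(2024, 1, 20, 9, 0)),     # 60 min
]


def mean_time_to_restore(incidents):
    """Average duration between failure and restoration of service."""
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum(
        (restored_at - failed_at for failed_at, restored_at in incidents),
        timedelta(),  # start value so that timedeltas can be summed
    )
    return total / len(incidents)


mttr = mean_time_to_restore(incidents)
print(f"MTTR: {mttr}")
```

If automated rollbacks are introduced, rerunning the same calculation on later incidents is a straightforward way to see whether restore times actually shrink.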
In summary, DevOps processes rely heavily on metrics to ensure visibility into what is going on within the end-to-end flow. Although it is not necessary to instrument every single process immediately, it is usually advantageous to monitor at least one or two key metrics closely and expand incrementally over time. This will instill discipline and foster a mindset of continuous improvement, which will empower your team to reach elite status much more quickly.