Update: Since this article was first published, we've released our Engineering Effectiveness strategy framework, which goes well beyond the Four Key Metrics and proposes critical leading indicators, as well as often hidden forms of waste, that "elite" organizations have used to drive immense productivity improvements.
"Measures shouldn’t become our goals", commonplace and sound advice when it comes to metrics. But what if there was a set of metrics that accurately captures how well your software team is delivering? And that this set of four key metrics can help drive effective behaviors and practices that will improve the performance of your software team, and ultimately your bottom line?
It might sound too good to be true, but the concepts behind the ‘Four Key Metrics’ are backed by robust research. As the team behind the landmark book Accelerate has shown, organizations with teams who perform well on these Four Key Metrics also have higher rates of profitability, market share and customer satisfaction.
The ideas behind the Four Key Metrics emerged from work by the research group (now part of Google) DevOps Research and Assessment (DORA), which explored the business impact of the so-called DevOps movement. That work was further refined by Dr. Nicole Forsgren, Jez Humble and Gene Kim in the aforementioned Accelerate. The four metrics they identified are: deployment frequency, change lead time, change fail percentage and mean time to restore.
To the uninitiated, the metrics might not be easy to understand right off the bat. But they essentially capture how effectively, in terms of speed and stability, software teams can roll out changes to live systems for their end users, be they customers, employees or partners.
Unpacking the Four Key Metrics
According to the DORA research, speed (or throughput) is best measured by:
- Deployment frequency. How often changes are made to the live environment: the more frequently, the better. This seems counterintuitive for organizations where every software release is associated with a lot of risk. However, the DORA data shows what many tech practitioners have preached for years: the more frequently your developers deploy into the live environment, the more stable their deployments and releases to users eventually become. To score highly against this metric, teams have to continuously work on reducing the pains and blockers that come with changing software systems. Typically, teams aim to streamline the process until it becomes so highly automated that changes are routine, and whether to release an update into the live environment becomes simply a business decision.
- Change lead time. The time taken from when a developer changes the software code to when that change is deployed into the live environment: the shorter, the better. To improve this metric, teams will need to introduce a high level of automation into their software development and testing process. They’ll continuously ensure that the changes they make individually still work well in combination.
However, it is important to measure the stability of these speedy changes at the same time, to make sure that speed does not come at the price of quality. Stability is measured by:
- Change fail percentage. The percentage of live environment changes that lead to service disruption for your end users (incidents), as opposed to changes that go out without problems: the lower, the better. To improve here, software teams rely on highly automated tests and other automated quality measures to review software builds. To be really effective, teams should focus investments on reducing the change fail percentage in those systems that are most critical to end users.
- Mean time to restore (MTTR). The time it takes to resolve those critical end-user service disruptions when they inevitably happen: the shorter, the better. Many organizations still focus on measuring the number of incidents, or “mean time between failures” (MTBF). For instance, how often does our payment engine go down? MTBF tells execs some important truths, but focusing on it too much leads to inordinately high risk aversion, where teams concentrate on avoiding incidents or downtime at all costs, which often results in a fear of experimentation and innovation. A shift to MTTR incentivizes teams to invest in quick recovery from errors: not how often they fail, but how quickly they get systems back to normal in the event of an incident. This in turn gives them more safety to experiment, take risks, and increase speed. (A sketch of how all four metrics can be computed from raw delivery data follows this list.)
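To make the definitions concrete, here is a minimal sketch of how the four metrics could be computed, assuming you can export deployment records (with commit and deploy timestamps, plus a flag for whether the change disrupted end users) and incident records (with start and restore timestamps). The record fields, reporting window and function names are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    commit_time: datetime    # when the change was committed
    deploy_time: datetime    # when the change reached the live environment
    caused_incident: bool    # did this change disrupt end users?

@dataclass
class Incident:
    started: datetime        # when the service disruption began
    restored: datetime       # when normal service was restored

def four_key_metrics(deployments: list[Deployment],
                     incidents: list[Incident],
                     window_days: int) -> dict:
    """Compute the four key metrics over a reporting window of `window_days` days.
    Assumes at least one deployment and one incident fall within the window."""
    lead_times = [d.deploy_time - d.commit_time for d in deployments]
    restore_times = [i.restored - i.started for i in incidents]
    return {
        # Speed: how often you deploy, and how quickly a change reaches the live environment
        "deployment_frequency_per_day": len(deployments) / window_days,
        "change_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        # Stability: how many changes disrupt users, and how quickly you recover
        "change_fail_percentage": 100 * sum(d.caused_incident for d in deployments) / len(deployments),
        "mean_time_to_restore": sum(restore_times, timedelta()) / len(restore_times),
    }
```

Even a crude version of this, fed from your deployment pipeline and incident tracker, is enough to watch the trends over time.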
What does good look like?
DORA’s research has also benchmarked companies against these metrics. According to this benchmark, “elite” performers distinguish themselves by on-demand deployment frequency, a change lead time of less than one day, a change fail percentage of 0-15%, and an MTTR of less than one hour. In the 2021 “State of DevOps Report”, 26% of participating organizations fell into the elite category, up from 7% in the 2018 report.
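As a worked example, a team could check its own measurements against these thresholds in a few lines. This is only an illustration: the function name and input format are made up, and the threshold values are the ones quoted above.

```python
from datetime import timedelta

def is_elite(deploys_on_demand: bool,
             change_lead_time: timedelta,
             change_fail_percentage: float,
             mean_time_to_restore: timedelta) -> bool:
    """Check a team's measurements against the 'elite' thresholds quoted above."""
    return (deploys_on_demand
            and change_lead_time < timedelta(days=1)
            and change_fail_percentage <= 15
            and mean_time_to_restore < timedelta(hours=1))

# Example: on-demand deploys, 6-hour lead time, 10% change failures, 30-minute restores
print(is_elite(True, timedelta(hours=6), 10.0, timedelta(minutes=30)))  # True
```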
Is your organization “elite” yet? Start measuring and find out.
Three tips for using the Four Key Metrics
Like most things in life, the devil is always in the details. While the Four Key Metrics provide a lodestar for organizations looking to drive improvements, you need to exercise a degree of caution in how you actually implement these ideas in real life.
Here are three good practices you can follow to ensure that the steps you take to improve key metric performance have the right impact on your people, your organization and your customers.
- Use the metrics to shift your perception of risk
There are two things in these metrics that are still counterintuitive for many execs when it comes to risk assessment. First, the fact that rolling out changes more frequently ultimately reduces the risk of failures. If your teams want to be always ready to deploy, they will necessarily have to adopt behaviors that facilitate this, such as automated testing throughout the development process.
Second, the shift towards a focus on “mean time to restore”, away from a pure focus on low incident numbers. This requires a mindset shift for leaders as well as teams. Have you started incentivizing teams for these metrics, and are you praising them for quick restore times?
- Don’t wait for tools to start measuring
We have seen many organizations delay using the metrics while looking for a tool to do the measuring, or even building one themselves. The appeal of a tool is that it can provide a convenient dashboard with relevant performance stats. But actually choosing a tool isn’t so straightforward: a number of different tools have already emerged, each with different strengths and weaknesses, and some even measure using the wrong data. You may not need tools to get started, because you can already get pretty meaningful data by surveying the teams regularly, using the questions in DORA’s quick check tool. Even though the precision is not as high, this method gives you a first indication of where your organization stands and what the trends are. Don’t let a lack of tooling put you off.
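As an illustration of how far a lightweight survey can take you, the sketch below aggregates self-reported answers for one of the metrics into a per-team number. The answer options and their mapping are made up for the example and only loosely resemble the bands DORA uses; they are not the quick check’s actual questions.

```python
from statistics import median

# Illustrative answer options for "How often do you deploy?", mapped to rough deploys per month.
DEPLOYS_PER_MONTH = {
    "on demand (multiple times per day)": 60,
    "between once per day and once per week": 10,
    "between once per week and once per month": 2,
    "less than once per month": 0.5,
}

def team_deployment_frequency(answers: list[str]) -> float:
    """Summarize a team's self-reported deployment frequency as the median of its answers."""
    return median(DEPLOYS_PER_MONTH[a] for a in answers)

# Example: three team members answer the survey
responses = [
    "between once per day and once per week",
    "between once per week and once per month",
    "between once per day and once per week",
]
print(team_deployment_frequency(responses))  # 10 => roughly weekly-to-daily deployments
```

Repeat a survey like this every few months and the trend will tell you more than any single absolute number.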
- Consider context when comparing performance across internal teams
The Four Key Metrics provide a strong framework for understanding development and deployment performance that can be used to broadly compare teams to one another. However, when doing so, it’s important to keep in mind that each of your software teams is subject to different conditions and requirements that can influence their priorities and, in turn, their performance on these key metrics.
Teams who take care of systems that are 20 years old, and deal with all the complexity that brings, will have to invest a lot to improve on the Four Key Metrics. Other teams, say ones building new things from scratch with modern technologies, will have a much easier time achieving the “elite” benchmark. And teams working on business-critical, frequently changing services will have a much stronger case for investing in such improvements than teams taking care of less critical applications. So while the Four Key Metrics make for useful comparisons and do tell you what good looks like in general, you still have to weigh the costs and benefits of investing in improvements for each of your services.
Don’t lose your way in the pursuit of performance
The Four Key Metrics are a useful way of thinking about your software teams’ performance. But no metric is more important than the overarching goal of becoming an organization that can effectively meet the needs of its services’ end users. So while these metrics are very solid leading indicators of software delivery performance, they shouldn’t become your teams’ main purpose. You want your teams to focus on adopting behaviors that support organizational performance, not just working towards metrics.