DORA metrics

Deployment Failure Rate

Deployment failure rate is a key metric for assessing the reliability of a software deployment process. By tracking how often deployments fail, teams can spot weaknesses in their CI/CD pipelines, gauge the quality of their codebase, and measure how quickly they resolve problems.

At Leanmote, we calculate the deployment failure rate based on the naming conventions of pull requests (PRs). This approach allows us to categorize deployments, distinguish between hotfixes and bug fixes, and get a clearer picture of how different types of changes impact the stability of our system.

Why Naming Conventions Matter

A consistent naming convention for PRs is essential for tracking the type and urgency of changes made to the codebase. By adopting a standardized naming format, teams can quickly categorize changes as hotfixes, bug fixes, recoveries, or rollbacks, each of which plays a distinct role in the deployment pipeline.

  • Hotfixes (HF) are urgent fixes for critical issues, typically aimed at addressing severe bugs or outages that require immediate attention in production. Hotfixes often bypass the usual development and testing process to ensure they can be deployed quickly.
  • Bugfixes (BF) address less critical issues, such as minor bugs or glitches that don’t affect core functionality or the user experience. These are typically scheduled for the next release cycle and undergo more rigorous testing.
  • Recovery (RV) refers to the process of addressing a failure by deploying a fix or patch that resolves the issue and restores the system to a stable state. This typically happens within a short time frame to minimize downtime and impact on users.
  • Rollback (RB), by contrast, involves reverting to a previous stable version of the software when a deployment failure cannot be resolved immediately. This quickly restores functionality and allows a more thorough investigation and resolution of the underlying issue before attempting another deployment.

By using naming conventions to differentiate between hotfixes and bug fixes, we can gain deeper insights into the deployment process and how various changes impact system stability.

Note: We can adapt DORA metrics to your naming conventions, allowing for a more streamlined and customized approach to tracking deployment success, failure, and recovery rates.

How It Works

Here’s a simple example of how naming conventions help us understand deployment failure rate:

  1. Hotfix Naming Convention:
    We define a hotfix PR with a specific prefix, like HOTFIX/ or HF-. For example, a PR addressing an urgent production issue might be named HF-1234-Critical-API-Endpoint.
  2. Bugfix Naming Convention:
    A bugfix PR follows a different convention, such as BUGFIX/ or BF-. For example, a PR fixing a minor UI issue could be named BF-5678-Resolve-UI-Alignment.
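The prefix rules above can be sketched as a small classifier. This is a minimal illustration rather than Leanmote's actual implementation; the long-form RECOVERY/ and ROLLBACK/ prefixes are assumptions extrapolated from the RV and RB abbreviations, so adjust the patterns to match your team's format.

```python
import re

# Map PR title prefixes to change categories. The prefixes follow the
# conventions described above; RECOVERY/ and ROLLBACK/ are assumed
# long forms and may differ in your repository.
PREFIX_PATTERNS = {
    "hotfix":   re.compile(r"^(HOTFIX/|HF-)", re.IGNORECASE),
    "bugfix":   re.compile(r"^(BUGFIX/|BF-)", re.IGNORECASE),
    "recovery": re.compile(r"^(RECOVERY/|RV-)", re.IGNORECASE),
    "rollback": re.compile(r"^(ROLLBACK/|RB-)", re.IGNORECASE),
}

def classify_pr(title: str) -> str:
    """Return the change category for a PR title, or 'feature' when no prefix matches."""
    for category, pattern in PREFIX_PATTERNS.items():
        if pattern.match(title):
            return category
    return "feature"
```

For example, `classify_pr("HF-1234-Critical-API-Endpoint")` returns `"hotfix"`, while an unprefixed title falls through to `"feature"`.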

We can then calculate the deployment failure rate by factoring in the frequency of failures and linking them to the specific type of change deployed.

Calculating Deployment Failure Rate

The formula for calculating the deployment failure rate can look something like this:

[Deployment Failure Rate] = [Number of Failed Deployments] / [Total Number of Deployments]
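As a sketch, the formula can be computed over a deployment history, treating a deployment as failed when it was followed by a recovery (RV) or rollback (RB) PR. The `followed_by` field is a hypothetical shape chosen for illustration; your pipeline may flag failures differently.

```python
def deployment_failure_rate(deployments: list[dict]) -> float:
    """Failed deployments divided by total deployments.

    A deployment counts as failed if it needed a follow-up recovery (RV)
    or rollback (RB). Returns 0.0 for an empty history to avoid
    division by zero.
    """
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.get("followed_by") in ("RV", "RB"))
    return failed / len(deployments)

# Hypothetical history: 2 of 4 deployments needed a recovery or rollback.
history = [
    {"id": 1, "followed_by": None},
    {"id": 2, "followed_by": "RV"},
    {"id": 3, "followed_by": None},
    {"id": 4, "followed_by": "RB"},
]
```

With this history, `deployment_failure_rate(history)` yields 0.5, i.e. a 50% deployment failure rate.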

Conclusion

Measuring the deployment failure rate is essential for understanding the stability of your CI/CD pipeline. By leveraging naming conventions to distinguish between hotfixes and bug fixes, you gain a more detailed view of deployment failures and can make more informed decisions about how to improve your release process. This approach allows teams to identify patterns, prioritize improvements, and ensure the reliability of their deployments, even when under pressure to deploy urgent fixes.