Improving the DevOps Metrics that Matter with Cloud Native Patterns

I’m a glass-half-full kind of person. When I hear the real-life stories of complex, treacherous paths to production (with inevitable delays, failures and fire drills), I tend to think “OMG, what potential!” If you’re like most organizations, you have a lot of upside to gain by transforming how you deliver software. But it is a journey, not a destination. It’s about setting goals and doing the work. Along the way, you should track your progress and measure yourself against your peers and industry benchmarks, refining and improving as you go.

One essential benchmark is the Accelerate: State of DevOps report. It’s the longest-running research on software delivery performance. It can validate your progress and help you set goals for improvement around the delivery metrics that matter: throughput, stability, and availability. And if you’re that company thinking your glass is half empty, this report can give you hope.

In a Pivotal webinar covering findings from the State of DevOps report, lead author Nicole Forsgren made the ultimate positive statement:

“High performance is available for everyone…. Anyone can do this.”

She’s confident in that because she’s seen the numbers. In the 2018 report, there are more high performers in all industries. In fact, 48 percent of respondents were categorized as high performers. The growing number of high performers led the authors to carve out an ‘elite’ group in order to distinguish those really pushing the boundaries.

A new finding for the 2018 report is that those most adept at cloud computing were very likely to be among that elite group. As an example, platform-as-a-service users were 1.5 times more likely to be elite performers.

In the spirit of positive reinforcement with real-world data, I wanted to take a look at how you can improve in the three metric areas discussed in the report (throughput, stability, and availability). I’ll bring in some data points for how companies are leading the way for each area of measurement. And I’ll share some recommendations on how cloud-native approaches combined with continuous delivery best practices can help you see results.

Metric One: Throughput

Companies can measure the speed and efficiency of delivering value (whether that’s a new feature or a bug fix) to their customers in a number of ways. In the State of DevOps report, the authors focus on throughput metrics, which include:

Lead time for changes: how long it takes to go from code commit to code running in production
Deployment frequency: how often you deploy to production
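
If you want to start tracking these two numbers yourself, here is a minimal sketch in Python. It is not how the report collects its data (that comes from survey responses). The deployment records, the five-day window, and the function names are all made up for illustration, so swap in timestamps from your own pipeline.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (time of code commit, time the change was running in production).
deployments = [
    (datetime(2018, 9, 3, 9, 0), datetime(2018, 9, 3, 11, 30)),
    (datetime(2018, 9, 3, 14, 0), datetime(2018, 9, 4, 10, 0)),
    (datetime(2018, 9, 5, 8, 0), datetime(2018, 9, 5, 9, 0)),
    (datetime(2018, 9, 6, 16, 0), datetime(2018, 9, 7, 16, 0)),
]


def median_lead_time_hours(records):
    """Median time, in hours, from code commit to running in production."""
    hours = [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in records
    ]
    return median(hours)


def deployments_per_day(records, period_days):
    """Average number of production deployments per day over the period."""
    return len(records) / period_days


print(median_lead_time_hours(deployments))  # 11.25
print(deployments_per_day(deployments, 5))  # 0.8
```

I used the median rather than the mean so that one unusually slow change does not dominate the lead time figure. That is a choice for this sketch, not something the report prescribes.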

You could also include metrics around “developer work”—producing smaller, modular batches of deployable code more often. After all, if development is not agile, then your delivery certainly won’t be.

Pivotal customers have demonstrated some remarkable results around speed to market:

CSAA Insurance experienced a 205 percent increase in developer productivity and increased deployment frequency by 1,400 percent.
Liberty Mutual reported that it is deploying 1,000 times a day to production on 2,500 daily builds across 600 production apps.
United States Air Force increased release frequency to 30.5 releases per month.

How do organizations achieve these types of results? The actual work of improving throughput can include:

Self-serve developer experience: It starts with the code. Do developers have easy access to the libraries and services they need to create cloud-native apps? Your team will also need to automate its build and test process through continuous integration (CI) and have instant access to development and test environments that are at parity with production.
End-to-end automated CI/CD pipeline: Automating every facet of delivery, from development hand-off to production, is critical. Are you automating deployments and testing? Do you automate time-intensive tasks like feedback notifications and change tickets? Through a visible end-to-end pipeline, Dev and Ops teams can work together to get code to staging and production several times a day.
Shift-left testing, database changes, security scans: Does your pipeline stall waiting for testing, the DB team, or the security team? Do you attend change review meetings? You should integrate and automate these activities in your delivery pipeline. This way, they can be done earlier and in parallel with other activities. The end result: you avoid manual handoffs and delays.
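
To make the "earlier and in parallel" idea concrete, here is a small Python sketch of a pipeline stage that runs independent checks side by side and only allows a deployment when all of them pass. The three check functions are hypothetical stand-ins that always succeed. In a real pipeline they would call your test runner, security scanner, and database migration tooling, and your CI/CD tool would handle the orchestration.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real pipeline steps; each returns True on success.
def run_unit_tests():
    return True


def run_security_scan():
    return True


def validate_db_migration():
    return True


def run_checks_in_parallel(checks):
    """Run independent checks concurrently and return {name: passed}."""
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        return {name: future.result() for name, future in futures.items()}


def ready_to_deploy(results):
    """A change may proceed only if every check passed."""
    return all(results.values())


checks = {
    "unit_tests": run_unit_tests,
    "security_scan": run_security_scan,
    "db_migration": validate_db_migration,
}
results = run_checks_in_parallel(checks)
print(results, ready_to_deploy(results))
```

Because the checks do not wait on each other, the stage takes roughly as long as its slowest check rather than the sum of all of them.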

Figure: An example of an automated CI/CD pipeline using Concourse.

Metric Two: Stability

As your team speeds up software delivery, they can also reduce errors and remediate faster. Does this conclusion surprise you? It shouldn’t.

The State of DevOps report shows that as high performers increase their speed, they simultaneously improve their stability. The report measures stability through two metrics: mean time to repair (MTTR) and change failure rate (changes that degrade service to the point of remediation or that cause failure).
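
Both stability metrics are easy to compute once you log the outcome of each change. The Python sketch below assumes a hypothetical change log where each entry notes whether the change failed in production and how long it took to restore service. The field names and sample values are invented for illustration and are not taken from the report.

```python
from datetime import timedelta

# Hypothetical change log; "time_to_restore" is only set for failed changes.
changes = [
    {"id": "c1", "failed": False, "time_to_restore": None},
    {"id": "c2", "failed": True, "time_to_restore": timedelta(minutes=30)},
    {"id": "c3", "failed": False, "time_to_restore": None},
    {"id": "c4", "failed": True, "time_to_restore": timedelta(minutes=90)},
    {"id": "c5", "failed": False, "time_to_restore": None},
]


def change_failure_rate(log):
    """Fraction of changes that degraded service or needed remediation."""
    failed = sum(1 for change in log if change["failed"])
    return failed / len(log)


def mean_time_to_repair(log):
    """Average time to restore service across failed changes."""
    durations = [c["time_to_restore"] for c in log if c["failed"]]
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


print(change_failure_rate(changes))   # 0.4
print(mean_time_to_repair(changes))   # 1:00:00
```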

What does this look like in the real world? Here are a few instructive case studies:

Cerner Healthcare was able to recover from a production cell loss in 5-15 minutes with zero humans involved. This previously would have taken them 6-8 hours and four humans.
Comcast reduced incident frequency by 44 percent and MTTR by 47 percent, which amounted to an 81 percent decrease in customer-facing downtime.
T-Mobile reduced product defect incident frequency by 83 percent and reduced incident resolution time by 67 percent (going from 340 minutes to 112 minutes to resolve).

What drives software stability?

Cloud-native architecture: Are you sitting on a complex, monolithic application portfolio with a mandate to move to the cloud? You can start by replatforming suitable applications to run on the cloud. But to take full advantage of cloud infrastructure, you must modernize those monoliths into loosely coupled, lightweight microservices that follow 12-factor principles.
Bullet-proofed CI/CD pipeline: Make your pipeline the standard, reliable path to production. It starts with automating the manual tasks in your application delivery process to remove human error. Does your pipeline include integrations with process checks, approval gates, and final artifact testing? All of these steps validate production readiness. Are your pipelines declarative (“as code”) for easier traceability, version control, and remediation? They should be!
Immutable, always secure infrastructure: Are you worried that your servers are behind on security patching? If you’re not applying patches as soon as they are issued, your systems and customer data are at risk. With an automated patching process, you can sleep that much easier at night. You can use deployment automation to regularly “repave” your infrastructure during business hours to fight against advanced persistent threats. When you deploy new infrastructure every time you deploy a new application, deployments become more reliable and predictable. You can be more confident in your releases.

Metric Three: Availability

Availability is a new area of measurement in the State of DevOps report for 2018. It’s a recognition that the ultimate marker of software delivery performance is that users can reliably access an application or service.

Of course, metrics like availability open the door to thinking about software delivery through an operational lens. After all, continuous delivery is never really done. Production provides valuable insights for improving the customer experience.

Pivotal customers really excel at keeping software available to their users, particularly when measured by application scaling and downtime:

Comcast reduced the time to scale their application by 90 percent and increased their number of transactions by 238 percent.
DBS has experienced zero downtime in production over the past 2.5 years.
Yahoo Japan Technology reports zero downtime in production with Pivotal Cloud Foundry (PCF).

How do you deliver availability and accessibility for your customers? These practices can help:

Low-risk deployment strategies: Start your applications off right by de-risking the cut-over to live production with deployment strategies that gently move applications into full production. Are you leveraging automated blue/green and canary deployments to minimize risk and downtime? Are you able to roll back automatically when a deployment does not succeed? Use safe deployment strategies to increase confidence and reduce risk. This approach boosts your throughput while offering a path to continuous deployment when you’re ready.
High-availability (HA) by design: Does your infrastructure include redundancies? How does your stack handle failure at the availability zone, VM, application, and process levels? How do you monitor and manage the health of your services? Redundancy and distribution minimize downtime during ongoing operation, security updates, and platform upgrades.
Dynamically scalable applications: Scaling up or down without disruption keeps everyone happy, especially during peak traffic times. Can you automatically scale up to handle a big traffic event? Can you automatically scale back down to save costs? Having an elastic infrastructure, based on thresholds you set, helps keep your apps online in the face of unpredictable traffic.
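
Threshold-based scaling, as described in the last practice above, boils down to a simple rule. Here is a rough Python sketch of that rule. The CPU thresholds, instance limits, and function name are assumptions I picked for illustration, and this is not the logic of any particular platform's autoscaler.

```python
def desired_instances(
    current,
    cpu_percent,
    *,
    scale_up_at=80.0,     # hypothetical upper threshold
    scale_down_at=30.0,   # hypothetical lower threshold
    min_instances=2,
    max_instances=10,
):
    """Return the instance count to run, given average CPU utilization."""
    if cpu_percent > scale_up_at:
        target = current + 1   # add capacity under heavy load
    elif cpu_percent < scale_down_at:
        target = current - 1   # shed capacity to save cost
    else:
        target = current       # within the comfortable band
    # Never go below the redundancy floor or above the cost ceiling.
    return max(min_instances, min(max_instances, target))


print(desired_instances(3, 92.0))   # 4
print(desired_instances(3, 12.0))   # 2
print(desired_instances(2, 12.0))   # 2
```

Keeping a minimum of two instances in this sketch ties scaling back to high availability: even when traffic is quiet, there is still a redundant copy of the app running.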

Don’t Forget Culture: The Leading Indicator of DevOps Performance

Tools and tech are essential to becoming a high performer in software delivery, but culture is perhaps the most important element. While it may be harder to define a clear measure for culture, there are many indicators of what drives a happier, more productive culture.

For example, the State of DevOps report describes outsourcing as an anti-pattern for high performance, because of the added overhead and greater functional divide. Instead, cross-functional teams and agile practices are correlated with better performance.

It’s also clear that implementing continuous delivery (defined in the report as “Technical practices in delivery and deployment that reduce the risk and cost of performing releases”) boosts team morale and improves performance. Better visibility across teams and faster feedback loops can help break down silos.

Let’s Get Started

We’ve helped organizations like yours transform how they develop and deliver software. Here are some suggestions on how to get started:

Whiteboard session with apps and ops teams. Here, we explore the areas of impact as part of a Pivotal Labs engagement. What parts of your application portfolio or process need the most improvement?
Get an application running on a modern platform. Remember those monolithic apps we mentioned? Let’s start a quick project to decompose one of them and get it up and running on Pivotal Cloud Foundry. This is an Application Transformation engagement. Here, we help you realize greater throughput, stability, and availability, and show you proof of what’s possible.
Value stream assessment. What does your path to production look like today? Where are the blockers and inefficiencies? Let’s analyze your status quo, and make a plan to get better. Read the whitepaper.

Let us help you get started on achieving your DevOps performance goals today.
