The 10 Platform Journey Health Markers: A Roadmap to Continuous Improvement

August 6, 2019 Parker Fleming

This post was originally published in August 2019 and has been updated.

What bothers you the most about enterprise IT? For me, it’s the tendency to promise easy solutions to complex problems. Risk aversion can cloud the judgment of stakeholders and let wishful thinking influence decision making. Hope is not a strategy. This desire for an easy button is especially noticeable in organizations navigating the complex and confusing world of DevOps and cloud native.

It is understandable behavior, though. For much of human history, our very survival depended on conserving energy and resources. And we avoided risks at all costs. Unfortunately, these instincts don’t serve us well when it comes to building and running distributed systems in our data centers or the public cloud. 

At VMware Tanzu Labs, we believe that transformation is something you do, not something you buy. This may seem counterintuitive coming from a software company, but bear with me. 

Application platforms are just tools. To maximize their value, you need to learn how to use them as they were intended. When companies wrap these modern platforms in their legacy operational model, they struggle to realize the maximum potential of the product. We’ve learned there’s a better way to achieve superior outcomes.

Our Tanzu Labs team teaches you how to treat your platform as a product. We pair with you and your team to apply user-centered design (UCD), extreme programming (XP), Lean, and site-reliability engineering (SRE) practices. Without these practices, companies’ ability to maximize the impact of these modern platforms will continue to be constrained by the same legacy governance, operational, security, and change control processes that contributed to the need for a modern platform in the first place.

So once you’ve made the move to a new platform like Tanzu for Kubernetes Operations or Tanzu Application Platform, and you want to maximize its value, it’s fair to ask a few questions. How do you know if you’re getting better at running your platform? How do you know if you’re executing at the highest level? Have you transitioned to running your platform as a product?

That brings us to the topic at hand. Tanzu Labs has developed a framework to measure the relationship between the prescribed practices and the health of your application platform and your platform team. This framework—the Tanzu Labs Platform Journey Health Markers—consists of 10 distinct areas where we have seen our most successful customers achieve a state of continuous improvement.

The 10 platform health markers

Here is a quick summary of each of the 10 health markers. Is there one of these that stands out to you as the most important? Are there any that you and your team are struggling with? Let us know!

1. Monitoring and metrics

Teams must establish desired service behavior, measure how the service is actually behaving, and correct discrepancies.

Why it matters

Observability of your platform’s state is required to make certain that service-level objectives (SLOs) are being met. This imbues trust in the platform from your customers, which, over time, will afford the platform team increased autonomy in their platform operations. Monitoring and alerting on the right indicators also allows the platform team to react to situations before they impact the reliability of the platform.
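To make the idea of checking a service against its SLO concrete, here is a minimal sketch in Python. It assumes a simple request-based indicator (good events divided by total events); the names `SloReport` and `evaluate_slo`, and the sample numbers, are hypothetical and not part of any Tanzu tooling.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float        # measured fraction of good events (the indicator)
    target: float     # desired fraction of good events (the objective)
    meets_slo: bool   # True when the measurement is at or above the target


def evaluate_slo(good_events: int, total_events: int, target: float) -> SloReport:
    """Compare measured behavior with desired behavior (hypothetical helper)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    sli = good_events / total_events
    return SloReport(sli=sli, target=target, meets_slo=sli >= target)


# Illustrative numbers only: 99,950 successful requests out of 100,000
# measured against a 99.9% objective.
report = evaluate_slo(good_events=99_950, total_events=100_000, target=0.999)
print(report.meets_slo)
```

In practice the counts would come from your monitoring system, and a failed check would feed an alert rather than a print statement.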

2. Capacity planning

The ability to project future demand and verify that a service has enough capacity in appropriate locations to satisfy that demand is imperative.

Why it matters

On-demand access to services and resources is the cornerstone of the value proposition of a platform. To ensure this customer expectation is met, proactive capacity planning is required to accommodate externalities that require lead time, such as procurement, hardware, networking, limit management in the public cloud, and so on.
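As a rough illustration of why lead time matters, the sketch below projects when capacity runs out and whether you would need to start procurement today. It assumes demand grows linearly, which is a deliberate simplification; the function names and the numbers are hypothetical.

```python
import math


def weeks_until_exhausted(current_usage: float, capacity: float,
                          growth_per_week: float) -> float:
    """Weeks until usage reaches capacity, assuming linear growth."""
    if growth_per_week <= 0:
        return math.inf  # flat or shrinking demand never exhausts capacity
    return max(0.0, (capacity - current_usage) / growth_per_week)


def must_order_now(current_usage: float, capacity: float,
                   growth_per_week: float, lead_time_weeks: float) -> bool:
    """True when the remaining runway is no longer than the procurement lead time."""
    runway = weeks_until_exhausted(current_usage, capacity, growth_per_week)
    return runway <= lead_time_weeks


# Illustrative numbers only: 700 of 1,000 units in use, growing 25 units a week.
runway = weeks_until_exhausted(700, 1000, 25)
print(runway, must_order_now(700, 1000, 25, lead_time_weeks=8))
```

With these sample inputs the runway is 12 weeks, so an 8-week lead time still leaves room, while a 12-week lead time would not. Real demand is rarely linear, so treat this as a starting point for a forecast, not a substitute for one.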

3. Platform update engine

Platforms require fast, frequent, and frictionless delivery of incremental platform capabilities, along with measurement of those capabilities in production.

Why it matters

Keeping your platform current provides your users with a secure, supported, and feature-rich developer experience. By setting expectations for the rate and cadence of change of your platform (through error, vulnerability, and legacy budgets), you can guarantee that you are providing all of these things to the business.

4. Emergency response

Quickly spotting and effectively responding to service failures preserves conformance to service-level agreements (SLAs).

Why it matters

Your ability to react to adversity helps prevent the erosion of customer satisfaction. Having a healthy feedback loop empowers you to learn from these incidents to avoid recurring issues. These competencies also protect your error budget to secure the time that’s needed to focus on delivering new features and value to your customers.
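If the term is new to you, an error budget is commonly defined as the amount of unreliability an SLO permits over a time window. The sketch below works through that arithmetic for a time-based availability target. The 30-day window, the function names, and the sample downtime are assumptions for illustration.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of downtime the SLO allows over the window."""
    return (1.0 - slo_target) * window_days * 24 * 60


def remaining_budget_minutes(slo_target: float, downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Budget left after incidents; a negative value means the budget is overspent."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Illustrative numbers only: a 99.9% target over 30 days allows roughly
# 43 minutes of downtime; one 10-minute incident uses part of it.
budget = error_budget_minutes(0.999)
remaining = remaining_budget_minutes(0.999, downtime_minutes=10)
print(round(budget, 1), round(remaining, 1))
```

The faster you detect and resolve an incident, the less of this budget it consumes, which leaves more room for shipping platform changes.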

5. Self-service

It’s necessary to instantiate and delete service capacity in a predictable fashion (which is often a consequence of capacity planning). 

Why it matters

Shortening the time to market and empowering your business to support its applications without introducing additional friction are two of the most strategically valuable things your platform can do for your business.

6. Performance optimization

Characterize and track service component performance, efficiency, and resource utilization to identify and address regressions, as well as drive improvements in efficiency.

Why it matters

The performance of your platform capability is a key component of the reliability expectations of your customers. Monitoring performance-related metrics can help you take corrective measures to prevent unnecessary use of your error budget. Additionally, these insights can help inform responsible capacity planning.

7. Business continuity

Provide a secure, low-impact mechanism to meet recovery-time and availability objectives.

Why it matters

Having a robust and well-rehearsed process to restore your platform to a known good state ensures your business can trust the platform's capability to run business-critical applications.

8. Platform as a product

The team should frequently update the platform with new features and security updates and introduce new capabilities in response to the needs of its users. It should be treated as a product and include not only a VMware Tanzu platform, but all of the services and integrations that make it a viable environment for applications to run.

Why it matters

Focusing on the needs of your users not only creates a great product/market fit (which delights your users!) but also prevents your platform team from wasting time building the wrong things or over-engineering solutions.

9. Balanced team

The platform team should consist of a product manager and at least two platform engineers with a combination of infrastructure and software engineering skills.

Why it matters

Product management and platform engineering are complementary but distinct domains. Having individuals on your team who focus on each of these ensures you have high-quality interactions and feedback loops with your customers. In turn, this gives the engineers on your team clear direction on what to build.

10. Path to production

Developers should be able to take full advantage of the platform via modern and optimized tools and processes.

Why it matters

When coupled with the ability to self-service tenancy, a streamlined path to production (that is unencumbered by legacy tools, gates, change windows, etc.) is the key to minimizing the time to value for your business. It also empowers application teams to troubleshoot and address issues without escalation. Note that build pipelines apply here, as do continuous integration and continuous delivery principles.

Ready to learn more?

The idea of “digital transformation” is squishy, as the definition varies based on who you’re talking to. But if you’re serious about getting better at software, you can very quickly use these 10 health markers to quantify and track how you’re doing.

If you want to learn more about effectively running your platform, don’t miss our Tanzu Labs sessions at VMware Explore, August 28 to September 1, 2022, and SpringOne, December 6–8, 2022, in San Francisco, Calif. The agenda is packed with seasoned practitioners who’ll share their best practices and lessons learned along their platform journey.


About the Author

Parker Fleming is the use director of Platform Services at VMware Tanzu.
