This post was co-authored by Gustavo Franco.
In a world of “it’s always up,” expectations for reliability have risen dramatically. Customers have come to expect near-100 percent availability, indefinitely. Yet no system is infallible forever.
When it comes to delivering business value, the most effective approaches to building, running, and managing workloads and platforms strike a balance between agility and reliability: they weigh investments in uptime (or other service-level indicators) against investments in product development.
With the accelerated adoption of Kubernetes as a platform for modern applications, production concerns (such as observability, availability, and change management) are heavily influenced by how your Kubernetes environments are built and managed. That makes the platform itself an important place to invest in order to strike that balance. This is where VMware CRE comes in.
What is VMware CRE?
Customer Reliability Engineering (CRE) is what happens when you ask software engineers to work with customers on Site Reliability Engineering (SRE).
VMware CRE takes hands-on SRE practices, such as setting service-level objectives (SLOs), managing error budgets, and reducing toil, and partners with our customers to help them embrace and adapt those practices within their Kubernetes environments.
We apply our combined expertise in running large, complex Kubernetes environments with a proactive engagement model to help customers meet their reliability goals.
In parallel, we seek to strengthen the reliability features available in our Tanzu portfolio and the ability of our Services teams to scale the delivery of our offering.
How does CRE help you?
For eligible customers, we offer an engagement model that combines knowledge transfer (education and workshops) with collaborative practice in one or more of the following areas, so that your teams can succeed with SRE independently:
Non-Abstract Large Scale Systems Design (NALSD) with a focus on reliability
Defining and measuring reliability goals—SLIs/SLOs/error budgets
Designing for and implementing observability
Defining, testing, and running an incident management process
Capacity planning (including graceful degradation during saturation)
Change and release management, including CI/CD
Planning for and performing chaos engineering
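To make the second area above a little more concrete, here is a minimal sketch of the arithmetic behind an error budget. It is an illustration only, not part of any VMware tooling: the function names, the 30-day window, and the request counts are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent).

    The SLI here is assumed to be good_events / total_events.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # A 99.9% SLO over 30 days leaves roughly 43 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    # 500 failed requests out of 1,000,000 spends about half the budget.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

The point of the second function is that the budget is something a team spends: once the remaining fraction approaches zero, the balance described earlier tips toward reliability work over new releases.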
The level of engagement is based on several factors, such as tenure of the account, level of commitment, customer needs, and staffing availability.
While customers accepted into the CRE program are not charged for this service, we do expect an investment in terms of engagement and commitment from your team.
VMware also offers similar paid engagements. CRE can connect you and your account team with the right Professional Services team.