Application Resiliency for Cloud Native Microservices with VMware Tanzu Service Mesh

September 22, 2021 Deepa Kalani

Modern microservices-based applications bring with them a new set of challenges when it comes to operating at scale across multiple clouds. While the goal of most modernization projects is to increase the velocity at which business features are created, with this increased speed comes the need for a highly flexible, microservices-based architecture. The result is that the architectural convenience created on day 1 by developers turns into a challenge for site reliability engineers (SREs) on day 2. 

Developers expect their business features to work at scale and exhibit certain performance characteristics, but they may not know what that will ultimately cost or how much compute capacity it will require. SREs, on the other hand, control the compute capacity but may not know the best way to scale the microservices to meet the stated performance objectives. This situation can quickly escalate into release slowdowns and miscommunication between teams, which in turn creates resiliency issues for highly distributed applications. 

What is needed is a much more automated approach in the form of a contractual agreement between developers as they define service-level indicators (SLIs) for their services, and SREs, who in turn use those SLIs to define service-level objectives (SLOs). At VMware, we think of such an agreement as an SLO policy.

In this post, we’re going to demonstrate how you can set up an SLO in Tanzu Service Mesh.     

The SLIs in SLOs

In Tanzu Service Mesh, an SLO is composed of multiple SLIs. Developers communicate with SRE teams to help identify baseline SLIs so that they can configure SLOs in the various production environments and help improve application resiliency through constant iteration. In the world of microservices, an application may comprise one or more application domains. These domains can be created to deconstruct a larger application; to represent different environments such as development, staging and production, or platforms; or to maintain separation between the various concerns of operators, application owners, and developers. 

Using the application domain construct implemented via the Global Namespace (GNS) in Tanzu Service Mesh, developers can deploy microservices into these namespaces, while application operators and SREs define SLOs for those applications with agreed-upon performance objectives expressed as various SLIs.  

The GNS in Tanzu Service Mesh binds the SLO agreement into the system so that all services within its scope adhere to it. Once the system can execute according to the SLOs that developers, product owners, and SREs have agreed to via the GNS, the GNS can be thought of as a key architectural pattern for layering cloud native applications.  

Building a resilient application

Let’s walk through how you can build a distributed resilient application using Tanzu Service Mesh. 

The first step is to create a GNS in Tanzu Service Mesh. You can create a GNS that spans one or more Kubernetes clusters, which could then be deployed on-prem and in one or more clouds.   
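Conceptually, a GNS pairs a unique service domain with the cluster namespaces it spans. The sketch below shows that shape only; the field names, cluster names, and domain are illustrative assumptions, not the actual Tanzu Service Mesh API schema (in practice you create a GNS through the Tanzu Service Mesh UI or REST API):

```yaml
# Illustrative GNS sketch -- field names and values are hypothetical,
# not the literal Tanzu Service Mesh API schema.
name: acme-shop-gns
domain: acme-shop.lan           # services in the GNS resolve each other under this domain
serviceMapping:
  - cluster: onprem-cluster-1   # on-prem Kubernetes cluster
    namespace: acme-shop
  - cluster: aws-cluster-1      # the same GNS can span clusters in other clouds
    namespace: acme-shop
```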

Next, monitor the health of your application by defining an SLO. You can set thresholds such as CPU, memory, latency, etc. as your SLIs and use them to define the SLO.  
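For example, an SLO policy might use p90 latency and CPU utilization as its SLIs, with a target for how often all SLIs must be met. The snippet below is a hypothetical sketch of the shape of such a policy (all field names are assumptions, not the exact Tanzu Service Mesh schema):

```yaml
# Hypothetical SLO policy sketch -- illustrative field names only.
name: checkout-slo
target: 99.9                    # percent of time all SLIs must be satisfied
slis:
  - metric: p90-latency
    threshold: 300              # milliseconds
  - metric: cpu-usage
    threshold: 80               # percent of requested CPU
services:
  - checkout                    # services in the GNS this SLO applies to
```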

As demand for the application grows, it should be able to react to the increased usage. With that in mind, be sure to set up an autoscaling policy.  

You can also set your SLO to influence the autoscaling behavior of your applications. 

Tanzu Service Mesh monitors the SLIs for each microservice and automatically scales the application based on them. Depending on your needs, you can set the autoscaling policy to be efficiency-based (i.e., it will scale down when demand drops) or put it in performance mode, in which case it will never scale down.  
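An autoscaling policy for a service in the GNS might look roughly like the sketch below, with a target workload, instance bounds, a trigger metric, and a scaling mode. The structure follows the general shape of the Tanzu Service Mesh autoscaler's `Definition` custom resource, but treat the individual field names and values as assumptions and consult the product documentation for the authoritative schema:

```yaml
# Rough sketch of a Tanzu Service Mesh autoscaling policy;
# exact field names and values are assumptions.
apiVersion: autoscaling.tsm.tanzu.vmware.com/v1alpha1
kind: Definition
metadata:
  name: checkout-autoscaler
spec:
  scaleRule:
    enabled: true
    target:
      kind: Deployment
      name: checkout
    instances:
      min: 2
      max: 10
    scaleMode: EFFICIENCY       # EFFICIENCY scales down when demand drops;
                                # PERFORMANCE only scales up and never scales down
    trigger:
      metric:
        name: CPUUsageMillicores
        scaleUp: 700            # add instances when the average exceeds this
        scaleDown: 300          # remove instances below this (EFFICIENCY mode only)
```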

You can also continue to deploy your applications on more clusters and namespaces in order to increase capacity or have them be used for disaster recovery. Policies for the GNS in Tanzu Service Mesh will automatically be applied to these services as you scale out your infrastructure and add more namespaces. This video walks you through these new capabilities.

Stay tuned for our next post, which will cover a real-world scenario in which Tanzu Service Mesh SLOs and autoscaling policies are deployed and managed. 
