Tanzu Observability helps Zuora ensure high-quality service to their customers

80 percent of Zuora engineers adopted Tanzu Observability as a visibility tool

Tanzu Observability is now an essential tool for Zuora for improving customer quality and experience

The Zuora team sends data directly from the apps into a data pipeline that feeds into Tanzu Observability


Zuora is the only solutions provider that offers complete order-to-revenue capabilities for modern businesses

Zuora's customers can manage quotes, orders, billing, and revenue recognition for the entire customer lifecycle on a single platform—all to help them successfully manage and grow their subscription businesses.


The challenge

To maintain high service quality levels as they scaled, Zuora had migrated from a monolithic architecture to a microservices approach, with each microservices development team taking full ownership and accountability of their particular service, end to end.

Zuora takes the service-level agreements (SLAs) they have with their customers very seriously, so each microservices team signs up for an internal SLA. The teams must back up their commitments by measuring performance and adjusting quickly if something tracks outside objectives.

They started with a log-monitoring approach. However, manually combing through those logs to detect and pinpoint where something went wrong was tedious, and it became ineffective as they grew. To step things up, they added an open source metrics-monitoring platform based on Graphite and Grafana.

But they found these open source tools to be quite resource-intensive. They had one engineer dedicated to ongoing maintenance work, and they projected that they would need to add two more engineers as they continued to scale. This cost was in addition to the hardware costs and the cloud costs for a hybrid architecture that partly used its own data center and partly used the cloud. They found that they were spending too much energy engineering this system of monitoring infrastructure rather than engineering innovative products for their customers.



The solution

After 18 months using other tools, they switched to cloud-delivered Tanzu Observability so that they could free themselves of the mounting maintenance obligations. They managed the transition first by piping data from the incumbent tools into Tanzu Observability for parallel viewing by the engineers across the teams.

Now they’re sending data directly from the apps into a data pipeline that feeds into Tanzu Observability. The focus ahead is on getting all of the Dev and Ops teams on board with a metrics mentality, establishing the dashboards that give them visibility into how they’re doing, driven by analytics on metrics gathered across their applications, containers, clouds and infrastructure.
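As a rough illustration of what an app-side metric might look like before it enters such a pipeline, the sketch below formats one data point as a line in the Wavefront data format (metric name, value, epoch timestamp, source, then optional point tags). The case study does not describe Zuora's actual pipeline or metric names, so the function name, the metric name, and the tag values here are hypothetical.

```python
from typing import Mapping, Optional


def format_point(
    name: str,
    value: float,
    timestamp: float,
    source: str,
    tags: Optional[Mapping[str, str]] = None,
) -> str:
    """Render one metric point as a single text line.

    Layout assumed here: <name> <value> <epoch seconds> source=<source> key="value" ...
    """
    # Sort the tags so the output is deterministic and easy to compare.
    tag_text = "".join(f' {key}="{val}"' for key, val in sorted((tags or {}).items()))
    return f"{name} {value} {int(timestamp)} source={source}{tag_text}"


if __name__ == "__main__":
    # Hypothetical metric emitted by a hypothetical billing service.
    print(format_point("billing.api.latency_ms", 42.5, 1700000000, "app-1", {"env": "prod"}))
```

Keeping the formatting in one small function means every service emits points in the same shape, whatever transport the pipeline uses downstream.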

Next steps will expand further into alerts and predictive metrics so that Dev and Ops teams can see issues coming before they become big problems. They are working to establish alert standards so that each team approaches the problem the same way. They’ll also extend visibility of the metrics to management and operations teams while increasing their data sharing and collaboration.

“As we grew and scaled, we realized that [Tanzu Observability] was the fastest and most cost-effective way to get what we wanted for monitoring and alerting, compared to what we had to invest in time, engineers, and resources for the same from open source tooling.”
Karl Goldstein, Senior Director of Engineering, Zuora

The results

At Zuora, around 80 percent of the engineers across different teams ultimately began adopting Tanzu Observability as a visibility tool. The teams with the highest volume of transactions were the first to recognize the benefits and adopt the metrics approach; the rest of the teams are coming along as well. At present, Zuora uses Tanzu Observability for a variety of use cases:

  • Running periodic “synthetic transactions” (transactions created for monitoring) that feed response-time and derived uptime metrics into Tanzu Observability for visualization and alerting.
  • Monitoring application performance and error rates to diagnose when SLAs aren’t met.
  • Analyzing performance before and after point-in-time events, such as identifying whether a new deployment impacted performance of other services.
  • Publishing per-team dashboards so that each team can validate that it is staying within its SLA.
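
The synthetic-transaction and SLA use cases above can be sketched in a few lines. This is a minimal illustration, not Zuora's implementation: the probe runner, the ProbeResult type, and the SLA thresholds are all assumptions, and the transaction being probed is any callable that raises an exception on failure.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ProbeResult:
    ok: bool            # True if the synthetic transaction completed without error
    response_ms: float  # wall-clock duration of the attempt in milliseconds


def run_probe(transaction: Callable[[], None]) -> ProbeResult:
    """Run one synthetic transaction and record its outcome and response time."""
    start = time.perf_counter()
    try:
        transaction()
        ok = True
    except Exception:
        # Any exception counts as a failed probe (hypothetical convention).
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ProbeResult(ok=ok, response_ms=elapsed_ms)


def uptime_percent(results: Sequence[ProbeResult]) -> float:
    """Derive uptime as the share of probes that succeeded."""
    if not results:
        raise ValueError("no probe results")
    return 100.0 * sum(1 for r in results if r.ok) / len(results)


def meets_sla(
    results: Sequence[ProbeResult],
    min_uptime_percent: float,
    max_response_ms: float,
) -> bool:
    """Check derived uptime and the slowest successful response against targets."""
    slowest_ok = max((r.response_ms for r in results if r.ok), default=0.0)
    return uptime_percent(results) >= min_uptime_percent and slowest_ok <= max_response_ms


if __name__ == "__main__":
    results = [run_probe(lambda: None) for _ in range(5)]
    print(uptime_percent(results), meets_sla(results, 99.0, 500.0))
```

In practice the response times and the derived uptime would be emitted as metrics on a schedule, so that dashboards and alerts can be built on them rather than on an in-process check like meets_sla.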

Tanzu Observability is now an essential tool for Zuora to ensure that its customers get the quality of service they’ve grown to expect to successfully transform and win in the subscription economy.