Box created dashboards across 40 core services that showed compliance and error rates in real time
Visualizing and exploring data has helped Box’s engineers discover usage patterns and optimize their infrastructure
Setting up and fine-tuning an alert now takes seconds, compared to minutes or hours with other platforms
More than 85,000 businesses use Box, including 65 percent of the Fortune 500. As companies in diverse industries move content to the cloud, Box is scaling to meet the demand for syncing and sharing files quickly and securely.
Box’s customers count on fast upload speeds, site reliability, and quick resolutions to any issues that emerge. The firm was already using an open-source, metric-based data analytics platform to monitor performance. However, that platform proved unreliable and was becoming expensive for Box to scale and maintain. Box was also using a log-based platform, but it was slower and could support only limited queries, and only after problems had already arisen. Box needed a data analytics platform that would allow it to monitor all of its infrastructure and services reliably and get alerted to problems in real time.
After Box adopted VMware Tanzu Observability, it became the go-to tool for development and operations engineers to understand the health of all the company’s infrastructure and services. Within four months, the entire engineering team was leveraging Tanzu Observability. Now hundreds of engineers use the platform daily for managing application performance, troubleshooting, monitoring production and collecting business metrics.
During a trial of Tanzu Observability, Box requested a feature that would enable grouping by dimension rather than by host, because it continually reprovisioned the underlying infrastructure. The Tanzu Observability team added the capability in three days. “That helped build a lot of confidence in strong support from [Tanzu Observability] for our needs,” says Pierre-Alexandre Masse, Box’s engineering director.
To gain a more refined understanding of service-level agreements, Box created dashboards across 40 core services that showed compliance and error rates in real time. Setting up and fine-tuning an alert now takes seconds, compared to minutes or hours with other platforms, Masse says. As a result, individual teams can customize alerts and add more of them, improving reliability. Through an automated process, Box rolled up the data from each service and pushed it back into Tanzu Observability as a new metric. The resulting global view of whether Box was meeting service-level agreements could easily be shared among multiple users. In all, Box uses hundreds of dashboards and 728 alerts to learn about problems as they arise or before.
“[Tanzu Observability] gives us very quick insights and the best query language we could find to really explore our data and understand it. As we rely on data and metrics to make our decisions, this makes [Tanzu Observability] now an essential and indispensable part of our day-to-day operations.”
Pierre-Alexandre Masse, Engineering Director, Box
The ease of visualizing and exploring data has helped Box’s engineers make discoveries about usage patterns and optimize infrastructure. For example, they were able to see that some data they were collecting was not useful and to observe previously unknown drops in data rates.
Thanks to a reliance on real-time metrics rather than standard log queries, Tanzu Observability has allowed Box to improve application development speed, diagnose live site issues, and predict failures ahead of time. Masse believes Tanzu Observability will continue to be a core support for Box as it grows: “We have full confidence that [Tanzu Observability] will be able to scale along with Box, both in dealing with more data and more users.”
Read about the Tanzu Observability automation tooling that Box uses.