How Communication Helps T-Mobile Keep Its Applications Up

October 13, 2020 Derrick Harris

T-Mobile has been a vocal Tanzu Application Service user for years, and as that environment grows—along with its younger Kubernetes environment—the wireless provider is learning a lot of valuable lessons. In this session from our recent SpringOne event, Brendan Aye and James Webb—two of T-Mobile’s cloud native platform leaders—share their experiences and strategies on a topic that should resonate with everybody who’s ever had to manage an application platform: what to do when something goes wrong.

Aye and Webb get into specifics in the embedded video, but keep reading for highlights from the talk that lay out the scope of T-Mobile’s cloud native footprint and how their team communicates about and resolves platform issues. For more on the company’s journey over the past couple of years, see the links under “More from T-Mobile” at the end of this post.

Cloud native environments are important and growing fast

“We have 30 [Tanzu Application Service] foundations supporting 75,000 application instances. This is a fairly important platform to the business. Most of our middleware and a lot of our order-management and digital-facing channels run on the platform now. Our Kubernetes environment is about 90 clusters, 22,000 pods. We've been hovering right around the 100,000 [container] mark for a while...

“I know...100,000 containers does not sound like a lot to a lot of large companies, but for us it's a pretty big deal. These are not workloads that are running ephemerally, being spun up and spun down every day. It's more a place for our core applications that we used to be running on VMs or bare metal or other forms of infrastructure, and moving them to a platform that provides an agile, consistent experience for teams to run their workloads without having to go through all the pain of maintaining their infrastructure like they had to do in the past.” —James Webb

Slack has opened up communication

“Another big thing for us is Slack. We adopted Slack very early when we deployed these platforms. We have customer channels for both [TAS] and Kubernetes, where effectively all our customers are members of those channels. We broadcast all of our notifications there around platform upgrades, any kind of incidents we have, new features, breaking changes, things of that nature. So, we can very quickly interact with all of our customers in one place, instead of having to shoot out emails and have people reply-all, and everyone gets fatigued from that kind of stuff...

“Because of that, we have customers that report problems very quickly because that's our primary method to engage with [support]...So that's generally the fastest way that we find out about incidents that we haven't caught with our monitoring, is when we see one or two or three customers report similar behavior or chime in on someone else's report. We know pretty quickly what's important and what might be causing issues on our platform.” —Brendan Aye

“Everything’s production to us”

“The other thing that we do for our internal customers is we don't evaluate things in terms of production and non-production. Everything's production to us. All of our customers are important, whether it's just internal developers who are trying to meet deadlines for their project, or whether it's external customers who are interacting with the website to buy or upgrade a phone.

“Nothing is more frustrating to me than hearing someone say, ‘Well, it's just non-production; I don't care.’...As a culture on our team, we do not say that. Every customer is important to us.” —James Webb

Don’t blame, just fix

“A big thing that you see in many large corporations is a mean time-to-blame, where customers that have incidents for their applications want to shift blame to someone else as quickly as possible. We've tried to really not play that game. We don't want to be in a position to try to blame someone else for an issue, or to be embarrassed because our platform has a problem itself.

“If it's our fault, we accept responsibility. If it's not, we demonstrate why and make sure leadership knows, as well...We want to fix the issue, explain what went wrong, and talk about how we can try to prevent that in the future.

“[These are] new platforms for our company—a lot of new technologies, new architectures. So when we see customers doing things that will get them in trouble...we make sure that the customer knows about it...We will help them redo architecture, redo any kind of changes they want to make, because we want them to be successful on the platform. If their app dies and it's not our fault, it still is a bad look overall to see apps failing on the platforms that our team manages.” —Brendan Aye

More from T-Mobile

How T-Mobile laid the foundation of its digital future with VMware Tanzu Application Service

Making Multi-Cloud a Reality at T-Mobile

T-Mobile’s Same-Day Test, Deploy, and Production Transformation

T-Mobile Success Story of Migrating Monolithic Application to Spring Cloud Services

3 Reasons Behind T-Mobile’s Success with Kubernetes

About the Author

Derrick Harris is a product marketing manager at VMware.

