The VMware Tanzu Labs Data Transformation team recently helped a major global bank predict corporate customer churn with a 60-day lead time. We also identified more than $10 million of revenue that was at risk of churning along with specific factors that the bank’s relationship managers could proactively address with their clients to minimize that risk.
In this post, we explore how we helped the client using a collaborative approach grounded in business logic while gaining new insights from data, as well as how we can use this approach to help our clients retain customers in any industry.
The opportunity: Data science applied to churn prediction
Avoiding customer churn (or attrition) is a high priority for many businesses. For corporate banks, it’s a challenge that costs them an estimated 10 to 15 percent of gross revenue per year. It’s what prompted the Treasury Services division of a Fortune 500 bank to come to the Tanzu Labs Data Transformation team with the goal of using its data to better understand, predict, and reduce that churn.
The project presented several opportunities that we seized on—most importantly, the chance to deliver a valuable and interpretable solution in collaboration with the key stakeholders, not just a black box model or a PowerPoint summary.
An essential aspect of the project was making the job of the bank’s relationship managers (RMs) easier. Too often, Treasury Services RMs would only notice churn once their clients had substantially reduced their balances or number of transactions, or when they voiced complaints. In either case, by that point it was difficult to regain the lost business. Leveraging data and machine learning could give RMs substantially more lead time for interventions, plus additional information about their clients’ situations.
It was also an opportunity to bring data from disparate sources together under one roof. Some of that data was siloed within other parts of the bank or held externally, which had made it difficult to access for administrative reasons.
The approach: Applying machine learning to customer data
Throughout the project, we worked closely with business leads and subject matter experts to understand both the domain and the possible causes of churn. This allowed us to validate our approach on a regular basis, and to prioritize high-impact data exploration, feature engineering, and modeling.
We began the project with exploratory data analysis, identifying the available data sources and asking questions that allowed us to build hypotheses about what might be driving the churn. We not only reviewed the bank’s internal data, but we brought in relevant external data, including data about economic and political conditions in each client’s geography. We also considered client-specific data from news sources as well as syndicated financial and operational data (e.g., Dun & Bradstreet).
Once we’d finished collecting and cleaning the data, we engineered several informative features, including client size (based on the bank’s revenue), rolling averages and standard deviations of transactions and balances, month-over-month and year-over-year changes, and derived metrics. Given that the bank is based in the U.S., we also accounted for U.S. interest rates and GDP fluctuations as we were developing those metrics.
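Features like these are typically built from a per-client monthly panel. The sketch below, using pandas on synthetic data (the column names and window sizes are illustrative assumptions, not the bank's actual schema), shows how rolling statistics and month-over-month/year-over-year changes can be derived:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly panel: one row per client per month (illustrative schema).
rng = np.random.default_rng(0)
months = pd.date_range("2021-01-01", periods=24, freq="MS")
df = pd.DataFrame({
    "client_id": np.repeat(["A", "B"], len(months)),
    "month": np.tile(months, 2),
    "balance": rng.uniform(1e5, 1e6, 2 * len(months)),
    "n_transactions": rng.integers(50, 500, 2 * len(months)),
})

df = df.sort_values(["client_id", "month"])
g = df.groupby("client_id")

# Rolling mean and standard deviation over a 3-month window, computed per client.
df["balance_roll_mean_3m"] = g["balance"].transform(lambda s: s.rolling(3).mean())
df["balance_roll_std_3m"] = g["balance"].transform(lambda s: s.rolling(3).std())

# Month-over-month and year-over-year percentage changes, per client.
df["txn_mom_pct"] = g["n_transactions"].pct_change(1)
df["txn_yoy_pct"] = g["n_transactions"].pct_change(12)
```

In practice each feature window would be validated with subject matter experts so it reflects how clients actually ramp activity up or down.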
The modeling methods we used focused on time-series classification techniques, which we implemented using free and open source Python libraries. We explored a variety of machine learning algorithms to predict churn, with an emphasis on models that were both powerful and interpretable—in other words, models that could predict which clients were at risk and also shed light on why they were classified as at-risk or not. To reach an acceptable trade-off of predictive power and interpretability, we used an open source tool called SHAP, which leverages tree-based or deep learning methods to estimate the specific impact of each input variable (i.e., feature) on each model prediction.
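The general pattern of pairing a tree-based classifier with post-hoc attribution can be sketched with scikit-learn. This is a minimal stand-in on synthetic data, not the team's actual pipeline: where the post describes SHAP's per-prediction values, this sketch uses scikit-learn's permutation importance as a dependency-light global analogue.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for engineered client features; names are illustrative
# (e.g., balance trend, transaction volatility, rate sensitivity).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
# Churn label driven mostly by the first feature, for demonstration only.
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0.5).astype(int)

# A tree-based classifier of the kind SHAP's TreeExplainer can attribute.
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Global feature attribution: how much each feature contributes to accuracy.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)
```

With the real model, SHAP values would additionally break down *each individual* prediction into per-feature contributions, which is what makes the at-risk flags explainable to RMs.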
SHAP “force plots” showing the highest-impact variables on churn risk for two hypothetical clients (red variables push the risk score higher, while blue variables are interpreted as mitigating risk)
The results: At-risk customers identified sooner, allowing timely interventions
Using models with SHAP visualizations, Treasury Services stakeholders and RMs were not only able to see which clients were at risk of churning, but could better understand and address the contributing factors, such as sensitivity to interest rates, idiosyncratic corporate events, or economic conditions in specific geographies, among many other possibilities.
Crucially, the models alerted them to potential churn 60 days before a substantial loss of revenue, with further alerts at 30 and 15 days. These shorter-lead-time models served as additional lines of defense against churn, and they were also used to monitor the effectiveness of interventions made around the 60-day mark, giving RMs a chance to update or tweak their intervention strategies along the way.
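One common way to train separate 60-, 30-, and 15-day models is to derive one label per horizon from the same client history, marking a row positive when churn occurs within that horizon. The sketch below shows this labeling step on synthetic data; it is an assumption about the general technique, not the bank's exact implementation.

```python
import pandas as pd

# Hypothetical daily history for one client, with a known churn event date.
dates = pd.date_range("2023-01-01", periods=90, freq="D")
df = pd.DataFrame({"date": dates})
churn_date = pd.Timestamp("2023-03-15")

# One label per lead time: 1 when churn occurs within the next h days.
for h in (60, 30, 15):
    days_until_churn = (churn_date - df["date"]).dt.days
    df[f"churn_within_{h}d"] = days_until_churn.between(0, h).astype(int)
```

Each label column then trains its own classifier, so a client can move from "flagged at 60 days" to "still flagged at 30 days" if an intervention has not yet taken hold.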
The models also enabled the bank to quickly identify more than $10 million of revenue that was at risk, especially from large and medium-sized Treasury Services clients. The ability to analyze, identify, and subsequently monitor these risks using one easily accessible tool—and proactively tackle churn in the process—was a quantum leap for the bank’s capabilities.
Additionally, Tanzu data scientists collaborated with the bank’s internal data scientists so they could integrate, update, and expand upon the project going forward. We also presented results to executives and other end users of the models—and incorporated their feedback—to help with organizational buy-in and to empower all stakeholders to take advantage of this data-driven approach to churn prevention.
Learn more about Tanzu Labs Data Transformation services, including how we can help you modernize your data systems and solve difficult business problems using data science.
About the Author: Ryan Rappa