Financial Compliance: New Frontiers with Data Science

February 18, 2015 Niels Kasch

Joint work performed by Niels Kasch and Mariann Micsinai of Pivotal’s Data Science Labs.

Financial institutions must overcome the shortcomings of existing compliance pipelines that do not live up to the standards of expanding new regulations. In this blog article, we share experience from real-life engagements and show how an innovative, agile and real-time computational platform can re-architect compliance workflows and provide several advantages over existing solutions.

Our solution, a data lake platform coupled with cutting-edge data science techniques, helps to identify underlying risk and fraud while reducing the burden of manual review on the compliance department. The approach also advocates a flexible user interface to promote an adaptive, continuously learning compliance framework.

Current Challenges

After the financial crisis of 2008, banks were subjected to increasingly intense regulatory scrutiny. US regulatory agencies tightened their enforcement of responsible conduct and pursued the elimination of unfair, deceptive or abusive practices with renewed vigor. As a result, a myriad of strict rules under the comprehensive Dodd-Frank Act and Basel Committee regulations are now enforced. Violations of any of these rules carry mounting fines and litigation costs, as recent news headlines reporting heavy fines make evident.

Banks face immense challenges in revising their compliance and governance infrastructure to meet regulatory standards in a timely manner. The challenging areas for financial institutions include:

Aggregation and real-time analysis of large, diverse and rapidly growing datasets across the institution
Real-time data reporting at all levels of granularity
Overburdened manual review within compliance departments
Flexibility to respond to new regulations, e.g., changing OFAC lists, new sanctions against Russia, and the implementation of as-yet-unimplemented rules under the Dodd-Frank Act

Next Generation Compliance Platform

Current compliance systems focus on only a small part of compliance needs, be it archiving or basic analytics. To help prevent the next Bernie Madoff or Libor scandal, a next generation storage and processing platform is required.

The platform needs to address three main components:

A compliance data lake with:
1. The ability to handle the archiving and storage requirements of large volumes of complex, structured (e.g., transactions) and unstructured (e.g., text data: emails and chats) data.
2. A home to easily integrate various data assets such as transactions, securities information, and governmental watch lists.
3. An open and extensible architecture to better facilitate enhancements, such as an increase in capacity to address scale and performance requirements or an integration with new technologies as the business requirements change (e.g., real-time data ingestion and model scoring).
Capabilities for scaling predictive models, advanced machine learning, and natural language processing over large, compliance-related data sets while supporting agile data science methodologies.
Support for additional compliance apps and user interfaces that drive analytical insights and decision making for business users as well as capture new feedback-based data to increase the predictive power of the models.

The platform that brings all of the above components together is Pivotal’s Big Data Suite (BDS), with the option to add Pivotal Cloud Foundry (PCF) as a PaaS for additional application or integration workloads. While PCF is the leading enterprise PaaS, Pivotal’s BDS allows for extensive storage and agile analytics on massive data sets using three paths: an MPP and column store database, in-memory data processing, or Hadoop. This combination is a data scientist’s dream because it facilitates agile data exploration and data integration coupled with advanced machine learning algorithms (see MADlib and MLlib) to derive the most value from your data.

The Next Generation Financial Compliance Solution

Before getting into the details of the analytical components, it is worth pointing out how the architecture can be extended to similar analytical scenarios with additional requirements for high-scale applications or integrations, as with financial trading information. These cases can benefit from inserting PaaS-based services at various places within the architecture to provide automated scale, lower development complexity, and fast, iterative development cycles. More importantly, the next generation financial compliance solution is driven by advanced analytics capabilities. Next, we will address each of the analytical components individually.

Fig. 1: Overview of next generation analytics solution for financial compliance. The 3 main components (the data lake, analytics, and feedback-centric user interface) and their interactions are depicted.

The Financial Compliance Data Lake

The data lake is a data-centered architecture, where all types of data come together in one place. The key here is to bring as much information together as possible to support the analytics behind financial compliance. For example, to analyze emails and chats, the data lake can serve as the archiving solution while simultaneously making the data available for analytics. Pivotal’s Big Data solution incorporates an MPP RDBMS that enhances data integration tasks such as resolving and joining entities across multiple and diverse data sources. Such a capability also allows for the integration of unstructured text with structured data (e.g., transactions, trades). This makes catching insider trading easier since compliance analysts can link trades to various communication channels.

But the data lake does not stop there. For example, an organization’s hierarchy can be part of the data lake as well and support legislative requirements which prohibit certain interactions within a company (e.g., a Chinese wall policy between traders and trade clearing). Other data assets, together with their update and retention policies, can also be incorporated into the data lake to benefit compliance use cases:

News feeds
OFAC watch lists (e.g., countries prone to money laundering)
Securities information
Portfolio information and risk metrics
Other communication channels (phone conversations, social media, text messages)
Access logs (building entry logs, system access logs, weblogs)
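
The linkage between trades and communications described above can be illustrated with a small sketch. It is not the platform's implementation: in the data lake this would more likely be a join inside the MPP database, and the record types, field names, 24-hour window, and symbol-matching rule below are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trade:  # hypothetical record layout
    trade_id: str
    trader: str
    symbol: str
    executed_at: datetime


@dataclass(frozen=True)
class Message:  # hypothetical record layout
    message_id: str
    sender: str
    sent_at: datetime
    text: str


def link_trades_to_messages(trades, messages, window=timedelta(hours=24)):
    """Map each trade id to the ids of messages that the trader sent
    within `window` before execution and that mention the traded symbol."""
    by_sender = {}
    for msg in messages:
        by_sender.setdefault(msg.sender, []).append(msg)

    links = {}
    for trade in trades:
        earliest = trade.executed_at - window
        links[trade.trade_id] = [
            msg.message_id
            for msg in by_sender.get(trade.trader, [])
            if earliest <= msg.sent_at <= trade.executed_at
            and trade.symbol.lower() in msg.text.lower()
        ]
    return links


trades = [Trade("T1", "alice", "ACME", datetime(2015, 2, 17, 15, 0))]
messages = [
    Message("M1", "alice", datetime(2015, 2, 17, 9, 30), "Heard ACME announces tomorrow"),
    Message("M2", "alice", datetime(2015, 2, 10, 9, 30), "ACME lunch?"),  # too old
    Message("M3", "bob", datetime(2015, 2, 17, 10, 0), "ACME looks cheap"),  # other sender
]
linked = link_trades_to_messages(trades, messages)
print(linked)
```

Only message M1 is linked to trade T1 here, because M2 falls outside the time window and M3 comes from a different person. A real pipeline would first need the entity resolution step mentioned above, so that trader identifiers and message senders refer to the same people.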

Financial Compliance Analytics

The analytics pipeline is the heart of the solution. It determines whether a given trade or communication item violates a regulation. The platform supports traditional e-discovery methods, such as search, but, more importantly, it features a complete machine-learning pipeline with multiple predictive models and modeling techniques:

Classification: To identify irrelevant messages such as automated notifications, newsletters, out-of-office messages, and print job notifications.
Graph analysis: To build communication profiles of individuals. This technique is often used in security analytics and malware detection to identify anomalous behavior. Graph analysis can establish hot spots of fraudulent activity based on who is talking to whom.
Text analytics: To identify the language behind fraud and to determine the sentiment and certainty in a trader’s language before and after executing trades. At the semantic level, it can detect whether too much information (e.g., deal coloring) was shared with communication partners.
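
As a rough sketch of how the first two techniques can fit together, the example below drops irrelevant messages and then builds a who-talks-to-whom graph to surface contacts that did not exist in a baseline period. The marker phrases, function names, and the "new contact" rule are assumptions made for illustration. A production pipeline would use a trained classifier rather than keyword rules and a richer anomaly measure.

```python
from collections import Counter, defaultdict

# Hypothetical phrases that mark automated or irrelevant messages.
IRRELEVANT_MARKERS = ("out of office", "unsubscribe", "print job", "do not reply")


def is_irrelevant(text):
    """Keyword rule standing in for a trained message classifier."""
    lowered = text.lower()
    return any(marker in lowered for marker in IRRELEVANT_MARKERS)


def build_graph(messages):
    """Count messages per directed (sender, recipient) edge,
    skipping messages classified as irrelevant."""
    edges = Counter()
    for sender, recipient, text in messages:
        if not is_irrelevant(text):
            edges[(sender, recipient)] += 1
    return edges


def new_contacts(baseline_edges, current_edges):
    """Return, per sender, the recipients seen only in the current period."""
    known = defaultdict(set)
    for sender, recipient in baseline_edges:
        known[sender].add(recipient)
    flagged = defaultdict(set)
    for sender, recipient in current_edges:
        if recipient not in known[sender]:
            flagged[sender].add(recipient)
    return dict(flagged)


baseline = [
    ("alice", "bob", "Pricing update for the desk"),
    ("alice", "bob", "Lunch?"),
    ("printer", "alice", "Print job completed"),
]
current = [
    ("alice", "bob", "Call me"),
    ("alice", "mallory", "Keep this between us"),
    ("alice", "carol", "I am out of office until Monday"),
]
baseline_graph = build_graph(baseline)
current_graph = build_graph(current)
flagged = new_contacts(baseline_graph, current_graph)
print(flagged)
```

The automated print notification and the out-of-office reply never enter the graph, so the only new contact flagged for alice is mallory. Filtering before graph construction keeps automated senders from distorting the communication profiles.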

Financial Compliance User Interface and Feedback

The point is not to replace compliance analysts but to focus their attention on actual fraud cases. To enable effective compliance reviews for analysts, a dynamic user interface is an absolute must. The user interface provides the opportunity to make the system smarter as a whole. For example, a properly designed UI can solicit decision-making information from compliance analysts that can be automatically integrated into a feedback loop for analytics, creating a continuous learning system that gets smarter over time. Such feedback is instrumental to the system for the following reasons:

Keeps the system up to date, for example, in the face of changing regulations.
Provides more algorithmic training information in the form of fraudulent trades or communications.
Injects additional domain knowledge such as new expressions used by traders or new types of fraudulent transactions.
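
One minimal way to picture such a feedback loop is an online model that is updated each time an analyst confirms or rejects a flagged item. The sketch below uses a bag-of-words perceptron purely as an illustrative assumption. The class and method names are hypothetical, and the article does not state which learning algorithm the actual solution uses.

```python
class FeedbackModel:
    """Bag-of-words perceptron updated one analyst decision at a time."""

    def __init__(self):
        self.weights = {}  # token -> learned weight
        self.bias = 0.0

    @staticmethod
    def _tokens(text):
        return set(text.lower().split())

    def score(self, text):
        return self.bias + sum(self.weights.get(t, 0.0) for t in self._tokens(text))

    def flag(self, text):
        """True if the message should be routed to an analyst for review."""
        return self.score(text) > 0

    def record_feedback(self, text, is_violation):
        """Adjust weights only when the model disagreed with the analyst."""
        if self.flag(text) == is_violation:
            return
        step = 1.0 if is_violation else -1.0
        self.bias += step
        for token in self._tokens(text):
            self.weights[token] = self.weights.get(token, 0.0) + step


model = FeedbackModel()
flagged_before = model.flag("please keep it off the record")

# Decisions captured through the review UI (made-up examples).
model.record_feedback("keep this off the record", is_violation=True)
model.record_feedback("lunch at noon today", is_violation=False)

flagged_after = model.flag("please keep it off the record")
print(flagged_before, flagged_after)
```

Before any feedback the model flags nothing. After two analyst decisions it flags a message that reuses the suspicious phrasing while leaving routine messages alone. New trader expressions enter the vocabulary the same way, through the labels analysts provide.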

In our compliance pipeline, platform (Pivotal Cloud Foundry and Pivotal’s Big Data solutions), data science (Pivotal Data Labs), and software development (Pivotal Labs) come together to stand up next generation financial compliance solutions.

Learning More

In this blog, we described the most important challenges that financial institutions face in the current regulatory environment. We presented an innovative, agile and real-time computational platform that addresses financial compliance needs and explained how cutting-edge data science can reduce the compliance department’s manual review burden. The framework is easily extended to other industries where fraudulent activities need to be identified, for example, the insurance industry. For more information on aspects of the solution:

Pivotal Cloud Foundry: Product | Blog
Pivotal Big Data Suite: Product Info | Documentation | Download | Blog Posts
Pivotal Data Labs: Blog Posts | About
Pivotal Labs: Website | Blog
