Big Data Meets Fast Data To Fight Fraud (And More)

December 20, 2016 Jeff Kelly


Pivotal GemFire 9.0 is now available. The release includes a new feature, called the GemFire-Greenplum Connector, that enables users to more easily tackle use cases that require a blend of analytical and transactional processing, such as fraud detection.

Why is this capability important?

As more businesses interact with their customers digitally, ensuring trustworthiness takes on a critical role. Over 17 million Americans were victims of identity theft in 2014, the latest year for which statistics are available. Fraudulent transactions stemming from identity theft (fraudulent credit card purchases, insurance claims, tax refunds, telecom services, etc.) cost businesses and consumers over $15 billion that year, according to the Department of Justice’s Bureau of Justice Statistics. Worldwide, the cost of identity theft is estimated to top $5 trillion annually.

It’s not surprising, then, that detecting and stopping fraudulent transactions related to identity theft is a top priority for many banks, credit card companies, insurers, and tax authorities, as well as digital businesses across a variety of industries. Building these systems typically relies on a multistep process, including the difficult step of moving data in multiple formats between analytical systems, which are used to build and run predictive models, and transactional systems, where incoming transactions are scored for the likelihood of fraud. Analytical systems and transactional systems serve different purposes and, not surprisingly, often store data in different formats, each fit for its purpose. This makes sharing data between systems a challenge for data architects and engineers. It is an unavoidable tradeoff, since trying to use a single system to perform two very different tasks at scale is a fool’s errand.

Different Systems Excel At Different Parts Of The Fraud Detection Process

In the case of Pivotal, a number of customers use Pivotal Greenplum, a massively parallel processing analytical database, to support their fraud detection efforts. With Greenplum’s horizontal scalability and rich analytics library, analytics and data science teams can quickly iterate on anomaly detection models against massive data sets. Using those models to catch fraud in real time, however, requires putting them to work inside an application. Depending on the velocity of data ingested through that application, a “fast data” solution may be required.

This is where Pivotal GemFire, a Java-based transactional in-memory data grid, supports fraud detection efforts, as well as other use cases like risk management. Until now, the most common method for moving data between the two systems has been to perform a flat file export from Greenplum using gpfdist, write custom code to transform the CSV files into plain old Java objects (POJOs), then load the data into GemFire.

We know from experience working with customers that writing and maintaining the custom code required to move data between Greenplum and GemFire over an indefinite life cycle is time-consuming and inefficient. Such code gets very complex very quickly, requiring development roadmaps and thorough documentation. And when either Greenplum or GemFire is upgraded to a new version, the custom-written code that connects the systems must be manually rewritten and QA tested to maintain compatibility. This custom code is also known to negatively impact system performance.
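
To make the hand-written approach concrete, here is a minimal Java sketch of the "transform CSV into POJOs, then load" step. Everything in it is an assumption for illustration: the two-column file layout, the AccountScore class, and the method names are hypothetical, and a ConcurrentHashMap stands in for a GemFire region so that no GemFire API is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of hand-written glue code between an exported CSV file
// and an in-memory store. A ConcurrentHashMap stands in for a GemFire region.
class CsvToRegionSketch {

    // Hypothetical POJO holding one row of model output exported from Greenplum.
    static final class AccountScore {
        final String accountId;
        final double fraudScore;

        AccountScore(String accountId, double fraudScore) {
            this.accountId = accountId;
            this.fraudScore = fraudScore;
        }
    }

    // Parse one CSV line of the assumed form "accountId,fraudScore".
    static AccountScore parseLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                "Expected 2 fields, got " + fields.length + ": " + line);
        }
        return new AccountScore(fields[0].trim(), Double.parseDouble(fields[1].trim()));
    }

    // Load every non-blank line into the stand-in region, keyed by account ID.
    static Map<String, AccountScore> load(List<String> csvLines) {
        Map<String, AccountScore> region = new ConcurrentHashMap<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines in the export
            }
            AccountScore score = parseLine(line);
            region.put(score.accountId, score);
        }
        return region;
    }

    public static void main(String[] args) {
        Map<String, AccountScore> region = load(Arrays.asList("acct-1,0.91", "acct-2,0.07"));
        System.out.println(region.size() + " scores loaded");
    }
}
```

Even this toy version hard-codes the column order and uses a naive comma split that would break on quoted fields, which hints at why such code grows complex and needs rework whenever a schema or product version changes.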

Closing The Feedback Loop Between Big And Fast Data

To simplify and speed up this important process, Pivotal developed the new GemFire-Greenplum Connector. The GemFire-Greenplum Connector is an extension package built on top of GemFire that maps Greenplum tables and GemFire POJOs. It enables parallel data movement between the two scale-out systems. With the GemFire-Greenplum Connector, the contents of Greenplum tables can now be easily loaded into GemFire, and entire GemFire regions can likewise be easily consumed by Greenplum. The upshot is that data architects no longer need to spend time hacking together and maintaining custom code to connect the two systems.

In fraud detection scenarios, this means it is now seamless to move the results of predictive models from Greenplum to GemFire via an API. Once the scores are applied to incoming transactions, those transactions deemed most likely to be fraudulent can be surfaced to investigators for further review. Once cases are resolved, the results—was the transaction or claim fraudulent or not?—can be easily moved back to Greenplum from GemFire to continuously improve the accuracy of the predictive models.
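
The loop described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class, the method names, and the 0.8 review threshold are hypothetical, and plain Java collections stand in for both the GemFire regions and the data that would later be moved back to Greenplum. It does not show the connector's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the scoring and feedback loop. Plain collections stand in
// for GemFire regions and for the results later moved back to Greenplum.
class FraudFeedbackSketch {

    // Assumed review threshold; a real system would tune this value.
    static final double REVIEW_THRESHOLD = 0.8;

    // A resolved case: the label that flows back to improve the model.
    static final class Outcome {
        final String transactionId;
        final boolean fraudulent;

        Outcome(String transactionId, boolean fraudulent) {
            this.transactionId = transactionId;
            this.fraudulent = fraudulent;
        }
    }

    // Model output per account, as loaded from the analytical side.
    private final Map<String, Double> scoresByAccount;
    // Investigator decisions waiting to be sent back to the analytical side.
    private final List<Outcome> resolvedCases = new ArrayList<>();

    FraudFeedbackSketch(Map<String, Double> scoresByAccount) {
        this.scoresByAccount = new HashMap<>(scoresByAccount);
    }

    // True when the account's score meets the threshold.
    // Accounts with no score are not flagged in this sketch.
    boolean needsReview(String accountId) {
        Double score = scoresByAccount.get(accountId);
        return score != null && score >= REVIEW_THRESHOLD;
    }

    // Record the investigator's decision for a reviewed transaction.
    void recordOutcome(String transactionId, boolean fraudulent) {
        resolvedCases.add(new Outcome(transactionId, fraudulent));
    }

    // Copy of the resolved cases, ready to be moved back for model retraining.
    List<Outcome> outcomesForExport() {
        return new ArrayList<>(resolvedCases);
    }
}
```

An application would call needsReview for each incoming transaction, route flagged ones to investigators, and call recordOutcome once a case is closed. Leaving accounts without a score unflagged is a design choice made for brevity; a production system might treat them as higher risk instead.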

Pivotal maintains the GemFire-Greenplum Connector and ensures it is always compatible with the latest versions of both Greenplum and GemFire. The GemFire-Greenplum Connector also improves performance compared to homegrown solutions. One Pivotal customer that used the beta version of the GemFire-Greenplum Connector to support its own fraud detection system saw order-of-magnitude improvements in analytic logic performance and increased rates of data motion. The company also significantly reduced its overall code footprint by getting rid of the previously required hacked-together data pipelines, which in turn reduced the workload on the data architects tasked with managing them. John Knapp, a solutions architect at Pivotal who played a role in developing the GemFire-Greenplum Connector, gave a good overview of this case study at Geode Summit in March. You can watch John’s presentation here.

Of course, fraud detection is just one use case that will benefit from the new GemFire-Greenplum Connector. Any smart application or process that uses Greenplum to develop and run predictive models against massive volumes of data and GemFire to apply the results to incoming transactions is a candidate for the GemFire-Greenplum Connector—including next best offer, lateral movement detection, customer churn analysis, and more.

GemFire-Greenplum Connector Now Available

The GemFire-Greenplum Connector is available today as part of GemFire 9.0. While the connector is production-grade and ready to support mission-critical intelligent applications now, Pivotal is committed to continued innovation on this product. Future versions of the connector will expand the semantics and options for data movement between the two systems.

We’re extremely excited to make the GemFire-Greenplum Connector fully available to all GemFire customers. Solving complex business challenges, such as fraud detection, often requires a blend of large-scale analytical processing and high-speed transactional processing. The GemFire-Greenplum Connector bridges the gap between the two, opening up entirely new possibilities for enterprises to build high-value, impactful applications.

The GemFire-Greenplum Connector works with GemFire v9.0. You can check out the GemFire 9.0 documentation here and the GemFire-Greenplum Connector documentation here. There are also a number of great resources, including white papers and case studies, on the GemFire product page.


About the Author

Jeff Kelly

Jeff Kelly is a Director of Partner Marketing at Pivotal Software. Prior to joining Pivotal, Jeff was the lead industry analyst covering Big Data analytics at Wikibon. Before that, Jeff covered enterprise software as a reporter and editor at TechTarget. He received his B.A. in American studies from Providence College and his M.A. in journalism from Northeastern University.
