The Pivotal and General Electric partnership continues to provide insight into how GemFire and HAWQ deliver real-time results in “Industrial Big Data”. A technical white paper recently published by GE Power & Water and GE Global Research, titled “Bridging high velocity and high volume industrial big data through distributed in-memory storage & analytics”, highlights the benefits of running Pivotal products to store large amounts of data while simultaneously ingesting new data and running real-time analysis.
The Industrial Data Challenge
A global network of gas turbines, aircraft engines and medical devices means that heavily instrumented machines require remote monitoring to maintain productivity and to reduce waste and unplanned downtime. Previously, separate systems processed high velocity incoming data and high volume historical data, but neither could handle both workloads while also performing fast analytics. Analyzing increasingly large, high velocity datasets meant that processing, storage and analysis would need to be combined in a single hybrid system.
GE’s belief in in-memory data grids (IMDG) helped spur a multi-tiered approach to testing its system’s abilities. A number of IMDGs were considered, but GE ultimately selected Pivotal GemFire to prove out the impact on its productivity. The effort to show that GE could take full advantage of its ever-growing data load unfolded over little more than 10 days of testing.
Realizing GE’s Vision of the Industrial Internet
Three rounds of testing on GemFire produced increasingly positive proof points at every level of measurement. Benchmark, stability and large-memory tests measured data throughput, scalability, and multi-day load by continuously updating stored data and holding high-velocity data in memory over time. The team was ultimately able to demonstrate more than 5TB of data held in memory while continuously ingesting new data at a rate of 100,000 time series data points per second, beyond what had been achieved before this effort.
Benchmark testing demonstrated continuous data ingest, constant analysis and scalability. Stability testing showed the system could run for five days while ingesting data and keeping only the most recent 10 minutes in memory. The final round ran for five days, ingesting at least 100K data points per second and successfully storing three days’ worth of replicated high-velocity data in memory across a large cluster of 46 machines, all while running near real-time analytics and continuously updating.
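To put those retention windows in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the ingest rate, window lengths and machine count quoted above. The function name `points_retained` is hypothetical, and the per-machine figure assumes the data is spread evenly across the cluster and counts a single copy only, which the white paper summary does not confirm.

```python
# Back-of-the-envelope sizing from the figures quoted in this post.
# Assumption: a constant ingest rate of exactly 100,000 points per second
# (the post says "at least 100K" for the final round).
INGEST_RATE = 100_000  # time series data points per second
MACHINES = 46          # cluster size in the final test round


def points_retained(window_seconds: int, rate: int = INGEST_RATE) -> int:
    """Number of data points held in memory for a retention window."""
    return window_seconds * rate


# Stability round: only the most recent 10 minutes stay in memory.
ten_minutes = points_retained(10 * 60)

# Final round: three days of data stay in memory.
three_days = points_retained(3 * 24 * 60 * 60)

# Assumption: even distribution across machines, one copy only
# (replication would multiply this by the number of copies).
per_machine = three_days // MACHINES

print(f"10-minute window: {ten_minutes:,} points")
print(f"3-day window:     {three_days:,} points")
print(f"Per machine:      about {per_machine:,} points (single copy)")
```

At a steady 100,000 points per second, the 10-minute window holds 60 million points, while the three-day window holds roughly 26 billion, which is why the final round needed a large cluster.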
Enabling Big Data Going Forward
GE’s testing demonstrated that IMDG-based infrastructure can successfully handle high velocity and high volume requirements, taking in and storing data while simultaneously running real-time or near real-time analytics. Pivotal GemFire was called out for its plans to integrate its IMDG applications on Apache Hadoop® and to enable streaming data for near real-time processing as well as batch analytics on historical data. The report further says that Pivotal’s approach mirrors GE’s own vision for its next generation of big data architectures.
- Case Study: 300% Increase in App Performance with India Rail on Pivotal GemFire
- The Big Data Story (and Webinar) Behind Chinese New Year
- Case Study: Scaling Reservations for the World’s Largest Train System, China Railways Corporation (with infographic)
- Pivotal GemFire Blog Posts
- Pivotal GemFire Product Info, Downloads, and Documentation