Pivotal HD 2.0 to Help Enterprises Get More Out of Hadoop With a Business Data Lake

March 17, 2014 Nikesh Shah

Pivotal HD 2.0 will help companies get more out of their Hadoop investments by providing industrial-grade enterprise capabilities and an upgraded Pivotal HD for gaining actionable insight in real time.

Pivotal HD 2.0 is the first platform to fully integrate an enterprise in-memory SQL datastore, GemFire XD, with advanced analytical data services on top of Hadoop to build out a flexible and comprehensive data science and big data toolset that is prepared for the enterprise.

Put simply, this release takes the all-encompassing data management capabilities of Hadoop and bundles two key services into Pivotal’s enterprise distribution: first, an in-memory SQL database that allows data to be ingested, processed, analyzed, and used immediately; and second, a powerful set of analytical services that gives businesses a head start toward unlocking the value of their data. Combined, the new release creates an enterprise-grade data management system that allows companies to get even more out of their Hadoop systems and realize the full potential of their data, faster.

Today, enterprises are using Pivotal HD to maximize their Hadoop investments by developing data-savvy software in a more flexible, faster way than they could with Hadoop alone. For example, an energy company is using Pivotal HD to simultaneously collect sensor, wind, and weather data from wind turbines, then immediately analyze both the real-time information captured and the historical weather patterns archived in HDFS. The results are used to predict future weather conditions and adjust the turbines accordingly, optimizing energy collection and revenue possibilities at the same time.

Enterprise Real-time Data Services on Pivotal HD

Enterprise-grade In-Memory Processing for Hadoop

Pivotal GemFire XD introduces powerful new real-time, bi-directional data management capabilities for enterprise Hadoop users. Inheriting over a decade of R&D and real-world use by some of the largest financial institutions and highest-traffic eRetailers, GemFire XD is essentially Pivotal SQLFire optimized for use on Pivotal HD’s HDFS.

Pivotal GemFire XD can ingest streams of transactional data using SQL/ODBC, Java/JDBC, Spring Data, and other interfaces while processing events, analyzing, and scoring the data in real time. It stores data in memory and writes to Pivotal HD’s HDFS without an automated or manual ETL process in between. Bi-directional integration between GemFire XD and Pivotal HD means that queries can become interactive, combining real-time in-memory data with historical HDFS data and updating or creating new data sets as appropriate. This creates the industry’s first true enterprise-grade real-time analytics for OLAP and OLTP in a single platform.
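To make the two-tier idea concrete, here is a minimal Python sketch of a store that keeps recent rows in memory, persists every row to a historical tier on arrival, and answers a query that combines both. It is a toy model only and does not use or imitate the GemFire XD API: the class `TwoTierStore`, its methods, and the `turbine_id` and `wind_speed` fields are all hypothetical names chosen for illustration, and a plain Python list stands in for HDFS.

```python
from collections import deque
from statistics import mean


class TwoTierStore:
    """Toy model (hypothetical, not a Pivotal API): a bounded in-memory
    tier plus an append-only historical tier standing in for HDFS."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()   # most recent rows, newest last
        self.history = []       # every row ever ingested

    def ingest(self, row):
        # Each row is persisted as it arrives, so no separate ETL
        # step is needed to move it into the historical tier later.
        self.history.append(row)
        self.memory.append(row)
        if len(self.memory) > self.memory_capacity:
            self.memory.popleft()   # evict the oldest row from memory only

    def _mean_speed(self, rows, turbine_id):
        speeds = [r["wind_speed"] for r in rows if r["turbine_id"] == turbine_id]
        return mean(speeds) if speeds else None

    def recent_mean(self, turbine_id):
        return self._mean_speed(self.memory, turbine_id)

    def historical_mean(self, turbine_id):
        return self._mean_speed(self.history, turbine_id)

    def deviation(self, turbine_id):
        """Combine both tiers: how far recent readings sit from the
        long-run average for the same turbine."""
        recent = self.recent_mean(turbine_id)
        historical = self.historical_mean(turbine_id)
        if recent is None or historical is None:
            return None
        return recent - historical


store = TwoTierStore(memory_capacity=2)
for speed in (10.0, 10.0, 14.0, 16.0):
    store.ingest({"turbine_id": "t1", "wind_speed": speed})
print(store.deviation("t1"))
```

In this sketch the in-memory tier answers "what is happening now" while the historical tier supplies the baseline, which mirrors the wind turbine scenario described earlier. A real deployment would use SQL over JDBC or ODBC rather than Python method calls.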

Analytics and Machine Learning

Pivotal HD 2.0 provides data scientists and business analysts with a ready-to-use library of analytic algorithms that span several types of intelligent toolkits. These types of algorithms have allowed innovative companies like Google to power PageRank, Wall Street to run algorithmic trades, Amazon to optimize recommendation engines, Apple to provide speech recognition, and GE to optimize turbines and engines. The new release includes a broad set of ready-to-use algorithms that address the enterprise’s most common critical use cases, such as network optimization for TeleMedia companies, smart grids for energy & utilities, bioinformatics for pharma, and supply-chain optimization for discrete manufacturing.

Expanded analytic capabilities include:

  • Enterprise Integration for GraphLab OpenMPI. Pivotal HD 2.0 includes an integrated and supported version of GraphLab OpenMPI (Message Passing Interface), a popular set of algorithms for graph analytics that data scientists use for insight, such as PageRank, collaborative filtering, and computer vision.
  • Enhanced MADlib support. MADlib 1.5 now has over 50 predictive analytic algorithms for relational data that run in HAWQ, eliminating costly data movement and long data science cycles.
  • Expanded HAWQ capabilities. Pivotal’s high-performance SQL query engine, HAWQ, now supports the Parquet columnar storage format, bringing HAWQ’s query performance to this popular file type. This also enables hybrid data processing methods on similar data types, without requiring data movement or transformations.
  • Reduced learning curves. HAWQ now enables data scientists and data engineers to do various data learning experiments using custom functions for R, Python and Java, reducing the time it takes to ramp up and start leveraging Pivotal’s next generation offering.
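As an illustration of the kind of graph algorithm mentioned in the list above, the following is a minimal Python sketch of PageRank computed by power iteration on a small in-memory graph. It is not GraphLab code and makes no claim about how GraphLab implements it: the function name `pagerank`, the `damping` and `iterations` parameters, and the dictionary-of-lists graph format are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank (illustrative sketch).

    links maps each node to the list of nodes it links to.
    Returns a dict of node -> rank, with ranks summing to 1.
    """
    # Collect every node, including ones that only appear as targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank to all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))
```

In this small graph, node `c` receives links from both `a` and `b`, so it ends up with the highest rank. Distributed frameworks run the same basic iteration but partition the graph across machines.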

With these capabilities, data-driven business groups across a variety of industries gain a massive head start toward developing analytics and applications for more intelligent and innovative products and services, using the languages and frameworks they know and love.


Pursuing the Business Data Lake with Pivotal HD 2.0

For data-driven businesses, this is a major technological advance toward the Business Data Lake. Most IT shops see the value of real-time data and analytics, but the path to get there has been hindered by two physical constraints: the cost of data storage and movement, and the complexity of supporting a virtually endless array of data types, volumes and analytic needs.

The Business Data Lake solves this by pooling all data in the same environment and using new Big Data technologies to remove the cost constraints of data storage and movement while providing the right analytic tools to accelerate time-to-insight for data scientists and business analysts alike.

Pivotal HD 2.0, with HAWQ and GemFire XD, is a major advancement for enterprises looking to fuse all their data needs in a single environment and pursue the time- and cost-saving benefits of the Business Data Lake.

