Pivotal HD differs from other Hadoop distributions in several important ways. It offers the flexibility and scalability of Hadoop while adding a robust, integrated suite of services available through the Pivotal Big Data Suite. In a video interview for the Big Data & Brews series, Pivotal’s Chief Scientist Milind Bhandarkar shares a beer with Datameer’s CEO Stefan Groschupf and provides an overview of the many features that differentiate Pivotal’s Hadoop distribution from the rest.
Bhandarkar diagrams Pivotal’s Big Data offering from the ground up, beginning with the bottom layer, which at its core is native Apache HDFS. In addition to core Hadoop functions such as MapReduce, Pivotal’s stack integrates HAWQ, a whip-fast, 100% ANSI-compliant SQL engine that was ported from Pivotal Greenplum Database and operates atop HDFS.
This approach presents a number of advantages, including speed, the ability to create expressive queries in SQL, and security and monitoring tools.
Another advantage, as Groschupf states in the interview, is that migration from a traditional MPP database to a Hadoop-based architecture “could be absolutely pain free.” Aside from the removal of append-only tables due to HDFS limitations, “The execution engine essentially remains identical,” Bhandarkar says. Adding to its allure, because HAWQ is based on Greenplum, it delivers not only the compliance and performance you’d expect but also support for MADlib, the popular and powerful open source library of best-practice analytic algorithms used across a variety of industries, giving users a head start on mining their data.
Pivotal’s suite also boasts Pivotal GemFire XD, which operates as an in-memory data store that offers a SQL query interface. The Pivotal GemFire XD component is optimized for rapid data ingestion and analytics, combining OLTP and OLAP while using Hadoop as the common storage layer. “These are the two components that are sort of a special in our Pivotal Hadoop distribution which are not available from elsewhere,” Bhandarkar says. “For scan based workloads—when you are scanning large amounts of data—we already have a product which is optimized for that.”
At the top of the stack stands Spring XD, an application development layer that benefits greatly from GemFire’s speed and direct writes to HDFS. This enables the development of sophisticated data-aware apps that interact with Big Data stores in real time. Moreover, this enables a virtuous cycle of integrated data ingestion, extraction, and analysis, as Bhandarkar explains. “The data,” he says, “when it gets retired from GemFire XD actually lands in HDFS, so that it can be ingested back into HAWQ as well.” As Groschupf notes, this functionality “makes [for] a really strong enterprise application.”
Looking ahead, Bhandarkar says, Pivotal’s offering will integrate packages and features such as GraphLab, Open MPI, and Apache Spark. Yet two of the most compelling differentiators, HAWQ and Pivotal GemFire XD running atop native Hadoop, are available now, he tells Groschupf.
Also important to prospective customers is Pivotal’s support pedigree, which it has retained from EMC. “You can call them at 2:00 a.m.,” Groschupf says, “[and] they pick up.” Bhandarkar confirms that Pivotal’s support is always on, which is essential for data-driven enterprises operating on a global scale. Groschupf jokes in reply, “All right, I want to have that phone number.”
Watch the Big Data & Brews interview with Datameer’s Stefan Groschupf and Pivotal’s Milind Bhandarkar: