Greenplum Can Now Do Hybrid Queries Across On-Premises & AWS S3 Data

April 20, 2016 Jeff Kelly

One of the biggest challenges in modern computing is connecting data stored across the web with analytical data engines, often deployed on-premises, in near real time. Pivotal Greenplum just made this challenge a lot easier by introducing Amazon Web Services Simple Storage Service (AWS S3) external table support.

To put a finer point on the challenge, modern applications running in the public cloud generate huge volumes of incredibly valuable data. So, you’re going to want to bring all that data into your data warehouse environment for analytics to, for example, better understand customer behavior and identify new revenue opportunities. But, if your data warehouse is sitting in your own data center, that means you’ve got a time- and resource-intensive data movement project on your hands. And, with the speed of business what it is today, any lag time between data creation and data analysis can mean missed opportunities.

So, why not just run all your data warehouse workloads in the cloud? There is a challenge here as well. You probably have an effective analytic environment running on huge volumes of data in your private data center. Again, you’re not looking for a data movement project. You’re looking to do your analytics on all your data, regardless of its location.

Two Key Scenarios For Analytical Workloads In The Hybrid Cloud

With the latest Pivotal Greenplum update, on-premises Greenplum deployments can now query data stored natively in AWS S3 as external tables. This capability eliminates the need to move data from AWS into your on-premises Pivotal Greenplum environment for near-real-time analysis. You can, for example, use Amazon Elastic MapReduce (EMR) for large-scale ETL and data preparation, then easily query the data stored in S3 directly with Pivotal Greenplum running in your own data center. The only data that moves is the query results, which are returned to your on-premises Greenplum environment.

For those of you offloading data generated by on-premises transactional and operational applications to S3 for cost-effective storage at scale (a.k.a. a data lake), S3 external table support immediately makes that data available for analysis in Pivotal Greenplum as well.
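Either scenario starts from an external table definition that points at an S3 location. As a rough illustration only, the Python sketch below assembles such a statement as a string. The function name, table name, columns, bucket, prefix and config file path are all hypothetical, and the LOCATION syntax (endpoint, bucket, prefix, then a config= file path) reflects our reading of the s3 protocol in the Greenplum documentation, so check it against the release notes for your version before relying on it. The sketch does no input sanitization and does not connect to a database.

```python
def build_s3_external_table_ddl(table, columns, endpoint, bucket, prefix,
                                config_path, delimiter=","):
    """Return a CREATE EXTERNAL TABLE statement for an S3-backed table.

    Illustrative helper only: every argument is interpolated as-is, so it
    must not be fed untrusted input.
    """
    # Column list, e.g. "user_id int, url text"
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)

    # Assumed s3 protocol URL shape: s3://<endpoint>/<bucket>/<prefix> config=<file>
    location = f"s3://{endpoint}/{bucket}/{prefix} config={config_path}"

    return (
        f"CREATE READABLE EXTERNAL TABLE {table} ({column_sql})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT 'TEXT' (DELIMITER '{delimiter}');"
    )


# Hypothetical values, chosen only to show the shape of the statement.
ddl = build_s3_external_table_ddl(
    table="clickstream_ext",
    columns=[("user_id", "int"), ("url", "text")],
    endpoint="s3-us-west-2.amazonaws.com",
    bucket="example-bucket",
    prefix="clickstream/",
    config_path="/home/gpadmin/s3.conf",
)
print(ddl)
```

Once a table like this exists, it can be referenced in ordinary SQL alongside tables stored in the on-premises cluster, which is what makes the query "hybrid."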

A Data Warehouse For The Hybrid Cloud

In today’s world, it is unrealistic to think data will live only on-premises or only in the cloud. The reality is clear. Some data will be stored in the cloud, some data on-premises. Your data warehouse and analytical databases need to be able to query data wherever it lives. Pivotal Greenplum’s external table support for S3 is just the first step. Over time, Pivotal is going to make analyzing data in hybrid environments easier, faster and more effective. In future releases, expect to see support for more data formats and additional public cloud providers, as well as the addition of write capabilities.

To learn more about the latest update, Pivotal Greenplum 4.3.8, check out our Greenplum Chat video with Product Manager Ivan Novick, see release notes at gpdb.docs.pivotal.io, or download the sandbox at greenplum.org. Finally, check out this TDWI e-book on shaping the future of data warehousing with open source software. It’s a good read on the benefits of open source data warehousing over proprietary approaches.

About the Author

Jeff Kelly

Jeff Kelly is a Director of Partner Marketing at Pivotal Software. Prior to joining Pivotal, Jeff was the lead industry analyst covering Big Data analytics at Wikibon. Before that, Jeff covered enterprise software as a reporter and editor at TechTarget. He received his B.A. in American studies from Providence College and his M.A. in journalism from Northeastern University.
