One of the biggest challenges in modern computing is connecting data stored across the web with analytical data engines, often deployed on-premises, in near real-time. Pivotal Greenplum just made this challenge a lot easier by introducing Amazon Web Services Simple Storage Service (AWS S3) external table support.
To put a finer point on the challenge, modern applications running in the public cloud generate huge volumes of incredibly valuable data. So, you’re going to want to bring all that data into your data warehouse environment for analytics to, for example, better understand customer behavior and identify new revenue opportunities. But, if your data warehouse is sitting in your own data center, that means you’ve got a time- and resource-intensive data movement project on your hands. And, with the speed of business what it is today, any lag time between data creation and data analysis can mean missed opportunities.
So, why not just run all your data warehouse workloads in the cloud? There is a challenge here as well. You probably have an effective analytic environment running on huge volumes of data in your private data center. Again, you’re not looking for a data movement project. You’re looking to do your analytics on all your data, regardless of its location.
Two Key Scenarios For Analytical Workloads In Hybrid Cloud Environments
With the latest Pivotal Greenplum update, on-premises Greenplum deployments can now query data natively stored in AWS S3 as external tables. This capability eliminates the need to move data from AWS into your on-premises Pivotal Greenplum environment for near-real-time analysis. You can, for example, use Amazon Elastic MapReduce (EMR) for large-scale ETL and data preparation, then easily query the data stored in S3 directly with Pivotal Greenplum running in your own data center. The only data movement is the return of query results to your on-premises Greenplum environment.
For those of you offloading data generated by on-premises transactional and operational applications to S3 for cost-effective storage at scale (a.k.a. a data lake), S3 external table support immediately makes that data available for analysis in Pivotal Greenplum as well.
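Both scenarios come down to defining an external table over an S3 location. As a rough sketch only, the small Python helper below assembles the kind of DDL statement involved, assuming the s3:// LOCATION form (endpoint, bucket, prefix and a config file path) described in the Greenplum documentation. The table name, columns, bucket, prefix and config path are hypothetical placeholders, and the helper itself is not part of Greenplum, so check the release notes for the exact syntax your version supports.

```python
def s3_external_table_ddl(table, columns, endpoint, bucket, prefix,
                          config_path, fmt="csv"):
    """Build a CREATE READABLE EXTERNAL TABLE statement for an S3 location.

    Illustrative only: the LOCATION layout is an assumption based on the
    Greenplum s3 protocol documentation, not taken from this article.
    """
    # Render "name type" pairs as a comma-separated column list.
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # Assumed layout: s3://<endpoint>/<bucket>/<prefix> config=<file>
    location = f"s3://{endpoint}/{bucket}/{prefix} config={config_path}"
    return (
        f"CREATE READABLE EXTERNAL TABLE {table} ({column_list})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT '{fmt}';"
    )


# Hypothetical values, chosen only for illustration.
ddl = s3_external_table_ddl(
    table="clickstream_s3",
    columns=[("event_time", "timestamp"), ("user_id", "int"), ("url", "text")],
    endpoint="s3-us-west-2.amazonaws.com",
    bucket="example-bucket",
    prefix="clickstream/2016/",
    config_path="/home/gpadmin/s3.conf",
)
print(ddl)
```

Once a table like this exists, it can be queried from the on-premises cluster with ordinary SQL, for example joined against local tables, without first loading the S3 data into Greenplum.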
A Data Warehouse For The Hybrid Cloud
In today’s world, it is unrealistic to think data will live only on-premises or only in the cloud. The reality is clear. Some data will be stored in the cloud, some data on-premises. Your data warehouse and analytical databases need to be able to query data wherever it lives. Pivotal Greenplum’s external table support for S3 is just the first step. Over time, Pivotal is going to make analyzing data in hybrid environments easier, faster and more effective. In future releases, expect to see support for more data formats and additional public cloud providers, as well as the addition of write capabilities.
To learn more about the latest update, Pivotal Greenplum 4.3.8, check out our Greenplum Chat video with Product Manager Ivan Novick, see release notes at gpdb.docs.pivotal.io, or download the sandbox at greenplum.org. Finally, check out this TDWI e-book on shaping the future of data warehousing with open source software. It’s a good read on the benefits of open source data warehousing over proprietary approaches.