At Pivotal, we work with many customers who are looking to adopt an agile approach to how they develop and deploy applications. On the application side, this is achieved through continuous integration with products like Pivotal Cloud Foundry, combined with the methodologies of extreme programming and test-driven development. A common request is, “How can we apply these methodologies to data when many of our users are still struggling just to get access to the data they need to analyze? The time it takes to load data simply to determine whether it holds any value works against a ‘fail fast’ model of development (i.e., nothing useful in this data, let’s throw this dataset out and try a new one).” These customers and their data scientists need a way to get data into a database like Pivotal Greenplum quickly so they can iterate on it, perform the necessary analysis, and hand off data “rules” that database designers and developers can use to incorporate the data into modeled data warehouses.
The solution presented in this paper uses Greenplum on Amazon Web Services (AWS), along with several other AWS components, to automate a “no-IT-touch” pipeline: users upload data to AWS S3 and can then quickly access it from Greenplum using standard reporting and SQL-based tools.
The following components are highlighted in this paper:
AWS S3 notifications and Lambda handler
AWS SES for email notifications
Greenplum on AWS
Greenplum S3 connector
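To make the notification flow concrete, here is a minimal sketch of the kind of Lambda handler implied by the components above: it reacts to an S3 object-created event and sends an email through SES. The function names, sender, and recipient addresses are placeholder assumptions, not the paper's actual implementation.

```python
import json

def parse_s3_event(event):
    """Extract (bucket, key) pairs from an S3 ObjectCreated notification event."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

def lambda_handler(event, context):
    # boto3 is imported lazily so the parsing logic can be tested without AWS.
    import boto3
    ses = boto3.client("ses")
    for bucket, key in parse_s3_event(event):
        # Notify the data team that a new file is available to query via
        # the Greenplum S3 connector (addresses below are hypothetical).
        ses.send_email(
            Source="noreply@example.com",
            Destination={"ToAddresses": ["data-team@example.com"]},
            Message={
                "Subject": {"Data": f"New dataset uploaded: {key}"},
                "Body": {"Text": {"Data": f"s3://{bucket}/{key} is ready to query."}},
            },
        )
    return {"statusCode": 200, "body": json.dumps({"processed": len(event.get("Records", []))})}
```

In a deployment, this handler would be wired to the bucket's `s3:ObjectCreated:*` notification so that every upload triggers an email without any manual IT involvement.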
About the Author
Louis Mugnano is an Advisory Data Engineer within Pivotal's Data Engineering organization. He has worked with Pivotal customers in both pre-sales and professional services capacities for the last five years. As a Pivotal Data Engineer, he helps customers turn their data dreams into data realities. Working hands-on, he strives to help organizations achieve their business-changing goals. Prior to joining Pivotal, Louis worked for more than 20 years at several large enterprises across multiple sectors. In those roles, he served as lead architect on many large-scale data projects, and he enjoys sharing those experiences with the customers he teams with on their data journeys.