Greenplum Database Sandbox Now Available On Amazon

May 3, 2016 Scott Kahler


Hot off the software release stack is Greenplum Database with an upgraded Apache MADlib (Incubating) version 1.9. Until now, we have been packaging up a sandbox version in both VMware and VirtualBox VM formats. This time around, we are happy to announce an EC2 AMI.

While personal computers are getting larger and more powerful, it can still be rough to find enough resources to run a VM on a local machine, test out a big data platform, and churn through some analytics while running all the apps needed to get you through the day. Our engineers and architects have been making use of Amazon for a while now, so we thought it was time to support an AMI. All of the instructions and labs available in the sandbox VM still apply to the AMI. The AMI is especially handy for testing the new S3 external table integration made available in 4.3.8.

If this sounds great to you and you are ready to dig into the AMI, it is replicated to the following three zones, and others will be added as we see demand.

When choosing an instance size to run the sandbox, most of the large or xlarge instances make 8GB of RAM available. This is recommended as a bare minimum. The sandbox is configured with 2 segment instances, and, should you make use of a large instance size, you may optionally want to utilize the gpexpand feature to add more segments to the system. This also allows for additional parallel processing.

The default user for the EC2 instance is gpadmin. To log into the instance, generate a new key or use your existing one and connect as gpadmin. The login string will look something like this:

ssh -i ~/my-key.pem

At that point, you will be logged on to the system and will need to execute the command to bring up all of the services.

In order to make full use of the AMI and do all of the things listed in the tutorials, you will want to configure the instance’s security group and allow access to the following ports.

Tool                        Port
SSH                         22
Database Connection         5432
Greenplum Command Center    28080
Apache Zeppelin             8080
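
Once the security group is updated, a quick way to confirm that the ports listed above are reachable from your machine is to attempt a TCP connection to each one. The following is a minimal sketch in Python, not part of the sandbox itself; the function names are made up for illustration, and you would pass in your own instance's public hostname. Note that a port also shows as closed if the service behind it has not been started yet.

```python
import socket

# Ports from the table above, keyed by the tool that listens on each one.
SANDBOX_PORTS = {
    "SSH": 22,
    "Database Connection": 5432,
    "Greenplum Command Center": 28080,
    "Apache Zeppelin": 8080,
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_sandbox_ports(host, ports=SANDBOX_PORTS, timeout=3.0):
    """Map each tool name to True/False depending on whether its port is reachable."""
    return {tool: port_is_open(host, port, timeout) for tool, port in ports.items()}
```

Calling `check_sandbox_ports()` with the public DNS name of your instance returns a dictionary such as `{"SSH": True, ...}`. Any `False` entry points to either a missing security group rule or a service that is not running.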

It should also be noted that the data files for the master and segments are located in /gpdata, and the default AMI spins up with its disk sized at 8GB. If you plan to work with larger datasets, we recommend that you mount an EBS volume, move the data to it, and create a symlink at the old location that points to the new one. If the instance you are using provides ephemeral disk, this can also be used for data. Ephemeral disks usually have faster access speeds, but you risk losing the data, since the ephemeral devices go back to the EC2 pool if the system goes down.
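
The move-and-symlink step can be sketched as follows. This is a minimal illustration in Python, assuming the database has been stopped and the new volume is already formatted and mounted; the function name and the mount point in the usage note are hypothetical. When the move crosses filesystems, `shutil.move` copies the files and may not preserve ownership, so run it as the user that owns the data or fix ownership afterwards.

```python
import os
import shutil


def relocate_with_symlink(old_path, new_path):
    """Move old_path to new_path, then leave a symlink at old_path pointing to it."""
    if os.path.islink(old_path):
        raise ValueError(f"{old_path} is already a symlink; nothing to relocate")
    if os.path.exists(new_path):
        # Refuse to merge into an existing directory to avoid clobbering data.
        raise FileExistsError(f"{new_path} already exists")
    # Renames in place on the same filesystem, copies then deletes across filesystems.
    shutil.move(old_path, new_path)
    # The old location keeps working for anything configured to use it.
    os.symlink(new_path, old_path)
```

For example, with an EBS volume mounted at a hypothetical `/mnt/ebs`, `relocate_with_symlink("/gpdata", "/mnt/ebs/gpdata")` would move the data and leave `/gpdata` as a symlink, so the existing configuration keeps pointing at the same path.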

We are excited to get this new version rolled out and look forward to feedback on any additions you would like to see or other items to help you get started.
