Hot off the software release stack is Greenplum Database 4.3.8.1, with an upgraded Apache MADlib (incubating) 1.9. Until now, we have packaged the sandbox as a VM in both VMware and VirtualBox formats. This time around, we are happy to announce an EC2 AMI.
While personal computers keep getting more powerful, it can still be hard to find enough resources on a local machine to run a VM, test out a big data platform, and churn through some analytics while also running all the apps you need to get through the day. Our engineers and architects have been using Amazon for a while now, so we thought it was time to offer an AMI. All of the instructions and labs available for the sandbox VM still apply to the AMI. This is especially handy for testing the new S3 external table integration introduced in 4.3.8.
If this sounds great and you are ready to dig into the AMI, it is replicated to the following three regions, and we will add others as we see demand:
- us-east ami-274a524d
- us-west-1 ami-5c3b453c
- us-west-2 ami-062cdd66
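If you work from the AWS CLI, a small lookup like the one below keeps the region-to-AMI pairing from the list above in one place and lets you confirm the image is visible from your account before launching. This is a minimal sketch, not an official tool: both function names are hypothetical, and it assumes the "us-east" entry refers to the us-east-1 region and that the AWS CLI is installed and configured.

```shell
# Hypothetical helper: print the sandbox AMI ID for an AWS region.
# Assumption: the "us-east" AMI lives in the us-east-1 region.
sandbox_ami_for_region() {
  case "$1" in
    us-east-1) echo "ami-274a524d" ;;
    us-west-1) echo "ami-5c3b453c" ;;
    us-west-2) echo "ami-062cdd66" ;;
    *)
      echo "no sandbox AMI listed for region: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: ask EC2 to describe the AMI; the call fails if
# the image is not available to you in that region.
check_sandbox_ami() {
  local region="$1" ami
  ami=$(sandbox_ami_for_region "$region") || return 1
  aws ec2 describe-images --region "$region" --image-ids "$ami" \
    --query 'Images[0].Name' --output text
}

# Usage: check_sandbox_ami us-west-2
```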
When choosing an instance type for the sandbox, pick one with at least 8GB of RAM, which is the recommended bare minimum; most of the large and xlarge instance types provide this. The sandbox is configured with two segment instances. If you choose a larger instance type, you can optionally use the gpexpand utility to add more segments to the system, which also allows for more parallel processing.
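As an illustration, launching the sandbox from the AWS CLI with an instance type that meets the 8GB floor might look like the sketch below. It assumes the AWS CLI is configured and that you already have a key pair and a security group; the helper name is hypothetical, and m4.large (2 vCPUs, 8 GiB of memory) is only an illustrative default.

```shell
# Hypothetical helper: launch one sandbox instance and print its ID.
# Arguments: region, AMI ID (from the list above), key pair name,
# security group ID, and an optional instance type.
launch_sandbox() {
  local region="$1" ami="$2" key_name="$3" group_id="$4"
  # m4.large provides 8 GiB of RAM, the recommended minimum.
  local instance_type="${5:-m4.large}"
  aws ec2 run-instances \
    --region "$region" \
    --image-id "$ami" \
    --instance-type "$instance_type" \
    --key-name "$key_name" \
    --security-group-ids "$group_id" \
    --count 1 \
    --query 'Instances[0].InstanceId' \
    --output text
}

# Usage: launch_sandbox us-west-2 ami-062cdd66 my-key sg-0123abcd m4.xlarge
```

Passing a larger type such as m4.xlarge (16 GiB) leaves headroom if you later add segments with gpexpand.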
The default user for the EC2 instance is gpadmin. To log in, launch the instance with a new or existing key pair and connect as gpadmin. The login string will look something like this:
ssh -i ~/my-key.pem gpadmin@ec2-11-22-123-234.compute-1.amazonaws.com
Once you are logged in, run the start_all.sh command to bring up all of the services.
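The login and startup steps can also be scripted from your workstation. This is a sketch under stated assumptions: the helper name is hypothetical, and it assumes start_all.sh is on gpadmin's PATH in a login shell (if it lives in the home directory instead, call ./start_all.sh). The closing status check uses Greenplum's gpstate utility.

```shell
# Hypothetical helper: log in as gpadmin, start the sandbox services,
# then print a detailed Greenplum status report.
start_sandbox() {
  local key="$1" host="$2"
  # ssh ignores private keys that other users can read.
  chmod 400 "$key" || return 1
  # A login shell (bash -l) loads gpadmin's profile so the Greenplum
  # environment is set. Assumption: start_all.sh is on that PATH.
  ssh -i "$key" "gpadmin@${host}" 'bash -lc start_all.sh' || return 1
  ssh -i "$key" "gpadmin@${host}" 'bash -lc "gpstate -s"'
}

# Usage: start_sandbox ~/my-key.pem ec2-11-22-123-234.compute-1.amazonaws.com
```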
To make full use of the AMI and follow all of the tutorials, configure the instance's security group to allow inbound access on the following ports:
| Tool | Port |
|---|---|
| SSH | 22 |
| Database Connection | 5432 |
| Greenplum Command Center | 28080 |
| Apache Zeppelin | 8080 |
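With the AWS CLI, the rules for the ports in this table can be added in a loop. A minimal sketch, assuming the CLI is configured and you know your security group ID; the helper name is hypothetical. Restricting the source CIDR to your own address is a safer choice than 0.0.0.0/0, since these ports expose SSH and the database.

```shell
# Hypothetical helper: open the sandbox ports from the table above
# to a single source CIDR block.
open_sandbox_ports() {
  local group_id="$1" cidr="$2" port
  # 22 = SSH, 5432 = database, 28080 = Command Center, 8080 = Zeppelin
  for port in 22 5432 28080 8080; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$group_id" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr" || return 1
  done
}

# Usage: open_sandbox_ports sg-0123abcd 203.0.113.10/32
```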
Note also that the data files for the master and segments are located in /gpdata, and that the default AMI launches with an 8GB disk. If you plan to work with larger datasets, we recommend mounting an EBS volume, moving the data onto it, and creating a symlink at the old location that points to the new one. If your instance type provides ephemeral (instance store) disks, these can also be used for data. Ephemeral disks usually offer faster access, but the data on them is lost if the instance is stopped, terminated, or fails, because the ephemeral devices go back to the EC2 pool.
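The EBS move described above could be sketched as follows. It assumes the volume is already attached (shown as /dev/xvdf, a device name that varies by instance type), that /data is an unused mount point, that the commands run as root, and that Greenplum has been stopped first (for example with gpstop -a as gpadmin). The helper name is hypothetical, and formatting the device erases anything already on it.

```shell
# Hypothetical helper: relocate the Greenplum data directory onto an
# attached EBS volume and leave a symlink at the old path.
# Run as root with the database stopped.
move_gpdata_to_ebs() {
  local device="$1" mount_point="$2" data_dir="${3:-/gpdata}"
  local name
  name=$(basename "$data_dir")
  mkfs -t ext4 "$device" || return 1             # erases the device
  mkdir -p "$mount_point" || return 1
  mount "$device" "$mount_point" || return 1
  cp -a "$data_dir" "$mount_point/" || return 1  # keeps owner, mode, timestamps
  mv "$data_dir" "${data_dir}.orig" || return 1  # keep the original until verified
  ln -s "$mount_point/$name" "$data_dir"
}

# Usage (as root, after gpstop -a): move_gpdata_to_ebs /dev/xvdf /data
```

Keeping /gpdata.orig until the database restarts cleanly gives you a way back. The mount also needs an /etc/fstab entry to survive a reboot.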
We are excited to get this new version rolled out and look forward to your feedback on additions you would like to see or anything else that would help you get started.
Learning More:
- Sandbox Tutorials Guide
- Sandbox AMI video
- Read the Greenplum Database Administrator Guide
- Download the Greenplum Database Best Practices Guide
- Find out more from the Greenplum Database product page or read more blog articles
- Watch the latest Greenplum Database videos
About the Author