Do you want to deploy a full Hadoop cluster on Google Compute Engine in under 3 minutes? Would you like to roll out updates to this cluster while preserving your data? Learn how Cloud Foundry BOSH on Google Compute Engine can do this and more.
Introduction to Cloud Foundry BOSH
Cloud Foundry BOSH is an open source release engineering, packaging, deployment and lifecycle management tool that simplifies working with distributed software.
CF-BOSH installs and updates software packages on large numbers of VMs over many IaaS providers with the absolute minimum number of configuration changes. CF-BOSH orchestrates initial deployments and ongoing updates that are:
- Declarative: The CF-BOSH deployment manifest defines exactly what resources are needed by the deployment, the network topology of the deployment, and the software components and properties to be deployed.
- Predictable: CF-BOSH compiles the source code of your packages in an isolated (even internet-less), sterile environment. When CF-BOSH completes a new deployment or update, the deployed virtual machines contain only the exact software specified in the release (a collection of configuration files, job definitions, source code, package definitions and accompanying information needed to make a software component deployable by CF-BOSH).
- Repeatable: Every time you repeat a deployment, the result is the exact same deployed system, avoiding ‘Snowflake servers’. Release versions allow you to deploy a specific version of the release (no matter how old it is) and be sure that it will deploy the exact version of each software component.
- Self-healing: CF-BOSH monitors the health of processes running on the virtual machines it deploys and compares the results with the ideal state of the system as described in the deployment manifest. If CF-BOSH detects a failed job or a non-responsive VM, CF-BOSH can automatically restart the processes and/or recreate the job on a new VM if necessary.
- Infrastructure-agnostic: CF-BOSH works on multiple IaaS offerings including AWS, OpenStack, vSphere, vCHS and CloudStack, freeing operators from concerns about differences between environments.
- Canary-style updates: When many virtual machines are going to be upgraded with a new software package, CF-BOSH will first try to upgrade a small number of them, the “canaries”, and only if that is successful will the remaining nodes in the cluster be upgraded.
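To make the canary idea concrete, here is a minimal Python sketch of the control flow described in the last bullet. It is an illustration only, not CF-BOSH's actual implementation: the function name `canary_update`, the `upgrade` callback and the `canaries` parameter are all hypothetical names chosen for this example.

```python
from typing import Callable, List


def canary_update(
    vms: List[str],
    upgrade: Callable[[str], bool],
    canaries: int = 1,
) -> List[str]:
    """Upgrade the first `canaries` VMs, then the rest only if they all succeed.

    `upgrade` is a hypothetical callback that returns True on success.
    Returns the VMs that were upgraded, in order. Raises RuntimeError as
    soon as any upgrade fails, leaving the remaining VMs untouched.
    """
    canary_group, remaining = vms[:canaries], vms[canaries:]
    upgraded: List[str] = []

    # Phase 1: the canaries. A failure here aborts before touching the rest.
    for vm in canary_group:
        if not upgrade(vm):
            raise RuntimeError(f"canary {vm} failed; update aborted")
        upgraded.append(vm)

    # Phase 2: the remaining nodes, reached only after every canary succeeded.
    for vm in remaining:
        if not upgrade(vm):
            raise RuntimeError(f"{vm} failed after the canaries succeeded")
        upgraded.append(vm)

    return upgraded
```

With three VMs and one canary, a failing upgrade stops after the first VM, so the other two keep running the old software.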
This approach makes working with CF-BOSH highly efficient, as the following typical scenarios illustrate:
- You switch a deployment between clouds by editing a few lines in the manifest and choosing a new stemcell (an image template that includes a minimal OS with a CF-BOSH agent), while keeping the same release.
- You scale up your application by editing a single line in the manifest, specifically the number of instances required, and redeploying with a single command.
- You update or roll back your application by changing just the release version and redeploying with the same stemcell and manifest.
- You provide configuration information to authorities to satisfy compliance requirements. The release is complete and specific about details such as which version of each package is installed on each VM.
- You resize your persistent disks by just changing the size of the disk in the manifest file. CF-BOSH will stop any process inside the VM, attach a new persistent disk to the VM, copy the data from the old disk to the new disk, and start the process again without losing any data.
- When VMs fail, CF-BOSH “resurrects” them on new VMs. CF-BOSH discovers VM failures by comparing the actual state of the deployment with the intended state as described in the manifest.
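Scaling and resurrection both come down to the same comparison between the intended state and the actual state. The Python sketch below illustrates that comparison under simplifying assumptions: state is reduced to an instance count per job, and the function name `plan_actions` and the job names are hypothetical. It is not how CF-BOSH is implemented internally.

```python
from typing import Dict, List, Tuple


def plan_actions(
    desired: Dict[str, int],
    actual: Dict[str, int],
) -> List[Tuple[str, str, int]]:
    """Return the actions needed to move `actual` towards `desired`.

    Both arguments map a job name to an instance count: `desired` stands in
    for the manifest, `actual` for the healthy VMs currently running.
    Each action is a tuple of (verb, job name, number of instances).
    """
    actions: List[Tuple[str, str, int]] = []
    for job in sorted(set(desired) | set(actual)):
        want = desired.get(job, 0)
        have = actual.get(job, 0)
        if have < want:
            # Too few healthy instances: a VM failed or the job was scaled up.
            actions.append(("create", job, want - have))
        elif have > want:
            # Too many instances: the job was scaled down or removed.
            actions.append(("delete", job, have - want))
    return actions


# Hypothetical Hadoop-style deployment: one datanode VM has failed.
manifest = {"namenode": 1, "datanode": 3}
running = {"namenode": 1, "datanode": 2}
print(plan_actions(manifest, running))
```

Raising the `datanode` count in `manifest` from 3 to 5 goes through the same code path and simply yields a larger `create` action, which is why scaling is a one-line change.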
Google Compute Engine CF-BOSH CPI
Cloud independence and multi-cloud support are core beliefs for the Cloud Foundry team and central to our philosophy at Pivotal. Lately we have seen increasing interest from our customers in running their workloads on Google Cloud Platform. So today we are very proud to release a Google Compute Engine CF-BOSH CPI (the interface that exposes basic CF-BOSH primitives allowing the interaction with a specific IaaS API, in this case Google Compute Engine) to our open source community.
We believe that the incredibly fast instance creation and superior network and disk performance of Google Compute Engine, combined with the CF-BOSH features for managing and orchestrating a full and complex deployment, are a perfect match for customers who want to create rapid, reliable, on-demand clusters of applications and/or services.
We ran some tests for a prospective customer using an alpha version of the Google Compute Engine CF-BOSH CPI and were able to deploy a full on-demand Hadoop cluster in just 3 minutes.
The Google Compute Engine CF-BOSH CPI includes the following capabilities:
- Booting new VMs in parallel at a fast rate (only capped by your project API rate limit).
- Setting instance scheduling options, specifically the maintenance behaviour and automatic restart.
- Defining independently which machine type (resources in terms of virtual CPUs and memory) should be assigned to each VM that makes up your deployment.
- Choosing the region where your deployment will run, giving you control over where your data is stored.
- Multi-Zone deployments, so you can distribute your nodes across different zones to increase availability.
- Attaching and detaching any size persistent disk to/from instances and taking snapshots of them on-demand.
- Defining the network to use for each instance, and whether the instance gets an ephemeral IP or a reserved static IP.
- Assigning tags to your instances or a group of instances that may share a set of firewall rules or routing tables.
- Enabling protocol forwarding on your instances, so you can create a VPN from your local network to your CF-BOSH deployed VMs.
- Defining a group of instances that should receive incoming traffic and distribute the traffic across the instances by adding them to a Target Pool.
- Using an exponential backoff algorithm when waiting for resources/operations, so as not to exhaust your Google Compute Engine API rate limit.
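The exponential backoff mentioned in the last capability can be sketched in a few lines of Python. This is a generic illustration of the technique, not the CPI's own code: the function name `wait_for`, its parameters and the default delays are assumptions made for this example.

```python
import time
from typing import Callable


def wait_for(
    is_ready: Callable[[], bool],
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_ready`, doubling the pause between polls up to `max_delay`.

    `is_ready` is a hypothetical callback, for example one API call that
    checks whether an operation has finished. Returns True as soon as it
    reports success, or False once `max_attempts` polls have been made.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        if is_ready():
            return True
        if attempt < max_attempts - 1:
            # Back off before the next poll so that a slow operation costs
            # a handful of API calls rather than one call per second.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

With the defaults, the pauses grow as 1, 2, 4, 8, 16, 32 and 32 seconds. The `sleep` parameter exists only so the pauses can be replaced in tests; real-world implementations often add random jitter as well, which this sketch leaves out.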
All of this work is possible thanks to a joint effort between Google and Pivotal to add support for nearly all of the Google Compute Engine APIs to the fog Ruby gem. Supporting GCE in fog was very easy thanks to the excellent Google Compute Engine API documentation.
Building a new CF-BOSH CPI is not an easy task. It requires modifying several CF-BOSH components (see this example). But the good news is that the CF-BOSH team has started working on what we call the “External CPI” epic, a set of features that will make it much simpler for the community to add another IaaS of their choice to CF-BOSH.
As the external CPI effort is still under development, we have not included the Google Compute Engine CPI in the CF-BOSH master codebase, so in order to try it you will need to clone the GitHub repository and build the Ruby gems from source. To simplify the task for those users who want to start right now :), we have created a tutorial and a base Google Compute Engine image to start with. So check out the tutorial to start deploying CF-BOSH on Google Compute Engine.
And as a real example of deploying a complex distributed system, we have published another tutorial that explains how to deploy Cloud Foundry, the leading Platform as a Service, into Google Compute Engine. We have adapted the tutorial so anyone with a basic quota can quickly and easily deploy a private Cloud Foundry environment.
If you’d like to learn more about CF-BOSH, see the CF-BOSH documentation website and/or check the source code at our GitHub repository. For any specific questions about CF-BOSH or the Google Compute Engine CF-BOSH CPI, use the CF-BOSH Users Google Group. And don’t forget to explore the cloudfoundry-community GitHub repository to discover which community-contributed CF-BOSH releases are ready to deploy on Google Compute Engine!