Migrating a Cloud Foundry PaaS to Run on OpenStack

January 29, 2014 Nima Badiey

The following is a guest blog post by Julian Fischer (hello@anynines.com, @railshoster), founder and CEO of AnyNines, a Cloud Foundry and Rails hosting service operated by Avarteq GmbH in Saarbrücken, Germany.

Cloud Foundry is well known for simplifying application portability from one CF-based PaaS to another, but how simple is it to move an entire, live Cloud Foundry installation from one underlying IaaS to another? We asked the team at Pivotal, who recounted their experience moving the Cloud Foundry instance at run.pivotal.io from one Amazon AWS availability zone to another in 40 minutes. If Pivotal could do it in less than an hour, could we?

After running the AnyNines Cloud Foundry PaaS on vCloud, we decided to move the underlying IaaS to OpenStack (see the official AnyNines OpenStack migration announcement for more details, and also check out our AnyNines blog). The decision was motivated in part by our wish to build competence in the emerging synergy between Cloud Foundry and OpenStack, and to gain experience in this domain for our growing cloud hosting and consulting business.

We already had experience running OpenStack, and had more recently taken an active contributory role with the release of an OpenStack Swift service for Cloud Foundry, which gives simple access to OpenStack’s Amazon S3-like object store. Operating a Cloud Foundry PaaS layer on top of OpenStack was simply the logical choice.

Before Getting Started

Before jumping into the fray, we experimented with several migration scenarios, ultimately deciding to start with a fresh Cloud Foundry deployment on top of OpenStack (as opposed to moving the currently deployed CF stack). Customer uptime was our primary concern, and we didn’t want to affect the existing production environment in case we needed to revert to the old stack. Deploying Cloud Foundry is relatively straightforward, and the incremental cost of running two CF deployments for a short period was a small price for reducing the risk of breaking anything in production. A second CF platform also allowed us to rehearse the migration before performing it on the production system.

Deploying Cloud Foundry with BOSH meant we could reuse our production deployment manifest, with only minor adjustments to the cloud plugin, network configuration and resource allocations. The side-by-side diff below compares the relevant sections of the vCloud manifest (left) with the OpenStack manifest (right); lines marked | differ, and lines marked < appear only in the vCloud manifest. With only a few lines of difference, deploying the new CF stack was a relatively straightforward exercise.

network:                                     network:
  cloud_properties:                           cloud_properties:
    name: "anynines"                            name: "anynines"
  ip: 5.22.x.x                              |   net_id: 12345678-9101-1121-1236-142355d67ca5
  netmask:                                  |   type: manual
  gateway: 5.22.x.x                         |   label: private
  dns:                                      |   ip:
  - 109.234.x.x                             <
  - 109.234.x.x                             <

resources:                                    resources:
  persistent_disk: 100000                       persistent_disk: 100000
  cloud_properties:                             cloud_properties:
    ram: 1024                               |     instance_type: any-infr-small
    disk: 8192                                    disk: 8192
    cpu: 2                                  <

cloud:                                        cloud:
  plugin: vcloud                            |   plugin: openstack
  properties:                                   properties:
    vcds:                                   |     openstack:
      - url: https://cloud.example.com      |       auth_url: https://auth.example.com:5000/v2.0
        user: anynines                      |       username: anynines
        password: secret                    |       api_key: secret
        entities:                           |       tenant: anynines-tenant
          organization: anynines            |       default_key_name: secret-key
          virtual_datacenter: anynines-vdc  |       default_security_groups: ["anynines"]
          vapp_catalog: Bosh                |       private_key: /root/.ssh/secret-key.pem
          media_catalog: Bosh               <
          vm_metadata_key: cf-agent-env     <
          description: Bosh                 <
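One way to check how small the delta between two manifests really is, beyond eyeballing a side-by-side diff, is to flatten each manifest into dotted key paths and compare the leaves. The sketch below is an illustration, not part of the AnyNines workflow: it assumes both manifests have already been parsed into Python dictionaries (for example with a YAML library), and `flatten` and `manifest_delta` are hypothetical helper names. The sample data is the `resources` section from the diff.

```python
from typing import Any, Dict


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": value} leaf pairs."""
    flat: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        flat[prefix] = node
    return flat


def manifest_delta(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, tuple]:
    """Return {path: (old_value, new_value)} for every leaf that differs.

    A leaf missing on one side is reported as None for that side.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    delta = {}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(path), new_flat.get(path)
        if before != after:
            delta[path] = (before, after)
    return delta


# The "resources" section of the vCloud (left) and OpenStack (right) manifests.
vcloud = {"resources": {"persistent_disk": 100000,
                        "cloud_properties": {"ram": 1024, "disk": 8192, "cpu": 2}}}
openstack = {"resources": {"persistent_disk": 100000,
                           "cloud_properties": {"instance_type": "any-infr-small",
                                                "disk": 8192}}}

for path, (before, after) in manifest_delta(vcloud, openstack).items():
    print(f"{path}: {before!r} -> {after!r}")
```

For this excerpt the delta contains `cpu` and `ram` (present only in the vCloud manifest) and `instance_type` (present only in the OpenStack manifest); the unchanged `disk` and `persistent_disk` values do not appear.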

Preparing the New OpenStack Environment

Preparing the new OpenStack environment involved the following steps:

  1. Deploy BOSH in the new OpenStack Cloud Foundry environment.
  2. Change all configuration variables in the deployment manifest to suit the new environment. We switched from static IPs to DNS names for NATS, NFS and database access, which required adjusting a few settings in the new environment.
  3. Set up a new SSL termination gateway and configure it to use the new gorouter.
  4. Running a mirrored Cloud Foundry deployment alongside production would normally mean giving it different domain names, which would then have to be changed back to the originals after the migration. To avoid this extra name change, configure the DNS server included in CF BOSH to respond to both production domains (for AnyNines, a9s.eu and a9sapp.eu) by inserting them into the CF BOSH PowerDNS database, as shown in this Gist. Because all instances deployed with CF BOSH query this DNS server, the new environment can connect to an SSL gateway without hitting the live endpoint.
  5. Deploy a clone of the existing Cloud Foundry installation to the new environment.
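The Gist with the actual records is not reproduced here. As a rough illustration of step 4, the sketch below adds both domains to a simplified copy of the `domains` and `records` tables used by the PowerDNS generic SQL backend and points each domain, plus a wildcard under it, at the new SSL gateway. SQLite stands in for the director's real database so the example runs on its own; the gateway IP, the TTL, the `add_gateway_domain` helper and the choice of wildcard A records are all assumptions.

```python
import sqlite3

# Simplified subset of the PowerDNS generic SQL schema (assumption: only the
# columns needed for this sketch are kept).
SCHEMA = """
CREATE TABLE domains (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    type TEXT NOT NULL
);
CREATE TABLE records (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    name      TEXT NOT NULL,
    type      TEXT NOT NULL,
    content   TEXT NOT NULL,
    ttl       INTEGER NOT NULL
);
"""


def add_gateway_domain(conn, domain, gateway_ip, ttl=300):
    """Make `domain` and `*.domain` resolve to gateway_ip (safe to re-run)."""
    conn.execute("INSERT OR IGNORE INTO domains (name, type) VALUES (?, 'NATIVE')",
                 (domain,))
    (domain_id,) = conn.execute("SELECT id FROM domains WHERE name = ?",
                                (domain,)).fetchone()
    for name in (domain, f"*.{domain}"):
        exists = conn.execute(
            "SELECT 1 FROM records WHERE domain_id = ? AND name = ? AND type = 'A'",
            (domain_id, name)).fetchone()
        if exists:  # update in place instead of creating a duplicate record
            conn.execute(
                "UPDATE records SET content = ?, ttl = ? "
                "WHERE domain_id = ? AND name = ? AND type = 'A'",
                (gateway_ip, ttl, domain_id, name))
        else:
            conn.execute(
                "INSERT INTO records (domain_id, name, type, content, ttl) "
                "VALUES (?, ?, 'A', ?, ?)",
                (domain_id, name, gateway_ip, ttl))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for d in ("a9s.eu", "a9sapp.eu"):
    add_gateway_domain(conn, d, "10.0.0.10")  # hypothetical new SSL gateway IP
```

Deleting these rows again is what removing the CF BOSH DNS hack amounts to once the migration is finished.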

Migrating Apps, Databases, Configurations, …

Once the OpenStack infrastructure is ready, turn to migrating the key system state, as follows:

  1. Transfer all persistent disks from the vCloud installation to the new OpenStack environment.
  2. Save the gorouter routing table from the old environment for later comparison with the new one.
  3. Shut down all instances with persistent disks. This may include cc_db, uaa_db, nfs_server and all service nodes. Also stop the health manager to avoid a situation in which the Cloud Controller tries to restart all applications.
  4. Sync the persistent disks again, now that nothing is writing to them.
  5. In the new environment, start cc_db, uaa_db and nfs_server as the first step of the migration.
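The post does not say how the persistent disks were copied or verified. If the disk images are reachable as files from one host, a streamed checksum comparison is a simple way to confirm that the final sync captured the frozen state before anything is started in the new environment. This is a minimal sketch under that assumption; the function names and the paths in the comment are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=4 * 1024 * 1024):
    """Stream a (possibly very large) disk image through SHA-256 in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(source_path, target_path):
    """True if the synced copy is byte-for-byte identical to the source."""
    return file_sha256(source_path) == file_sha256(target_path)


# Hypothetical usage:
# images_match("/mnt/vcloud/cc_db.img", "/mnt/openstack/cc_db.img")
```

Reading in chunks keeps memory use constant, which matters for persistent disks of 100 GB like the ones in the manifest.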

TIP: The move to OpenStack requires minor adaptations to the Cloud Controller database. Because some of the affected entries are encrypted, a script is the fastest way to update all services with the new host IP (AnyNines used a small Ruby script). In addition, make sure all application environment variables are updated correctly.
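The Ruby script itself is not published. As a rough Python illustration of the same idea, the sketch below walks a service's credentials (as JSON, after decryption) and rewrites every occurrence of the old host IP, including inside URI strings. Decryption and re-encryption depend on the Cloud Controller's own scheme, so they are passed in as hypothetical `decrypt`/`encrypt` callables rather than implemented; all function names and IPs are placeholders.

```python
import json
import re
from typing import Any, Callable


def ip_pattern(ip: str) -> "re.Pattern[str]":
    """Match `ip` only as a whole address (10.0.0.1 must not hit 10.0.0.15)."""
    return re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d)")


def replace_host(value: Any, pattern: "re.Pattern[str]", new_ip: str) -> Any:
    """Recursively rewrite matching addresses in every string of a JSON value."""
    if isinstance(value, dict):
        return {key: replace_host(item, pattern, new_ip) for key, item in value.items()}
    if isinstance(value, list):
        return [replace_host(item, pattern, new_ip) for item in value]
    if isinstance(value, str):
        return pattern.sub(lambda _match: new_ip, value)
    return value  # numbers, booleans and null are left untouched


def migrate_credentials(blob: str, old_ip: str, new_ip: str,
                        decrypt: Callable[[str], str],
                        encrypt: Callable[[str], str]) -> str:
    """Decrypt a stored credentials blob, point it at new_ip, re-encrypt it.

    `decrypt` and `encrypt` are hypothetical hooks for whatever scheme the
    Cloud Controller uses for its encrypted columns; they are not shown here.
    """
    credentials = json.loads(decrypt(blob))
    updated = replace_host(credentials, ip_pattern(old_ip), new_ip)
    return encrypt(json.dumps(updated))
```

Matching whole addresses rather than using a plain string replace avoids corrupting neighbouring IPs that share a prefix. The same `replace_host` call can be applied to decoded application environment variables, which covers the second half of the tip.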

  6. Start the Cloud Controller, UAA and service nodes. At this point, the new environment should be ready to restart all existing applications.
  7. Start the health manager to enable app health monitoring; it will then restart all previously running applications.
  8. To validate the startup, compare the routing table of the new environment with the one saved from the old environment.
  9. Start any remaining instances.
  10. Adjust the old gateway to point to the new environment.
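For the routing-table check, a sketch along these lines compares the routes saved from the old gorouter with those registered in the new one. It assumes both tables were saved as JSON objects keyed by route hostname (for example as dumped from the gorouter's `/routes` status endpoint); treat that format, and the helper names, as assumptions to verify against your gorouter version. Only the keys are compared, because the backend addresses behind each route are expected to differ after the move.

```python
import json
from typing import Dict, Set


def load_routes(raw_json: str) -> Set[str]:
    """Return the set of routes (top-level keys) from a saved routing table."""
    table = json.loads(raw_json)
    # Hostnames are case-insensitive, so normalise before comparing.
    return {route.lower() for route in table}


def compare_routes(old_json: str, new_json: str) -> Dict[str, Set[str]]:
    """Report routes that disappeared or newly appeared after the migration."""
    old_routes, new_routes = load_routes(old_json), load_routes(new_json)
    return {
        "missing": old_routes - new_routes,     # apps not (yet) back up
        "unexpected": new_routes - old_routes,  # routes the old system lacked
    }
```

An empty `missing` set is a good signal that every previously routed application has re-registered with the new gorouter.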

At this point, the new environment should be up and running. Be sure to clean up (or archive) the old environment, and don’t forget to revert any temporary DNS settings and to remove the CF BOSH DNS hack.

Post Migration – Facts and Lessons Learned

With solid prep work and some practice (to adjust our recipe and process flows), the whole migration took less than one hour:

  1. Customer downtime was limited to 30 minutes, all within a maintenance window. The downtime was needed to freeze the cc_db state and prevent any changes to customer apps and data during the migration.
  2. Start to finish, the migration took about 45 minutes.
  3. Several hundred live customer apps and services, with their data, were migrated within this timeframe.
  4. The startup on the new system took only 10 minutes – that’s the time it took to get all customer applications up and running.

As with any migration, the most time-consuming part was the prep work, including the time invested in experimenting beforehand to make sure we had a sound and repeatable recipe. We performed the work in a planned maintenance window so as not to affect customers, and we prepared several backup and risk-mitigation plans (“just in case” scenarios) to roll all changes back to their original state in case the migration didn’t go as planned.

Ultimately we proved our original premise that Cloud Foundry was a platform robust enough to support full stack cloud migrations. This was a critical requirement for us as a business, and equally important to customers who want to protect their cloud investment with migration flexibility. Prep work minimized the risks, practice gave us confidence, and a recipe ensured a repeatable process from start to finish.
