How Do I Migrate Applications to Pivotal Cloud Foundry?

June 5, 2015 Josh Kruck

I spend a lot of time talking with customers who are bringing existing applications into Pivotal Cloud Foundry. This subject is difficult and brings words like “monolith”, “legacy” and “refactor” to the table. I declare that your legacy, monolith application, which is perhaps in need of refactoring, can stand in front of a mirror before its date with Pivotal Cloud Foundry and call itself attractive. Monoliths are applications, too.

So, where do you start when considering an application migration? A few questions that come to mind are:

  • Have you triaged the applications according to business value?
  • Over the next three years which applications will be End of Life’d (EOL) or replaced?
  • What applications need to continue running for the next six years, but ideally at a lower cost?

The ideal candidates are applications that are too important to EOL but that you are not prepared to rewrite completely. The remaining applications don’t make sense for migration. If it’s dying, let it. If it’s going to be rebuilt, build it according to best practices.

I sort candidate applications into two camps, in order to avoid “analysis paralysis” and start work as quickly as possible—direct migration and refactor. Tell me more about your application.

  • Does it use an unusual directory layout?
  • Do you have a homegrown shim/ORM for connecting to data?
  • What about 12 Factor—how many factors are missing? Which ones?
  • What about the dependencies?

Direct Migration of Cloud-Native Applications

This refers to those applications that are cloud-native to begin with – applications that are stateless and can scale horizontally. When we perform a direct migration, we bring the whole application, in its current structure, to Pivotal Cloud Foundry. I advocate this approach when an application has a solid foundation and only easy-to-solve issues.

Changes typically include logging to stdout and stderr, consuming VCAP_SERVICES, and fixing unusual file system layouts or other things buildpacks do not expect. My preference is to conform to an existing buildpack rather than create a new one. I constantly weigh the complexity of changing the application against the cost of maintaining a new buildpack. If I decide to change a buildpack, I try to leave a backlog in place detailing the migration path to a standard buildpack, as this lowers the maintenance cost over the long term. Changing a buildpack creates a snowflake, and one-offs and snowflakes create complexity in the long term.
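As a rough illustration of the first two changes, the sketch below logs to stdout and looks up a bound service's credentials in VCAP_SERVICES. It assumes the usual layout of that variable (a JSON object keyed by service label, each holding a list of bound instances with "name" and "credentials" fields). The logger name, the function name find_credentials, and the instance name in the usage note are hypothetical.

```python
import json
import logging
import os
import sys

# Log to stdout so the platform can collect the stream; no log files are
# written to the local file system.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
log = logging.getLogger("legacy-app")  # hypothetical logger name


def find_credentials(instance_name, environ=os.environ):
    """Return the credentials dict of the bound service instance with this name.

    Assumes VCAP_SERVICES is a JSON object keyed by service label, where each
    value is a list of instances carrying "name" and "credentials" fields.
    """
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == instance_name:
                return instance.get("credentials", {})
    raise KeyError(f"no bound service named {instance_name!r}")
```

A caller might write creds = find_credentials("orders-db") at startup and log a clear error if the KeyError is raised, instead of reading connection details from a properties file on disk.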

The final step of the direct migration is usually addressing sticky sessions. Most applications I come across rely on connections being relayed to the same app server. While not ideal, Cloud Foundry enables sticky sessions when a JSESSIONID cookie is set. If the application is written in Java, transparent session replication can be enabled, which can provide better failure behavior than the application had originally. At a minimum, the application now behaves as it did before Pivotal Cloud Foundry, but with many of the operational benefits the platform provides.

Refactoring Legacy Applications To Be Cloud-Native

For some applications direct migration is not an option. The application may write to a file system, require a ton of RAM, or have a huge artifact size. Whatever the reason, the application may require refactoring to run successfully on Pivotal Cloud Foundry.

To refactor an application we first need to decompose the application, and isolate concerns. While this may seem intimidating, it’s not as difficult as it sounds. My go-to reference for refactoring is Working Effectively With Legacy Code, by Michael Feathers.

Step One of Refactoring

I start the process by working with the team on a whiteboard. First we draw an Activity Diagram, which builds a shared understanding of what the application does. Once we’re armed with that understanding and a ubiquitous language, we can draw a Component Diagram. The Component Diagram maps what the application does to how it does it, and then we begin to find bounded contexts. There are no points for UML correctness. We’re not creating design artifacts; we’re fostering discussion and a shared understanding.

Now that we understand what the application does and how it does it, we should see the data path as well as the transformations within the application. This gives us a good indication of problem areas. Our next step is to evaluate the components for 12 Factor compliance and choose which component(s) we want to tackle. I use three markers (red, yellow, green) and have the team rate each component on the whiteboard rather than perform an exhaustive evaluation. Gut feel is good enough, and it allows us to get into the code rather than speculate in a conference room.

We’ve decomposed the application into components and color coded them. Here are some questions I ask about those components:

  • Can we break out the green components into standalone applications, and address separately?
  • Do we see any clusters to isolate? Is there a group of components that we should break out together?
  • Do we see the need for any new services (look for red clusters)?

I rely heavily on integration tests, refactoring tools and interfaces. Let’s pick a green box and go for it!

Smaller is better, so break out components into individual applications, rather than many components into one new large application.

Step Two of Refactoring

Step two is to make sure we have a test that runs through the code we want to break out. You don’t have complete coverage? That’s okay. All we need are tests that execute the code paths we are breaking apart. I write these tests by calling an application interface (CLI) or by directly invoking a set of APIs, whatever is easiest at the time. I’ve implemented these in many ways in the past, from RSpec, CppUnit and JUnit to robot and shell scripts. My criteria are simple: repeatable, easy, and fast. In that order.
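A test of this kind can be quite small. The sketch below drives a command-line interface and pins down its current output. The CLI here is a stand-in (a one-line Python program that sums its arguments), and the names LEGACY_CLI, run_cli and OrderTotalPathTest are assumptions for illustration. In practice the command would launch the real application.

```python
import subprocess
import sys
import unittest

# Stand-in for the legacy application's CLI (an assumption for this sketch):
# a tiny program that prints the sum of its integer arguments.
LEGACY_CLI = [sys.executable, "-c",
              "import sys; print(sum(int(a) for a in sys.argv[1:]))"]


def run_cli(*args):
    """Invoke the CLI and return (exit code, stripped stdout)."""
    result = subprocess.run(LEGACY_CLI + list(args),
                            capture_output=True, text=True, timeout=30)
    return result.returncode, result.stdout.strip()


class OrderTotalPathTest(unittest.TestCase):
    """Records today's behavior of the code path we plan to break out."""

    def test_total_is_unchanged(self):
        code, out = run_cli("2", "3", "5")
        self.assertEqual(code, 0)
        self.assertEqual(out, "10")
```

Running it with python -m unittest keeps the loop repeatable, easy, and fast, and the same test can be rerun unchanged after each extraction step.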

With the tests in place, we can begin to break apart our legacy code. Our strategy is to implement the proxy pattern. We’ll create an interface, then insert a proxy which allows us to move a portion of legacy code into Cloud Foundry. IDE “Extract Interface” functions are a great safety net, as well as a huge timesaver for this task.

Take Advantage of Your Tools.

Let’s create an interface as a standalone artifact (.jar, .gem, etc.), and then acquire it with your dependency manager. Then create an implementation of the interface that calls your original code, and run the tests again. If the tests pass, temporarily add a failing assert to the implementation of the interface (i.e. your original code path) and run them again. If the assert does not cause a test failure, the tests are not exercising that code, and you must improve them before proceeding. Once the assert causes a failure, remove it and continue on.
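The shape of this step might look like the following sketch. All names (TaxCalculator, LegacyTaxCalculator, legacy_calculate_tax, checkout_total) are hypothetical, and the sabotage flag is only a stand-in for temporarily editing a failing assert into the implementation. It is not part of the approach itself.

```python
from abc import ABC, abstractmethod


def legacy_calculate_tax(amount_cents):
    """Stand-in for the original legacy code (hypothetical)."""
    return amount_cents * 8 // 100


class TaxCalculator(ABC):
    """The extracted interface, which would ship as its own artifact."""

    @abstractmethod
    def calculate(self, amount_cents: int) -> int:
        ...


class LegacyTaxCalculator(TaxCalculator):
    """First implementation: delegates straight to the original code."""

    def __init__(self, sabotage=False):
        # sabotage=True imitates the temporary failing assert used to
        # confirm that the tests really execute this code path.
        self.sabotage = sabotage

    def calculate(self, amount_cents):
        if self.sabotage:
            raise AssertionError("tests must notice this code path")
        return legacy_calculate_tax(amount_cents)


def checkout_total(amount_cents, calculator: TaxCalculator):
    """The caller now depends on the interface, not the legacy function."""
    return amount_cents + calculator.calculate(amount_cents)
```

If a test that calls checkout_total still passes with the sabotaged implementation, it is not covering the path being extracted.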

Step Three of Refactoring

Passing tests allow us to move to the next step. Create another implementation of the interface to call an HTTP endpoint, instead of calling the existing code. Set up an environment variable that points to the URL for the new endpoint, and inject that into the application (e.g. MY_NEW_SVC=http://localhost:8080/svc). Do this step even though the new endpoint is still running in our original application container.
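A minimal sketch of such an HTTP-backed implementation follows. It reads the MY_NEW_SVC variable from the example above. The class name, the /calculate path, the amount_cents query parameter and the tax_cents response field are assumptions made for illustration, and in a real code base the class would implement the extracted interface.

```python
import json
import os
import urllib.parse
import urllib.request


class HttpTaxCalculator:
    """Second implementation: same method, but served over HTTP."""

    def __init__(self, base_url=None):
        # MY_NEW_SVC is injected through the environment; the fallback
        # mirrors the example value from the text.
        configured = base_url or os.environ.get(
            "MY_NEW_SVC", "http://localhost:8080/svc")
        self.base_url = configured.rstrip("/")

    def url_for(self, amount_cents):
        """Build the endpoint URL for one call (hypothetical path and field)."""
        query = urllib.parse.urlencode({"amount_cents": amount_cents})
        return f"{self.base_url}/calculate?{query}"

    def calculate(self, amount_cents):
        """Call the endpoint and return the tax from its JSON response."""
        with urllib.request.urlopen(self.url_for(amount_cents), timeout=5) as resp:
            return json.loads(resp.read())["tax_cents"]
```

Because the URL comes from the environment, pointing the application at the extracted service later means changing one variable, with no code change.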

Next, implement a controller that listens at that location. This controller calls your first implementation of the interface (i.e. your original code). I maintain a one-to-one mapping of HTTP endpoints to public methods. I don’t sweat state; the only thing I’m doing is making the method call go through an HTTP proxy. Then run additional tests, and when they pass, extract along the HTTP boundaries we created.
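The controller side can be sketched with nothing more than the standard library. This is an illustration under assumptions, not a recommended framework choice: the route /svc/calculate, the JSON field names, and the functions legacy_calculate_tax and handle_calculate are hypothetical, and port 8080 simply matches the example URL above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def legacy_calculate_tax(amount_cents):
    """Stand-in for the original legacy code (hypothetical)."""
    return amount_cents * 8 // 100


def handle_calculate(query):
    """One public method, one endpoint; returns (status, body dict)."""
    try:
        amount = int(query["amount_cents"][0])
    except (KeyError, ValueError):
        return 400, {"error": "amount_cents must be an integer"}
    return 200, {"tax_cents": legacy_calculate_tax(amount)}


# One entry per public method on the interface.
ROUTES = {"/svc/calculate": handle_calculate}


class SvcHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        route = ROUTES.get(parsed.path)
        if route is None:
            status, body = 404, {"error": "not found"}
        else:
            status, body = route(parse_qs(parsed.query))
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Blocks while serving requests on the given port."""
    ThreadingHTTPServer(("", port), SvcHandler).serve_forever()
```

Keeping the routing table one-to-one with the interface's public methods makes the later extraction mechanical: the handlers and the code they call move to the new project together.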

Additional refactoring can be done in the future; tackle one small thing at a time and prioritize the work.

Now that we have achieved separation, take the code that implements the new HTTP endpoint and create a new project. Compile this project into a new artifact that advertises the HTTP endpoints and push it to Pivotal Cloud Foundry. Do not leave the original implementation in the legacy code, as we want the compiler to fail if we’ve missed calls that should go to our new application!

Run the integration tests again, while specifying the application URL provided by Pivotal Cloud Foundry in the environment variable. Did the tests pass? Do you see logs in Pivotal Cloud Foundry? If so, then you have successfully refactored a portion of the legacy code base into Pivotal Cloud Foundry.

Iteration, Or What Do We Do With the Remainder Of The Application?

What about the remaining red and yellow and green boxes from our white-boarding session? If we repeat this exercise enough times we end up with a cluster of red and yellow. Now what? First, repeat the process at a finer level of detail for each component. Can we break individual components into even smaller pieces that can stand on their own?

Problems become manageable when they are small, so our goal should be to make our problem surface as small as possible.

You can continue to run the remainder of the legacy application as it is today and connect to it as a Service when necessary. In the future, we will use additional techniques to extract the individual problems in the remaining cluster.

Not All Applications Are Ideal

Not all applications should be candidates for migration to Pivotal Cloud Foundry. Does your application require extremely low latency that it bends over backwards to achieve, like long-lived connections with data deduplication and wire-speed compression? Pivotal Cloud Foundry will introduce additional hops; can your application tolerate them? WebSockets may be a solution, but use caution: is the device on the other end capable of supporting them? Does your application make use of specialized hardware, or smell like something that runs on a supercomputer? If any of these scenarios sound familiar, then these applications are likely not a great fit to run on the platform.

This represents an iterative approach for breaking apart legacy applications and migrating them to Pivotal Cloud Foundry in a safe, sane and predictable manner. Migration and refactoring of applications have historically represented a large-scale, multi-year effort. When you use a platform and a tight feedback loop, you can tackle the problem in much more manageable pieces, demonstrating success quickly and recognizing value immediately.
