During a talk this month at the Cloud Foundry Summit 2014, Pivotal’s Engineering Manager Onsi Fakhouri unveiled Diego, a ground-up rewrite of a major component of the Cloud Foundry Elastic Runtime. The Cloud Foundry Elastic Runtime is an environment for running and managing web applications written in most modern frameworks or languages.
Fakhouri explained the factors that motivated the rewrite and the philosophy driving its development.
Diego is a major rethinking of the Droplet Execution Agent (DEA) within the Cloud Foundry Elastic Runtime. The primary functions of the DEA are to stage apps, run them in Warden containers, and manage their lifecycle by starting and stopping apps upon request of the Cloud Controller component.
“You can look at Cloud Foundry (CF) as a black box,” Fakhouri said to the attendees. “If you are a developer using CF to push an app, what the black box will do for you is take your app, stage it, run ‘n’ instances, and keep them running, and then create a route to the app so that your users can connect your app to the Internet with their browsers.”
“If you pull back even further and look at how the sausage is made,” he continued, “you find the DEA pool. These are the Droplet Execution Agents that actually run your app. One of the components in the DEA pool is the DEA itself — it’s responsible for staging and running your apps.”
“You also have Warden, which is responsible for containerizing your apps,” he said. “This is what keeps one app separate from one another.” Fakhouri then pointed to the Health Manager component, which “makes sure that what is desired by the Cloud Controller is actually running on the DEA’s.”
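The Health Manager's job, as described above, amounts to comparing a desired state with an actual state and working out the difference. The following is a minimal Go sketch of that idea only. It is not Cloud Foundry or Diego code, and the `Reconcile` function, the app names, and the instance counts are all hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Reconcile is a hypothetical helper, not part of Cloud Foundry.
// It compares desired instance counts with the counts actually running
// and returns the change needed per app: a positive number means start
// that many instances, a negative number means stop that many.
// Apps that are already in sync are omitted from the result.
func Reconcile(desired, actual map[string]int) map[string]int {
	delta := make(map[string]int)
	for app, want := range desired {
		// A missing key in actual reads as zero running instances.
		if diff := want - actual[app]; diff != 0 {
			delta[app] = diff
		}
	}
	// Anything running that is no longer desired should be stopped.
	for app, have := range actual {
		if _, ok := desired[app]; !ok && have > 0 {
			delta[app] = -have
		}
	}
	return delta
}

func main() {
	// Illustrative data: what is wanted versus what is running.
	desired := map[string]int{"web": 3, "worker": 1}
	actual := map[string]int{"web": 1, "worker": 1, "old-app": 2}

	delta := Reconcile(desired, actual)

	// Sort app names so the output order is stable.
	apps := make([]string, 0, len(delta))
	for app := range delta {
		apps = append(apps, app)
	}
	sort.Strings(apps)
	for _, app := range apps {
		fmt.Printf("%s: %+d\n", app, delta[app])
	}
}
```

With the sample data, the sketch reports that two more `web` instances should be started and both `old-app` instances stopped, while `worker` needs no change. A real system would also have to act on that difference and cope with failures along the way.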
Why rewrite the Cloud Foundry DEA?
Despite these safeguards, Fakhouri noted, if an app goes missing it continues to be the joint responsibility of three different components to bring it back online. As a result of this and other limitations, he explained, it is “hard to add new features, [and] hard to maintain existing features.”
“Why is this?” he asked. Fakhouri identified a number of problems in the current Elastic Runtime model:
- Tight coupling
- Poor separation of concerns
- “Triangular dependencies,” which create complex interactions that are difficult to test and hard to reason through
In addition, the current Elastic Runtime is tailored to a specific domain: apps. This makes it hard to extend to new domains or to support functions such as cron-like jobs. Moreover, the current Elastic Runtime is tightly coupled to the Linux platform, which makes supporting new platforms (such as Windows) difficult. Finally, the current codebase is written in Ruby, which is showing signs of strain in a context with “tons of concurrency” and many low-level OS interactions. As a result, it has been challenging for developers to add new features or maintain existing ones.
Fakhouri then introduced Diego, the CF “Elastic Runtime 2.0,” as the solution to many of these challenges. Diego is written in Go (also known as Golang), a platform-agnostic open source programming language that “makes it easy to build simple, reliable, and efficient software,” in the words of its project leaders. The language supports a wide range of operating systems and processor architectures, and has been deployed in multiple Google production environments.
Fakhouri laid out the advantages afforded by this significant rewrite:
- Flexibility: by breaking away from the domain-specific notion of apps, Diego will allow new features such as cron-like tasks to be added easily
- Platform independence: Diego is built from the ground up to be platform agnostic. Supporting a new platform entails implementing just two components.
He stated that Diego not only addresses the limitations posed by the existing Cloud Foundry Elastic Runtime model, but also sets the stage for building increasingly complex interactions between apps and data services running on the PaaS. Fakhouri acknowledged that “complex interactions are hard to test or reason through,” but emphasized that “complexity in a distributed system of this scope is real and necessary. Diego embraces this and tries to make its complexity explicit, transparent, and easier to reason about.”
Further confirming support for this new approach, Fakhouri noted that Diego has been embraced by Pivotal’s engineers, open source Cloud Foundry contributors, and developers at IBM and SAP.
- See this slide deck and others from CF Summit
- Watch this video and others from CF Summit
- Learn more about Cloud Foundry and Pivotal CF