As many companies begin planning their roll-out of Pivotal Cloud Foundry, one question regularly comes up: should we use more small VMs or fewer large VMs?
The benefits of reducing the number of VMs required to run Pivotal Cloud Foundry (PCF) come with the drawbacks of reduced control, higher monetary cost, and less flexibility. Overall, the bundled default configuration provides customers with ideal settings for running a scalable production configuration of PCF, without any real drawbacks.
The PCF install bundle (commonly referred to as the Elastic Runtime tile) uses a pre-determined configuration of VMs to host components like Cloud Controller, Router, NATS (Message Bus), Application Execution (DEA), MySQL, Log Aggregator, Health Manager, etc. Ops Manager uses this configuration to instantiate and manage the various VMs while allowing individual components to scale. A concern customers raise repeatedly is the large number of VMs required to run Cloud Foundry, and they often ask whether it is okay to reduce the footprint by consolidating the VMs.
Managing PCF VMs is Different than VM Management of the Past
In addition to the key components of PCF, other components may be needed. These optional components include Collector and HA Proxy, as well as one-time, short-lived VMs like Compilation and Errand VMs that are used to push the console, auto-scale applications, and more. Together, this can give the appearance of a large number of VMs. It can leave the impression that installing and managing PCF is a complex task, and that perhaps things could be simpler with fewer VMs. The image below shows the components and associated VMs necessary for a standard PCF install.
Traditional IT Operations and even developers tend to look at the architecture from the same perspective they have had for years. That perspective assumes that end users (i.e. developers) or administrators have to manage each VM individually. In PCF, however, the underlying BOSH layer manages the creation and maintenance of the VMs throughout the lifecycle of an application running on the platform. BOSH, together with Ops Manager, ensures the VMs are created on the underlying cloud infrastructure and have the right resource capacities (CPU/Memory/Disk), networking, persistence, upkeep, execution of related jobs, and patching. In the past, a larger number of VMs meant an additional maintenance burden. This is not so with PCF.
VM Consolidation Financial Analysis: Run More Smaller or Fewer Bigger VMs?
Most of the smaller PCF components require a single virtual CPU with 1GB of RAM, while others, like the DEA, Cloud Controller, and Router, are heavier. If you were to consolidate the smaller components, the resulting VMs would need to be bigger, and in many environments that would actually result in higher costs.
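To make the sizing argument concrete, here is a minimal Python sketch of how resource needs add up when components share a VM. It assumes each component keeps its full reservation and that nothing is oversubscribed, which is a simplification. The `Sizing` type and `consolidated_size` function are hypothetical names used only for illustration, and the only figure taken from the text is the 1 vCPU / 1GB size of a small component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    """Resources reserved for one component (illustrative type, not a PCF API)."""
    vcpus: int
    ram_gb: int


def consolidated_size(components):
    """Return the VM size needed if every component keeps its own reservation."""
    return Sizing(
        vcpus=sum(c.vcpus for c in components),
        ram_gb=sum(c.ram_gb for c in components),
    )


# The "small component" size mentioned above: 1 vCPU with 1GB of RAM.
small = Sizing(vcpus=1, ram_gb=1)

# Four small components placed on a single VM.
print(consolidated_size([small] * 4))
```

Under these assumptions the shared VM has to be at least four times the size of any one of the VMs it replaces, so consolidation moves you to a larger instance type rather than removing the resource requirement.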
When we look at the underlying financials, hosted cloud vendors charge more for larger VMs (i.e. more CPUs and/or memory) than for smaller VMs. For example, as of publishing time, the hourly cost of various AWS EC2 instance types would look like this:
Certainly, prices change, and it is important to refer to AWS EC2 instance pricing for the most current information. But as the table shows, consolidating the VMs can cost two or more times as much as the default setting of a large number of smaller VMs. Importantly, this cost difference only grows the longer the system runs.
Here are some sample costs for running a standard PCF implementation on AWS EC2:
Alternately, here are the costs associated with a consolidated configuration of PCF.
As you can see, the hourly costs increase with the consolidated configuration. You may feel like you are getting a better deal by running more of the PCF components on larger VMs, but in reality the larger VMs end up costing more. Why spend $2M when you can spend $1M to meet the same need?
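The comparison above is simple arithmetic, and it is easy to repeat with your own numbers. The Python sketch below shows one way to do it. Every figure in it is a placeholder: the prices are not real AWS rates, and the two layouts are hypothetical, not the actual PCF VM counts. Substitute current instance pricing and your own VM inventory before drawing conclusions.

```python
HOURS_PER_YEAR = 24 * 365


def hourly_cost_cents(layout, price_cents_per_hour):
    """Sum the hourly price of every VM in a layout.

    layout maps an instance size name to the number of VMs of that size.
    """
    return sum(count * price_cents_per_hour[size] for size, count in layout.items())


# Placeholder prices in US cents per hour. These are NOT real AWS figures.
price_cents_per_hour = {"small": 5, "large": 40}

# Hypothetical layouts, for illustration only.
default_layout = {"small": 12, "large": 3}   # many small VMs plus a few large ones
consolidated_layout = {"large": 6}           # small components folded into large VMs

default_hourly = hourly_cost_cents(default_layout, price_cents_per_hour)
consolidated_hourly = hourly_cost_cents(consolidated_layout, price_cents_per_hour)

# The hourly gap compounds over a year of continuous operation.
extra_per_year_usd = (consolidated_hourly - default_hourly) * HOURS_PER_YEAR / 100

print(f"default:        {default_hourly / 100:.2f} USD/hour")
print(f"consolidated:   {consolidated_hourly / 100:.2f} USD/hour")
print(f"extra per year: {extra_per_year_usd:.2f} USD")
```

Prices are kept as integer cents so the sums stay exact. Whether consolidation comes out more expensive depends entirely on the real prices and layouts you plug in.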
Additional Management and Configuration Considerations
In addition to cost considerations, there are several other operational requirements and constraints that need to be taken into account should you choose to use fewer, larger VMs. These additional considerations include:
- Ensuring there are no conflicts when different components are put together on the same VM (e.g. overlapping listening ports and conflicting resource requirements).
- Managing the dependency order between components (e.g. NATS has to go before other components, UAADB before UAA, NFS or shared blobstore ahead of Cloud Controller, etc).
- Losing finely grained control over the components:
  - If a particular component fails within a VM, the BOSH Agent just marks the entire VM as failing (it currently does not give weighting or insight into which job is failing among several) and attempts to recreate it.
  - Putting multiple components on the same VM means the components cannot be scaled individually, even if only one component needs to be scaled up. Scaling components as a group can significantly lower efficiency, which is a big deal in a 1000-VM or 10,000-VM environment.
- Gaining little from a lower IP address count, since most PCF installs use private internal addresses with a NAT instance in front.
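Two of the considerations above, port conflicts and start-up ordering, are the kind of bookkeeping that consolidation pushes back onto the administrator. The Python sketch below illustrates both checks under stated assumptions: the port numbers and the `component_a`/`component_b` names are hypothetical, and the dependency map is a simplified encoding of the examples in the list (NATS first, UAADB before UAA, NFS before Cloud Controller), not the real PCF job graph. It uses `graphlib`, which is in the standard library from Python 3.9.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def port_conflicts(ports_by_component):
    """Return {port: [components]} for every port claimed by more than one component."""
    owners = defaultdict(list)
    for component, ports in ports_by_component.items():
        for port in ports:
            owners[port].append(component)
    return {port: names for port, names in owners.items() if len(names) > 1}


# Hypothetical listening ports for two components placed on the same VM.
colocated = {
    "component_a": {8080, 9000},
    "component_b": {8080},
}
print(port_conflicts(colocated))

# Each key lists the components that must be running before it starts.
# Simplified from the examples in the list above; not the full PCF graph.
depends_on = {
    "nats": set(),
    "uaadb": {"nats"},
    "nfs": {"nats"},
    "uaa": {"nats", "uaadb"},
    "cloud_controller": {"nats", "nfs"},
}

# static_order() yields one valid start order; several orders may be valid.
start_order = list(TopologicalSorter(depends_on).static_order())
print(start_order)
```

With one component per VM, neither check is needed for port clashes on a shared host, which is part of why the default layout is easier to operate.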
The default configuration for a PCF installation is designed to run on VMs as efficiently as possible. The orchestration and automation capabilities of the platform give administrators an easy way to manage all of its components. Deviating from the ideal configuration creates additional overhead for administrators, and potentially additional costs for the underlying infrastructure.