How We Harden a Cloud Foundry Stemcell (So You Don’t Have to)

June 29, 2017 Jared Ruckle

Ah, the hardening of an operating system. It’s a time-honored ritual for IT operations teams. Take the latest bits from an OS vendor, and toughen them up to meet enterprise security standards. Operators lock down network protocols, restrict user accounts, and remove unnecessary services. The output: a golden image that’s been through the digital wringer, a worthy template for an organization’s servers.

OS hardening may not be the most glamorous task. But it minimizes risk.

Nowadays, IT pros can count on cloud providers to do much of this for them. When you create a new instance, the OS template is usually already hardened to some extent. Providers may also offer several OS images, each targeting a different use case.

In the case of Cloud Foundry, we go one step further. The platform embeds the operating system as a “stemcell.” The OS bundled in the stemcell runs on the VMs that power Cloud Foundry deployments. More broadly, a stemcell:

...is a versioned Operating System (“OS”) image wrapped with IaaS specific packaging. A typical stemcell contains a bare minimum OS skeleton with a few common utilities pre-installed, a BOSH Agent, and a few configuration files to securely configure the OS by default.

Stemcells help you embrace immutable infrastructure. Operators use stemcells to repair and repave large fleets of servers. Rebuilding servers regularly from a known-good image reduces the risk of a successful attack.

The Pivotal team actively discusses these ideas at meet-ups and conferences.

One thing we've learned from these events is that this message clearly resonates with the community. But enterprise IT teams undertaking the cloud-native journey are always curious to learn more.

How do stemcells and OS hardening come together in the cloud-native era? We sat down with John Field, a product manager for platform security at Pivotal, to find out.

You’ve often said that traditional OS hardening is insufficient for most companies. Why?

It’s about velocity and focus. Business leaders and engineers are on the cloud-native journey. They have organized into product teams.

But many security and compliance teams operate as if they are still running standalone servers. When these product teams deploy new code, InfoSec wants to secure the environment, starting with the OS. When you’re deploying to production many times a day, you have to re-think this approach.

Here’s where the ‘focus’ part comes in. In a cloud-native platform, the OS performs a smaller set of tasks. Its role is more specialized than on a general-purpose app server.

We optimize the security posture of the operating system for Cloud Foundry. Removing unused services is an accepted best practice. In Cloud Foundry, we can take that to the next level. I’d describe it as “less is more.” Diego cells within the platform are single-purpose. We can remove even more of the OS features than in a typical standalone server. As a result, the scope of your OS scanning can be narrowed accordingly.

The mantra of “less is more” holds true only if you can customize for the task at hand. That means adding capabilities like encryption, anti-virus scanning, and file integrity monitoring. Cloud Foundry users will follow the guidance of their auditors in these areas.

How is a stemcell configuration different from a standalone server?

It’s not all that different. We secure a stemcell by configuring it conservatively and reducing its attack surface. As a result, an adversary has fewer potential vulnerabilities to exploit.

For example, here are three areas we’ve focused on in the last few months.

  • File System Hardening. We’ve locked down the /tmp filesystem, which bad actors often target. Other problematic directories live on separate partitions with restricted access, so malware can’t easily spread.

  • Minimization of Attack Surface. We’ve removed unnecessary items like Network Information Service and rsh tools. Of course, we also disable talk, telnet, tftp and many other servers. Less is more!

  • Network Security. We modified the IPv4 network configuration to be safer. We disable unneeded protocols like SCTP, DCCP, LDAP, and RDS.
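To make the file system and network bullets concrete, here is a minimal Python sketch of the kind of audit check an operator might run against a hardened VM. It assumes common Linux and CIS-style conventions rather than the stemcell's actual settings: /tmp mounted with `nodev`, `nosuid`, and `noexec`; protocol modules neutralized with `install <module> /bin/true` lines in modprobe.d; and a few illustrative `net.ipv4.*` sysctl values. All function and constant names are hypothetical, and the functions take file text as input so they can be tested without a live system.

```python
"""Hypothetical audit helpers for the hardening areas listed above.

Baselines are illustrative CIS-style values, not the stemcell's real settings.
"""

from typing import Dict, Set

# Assumed mount options for a hardened /tmp.
REQUIRED_TMP_OPTIONS: Set[str] = {"nodev", "nosuid", "noexec"}

# Protocol modules to keep unloaded (Linux kernel module names).
DISABLED_MODULES: Set[str] = {"sctp", "dccp", "rds"}

# Assumed IPv4 kernel parameters for a safer network configuration.
SYSCTL_BASELINE: Dict[str, str] = {
    "net.ipv4.conf.all.accept_redirects": "0",
    "net.ipv4.conf.all.send_redirects": "0",
    "net.ipv4.conf.all.accept_source_route": "0",
    "net.ipv4.tcp_syncookies": "1",
}


def missing_tmp_options(mounts_text: str) -> Set[str]:
    """Return required options missing from the /tmp entry in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == "/tmp":
            return REQUIRED_TMP_OPTIONS - set(fields[3].split(","))
    # No separate /tmp mount: every required option is effectively missing.
    return set(REQUIRED_TMP_OPTIONS)


def loadable_modules(modprobe_text: str) -> Set[str]:
    """Return modules from DISABLED_MODULES not neutralized in modprobe.d-style text."""
    blocked: Set[str] = set()
    for line in modprobe_text.splitlines():
        fields = line.split()
        # "install <module> /bin/true" makes modprobe run a no-op instead of loading.
        if len(fields) >= 3 and fields[0] == "install" and fields[2] in ("/bin/true", "/bin/false"):
            blocked.add(fields[1])
    return DISABLED_MODULES - blocked


def sysctl_drift(sysctl_text: str) -> Dict[str, str]:
    """Map each baseline key to its actual value where `key = value` text disagrees."""
    actual: Dict[str, str] = {}
    for line in sysctl_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            actual[key.strip()] = value.strip()
    # A key absent from the input is reported as "<unset>".
    return {
        key: actual.get(key, "<unset>")
        for key, want in SYSCTL_BASELINE.items()
        if actual.get(key) != want
    }


print(missing_tmp_options("tmpfs /tmp tmpfs rw,nosuid,nodev 0 0"))  # {'noexec'}
print(loadable_modules("install sctp /bin/true\ninstall dccp /bin/true"))  # {'rds'}
```

On a real host you would feed these functions the contents of /proc/mounts, the files under /etc/modprobe.d, and the output of `sysctl -a`. Keeping the parsing separate from file access makes each check easy to unit test.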

These tactics are familiar to operators. We install more restrictive configurations of the OS kernel and system services. We change default passwords. We remove unnecessary software, unnecessary usernames and logins. We remove unnecessary services. There is much, much more detail in our stemcell hardening docs.
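Removing unnecessary usernames and logins can also be checked automatically. The sketch below scans /etc/passwd-format text and flags accounts that still have an interactive shell but are not on an allowlist. The allowlist (`root` and `vcap`, the latter being the BOSH VM user) and the set of non-login shells are assumptions for illustration. The function name is hypothetical.

```python
"""Hypothetical check for unexpected interactive accounts in passwd-format text."""

from typing import List, Set

# Shells that deny an interactive login.
NON_LOGIN_SHELLS: Set[str] = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}

# Assumed allowlist; adjust to the accounts your auditor expects.
ALLOWED_LOGIN_USERS: Set[str] = {"root", "vcap"}


def unexpected_login_users(passwd_text: str) -> List[str]:
    """Return users, in file order, that have a login shell but are not allowlisted."""
    flagged: List[str] = []
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # passwd fields: name, password, UID, GID, GECOS, home, shell.
        if len(fields) != 7:
            continue
        name, shell = fields[0], fields[6]
        # An empty shell field means /bin/sh, so it also counts as a login shell.
        if shell not in NON_LOGIN_SHELLS and name not in ALLOWED_LOGIN_USERS:
            flagged.append(name)
    return flagged


sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
    "games:x:5:60:games:/usr/games:/bin/sh\n"
)
print(unexpected_login_users(sample))  # ['games']
```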

Cloud Foundry is a distributed system. How does the community think about OS hardening? How do you help operators sleep a little better at night?

Your goal as a digital business is to run a distributed system that may need to change often. You’re focused on infrastructure management at scale and on developer productivity. On the InfoSec side, you have to move faster to be more secure. This is why we advocate for cloud-native security principles.

Forward-looking organizations are now adopting security accreditation or assurance via inherited controls. This can be the “secret sauce” to going faster without sacrificing security.

For example, traditional application deployments assumed that infrastructure supplied the firewall. Yet, these appdev teams had to configure and patch the OS when deploying a new application server. With Cloud Foundry, the stemcell configuration is an inherent part of the platform. The development team “inherits” these controls from the stemcell, just by using the platform. Our goal is to make the secure thing the easy thing.

For the operator, the Cloud Foundry community delivers a base configuration built on a tested, rock-solid OS. Operators do need to think about their configurations a little differently and tailor them for cloud-native. Some of the configuration scanning tests you run on a standalone server are no longer needed, and some new Cloud Foundry-specific tests are.

And therein lies the beauty of BOSH and stemcells. With these tools, customers enjoy a faster “mean time to patch.” For configuration control, the remediation for virtually any issue is another BOSH deploy. The same is true for vulnerabilities.

Complexity is the enemy of security. BOSH takes this notion to heart. Why? Because you can remediate both configuration issues and vulnerabilities using one approach!

Configuration is especially challenging when you’re operating at scale, because you run into bespoke systems. With BOSH, you can configure your machines to be as secure as possible, no matter how large your pool of servers.

And you can do this with no downtime. You don’t need to make configuration changes during a scheduled outage window. You’re more secure this way, because you’re updating continuously rather than waiting until next month.

What’s a real-world example of this automation?

The ability to rapidly apply new configurations, such as a newly patched OS, is table stakes. If you can’t do that quickly, you will struggle to mitigate critical issues when they come up.

Let’s consider a CVE, say one that affects the operating system.

When the community reports a high-severity CVE, the BOSH team devotes at least one pair of engineers to bumping the OS image in the stemcell. When the new stemcells are ready, the team posts a notice on cloudfoundry.org/security. Customers subscribe to the Cloud Foundry and Pivotal security RSS feeds to stay on top of things.
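Most people follow these feeds in a feed reader, but the triage step can also be scripted. Here is a minimal sketch that parses RSS 2.0 XML (`channel`/`item`/`title`/`link`) with the standard library and keeps items whose title mentions a critical or high severity. Two assumptions: that severity appears in each item's title (check the real feeds' format before relying on this), and that you fetch the feed yourself. Fetching and feed URLs are left out. The function name is hypothetical.

```python
"""Hypothetical filter for high-severity items in an RSS 2.0 security feed."""

import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: a severity label such as "High" or "Critical" appears in the title.
SEVERITY_PATTERN = re.compile(r"\b(critical|high)\b", re.IGNORECASE)


def urgent_items(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for items whose title mentions a listed severity."""
    root = ET.fromstring(rss_xml)
    results: List[Tuple[str, str]] = []
    # RSS 2.0 layout: <rss><channel><item><title/><link/></item>...</channel></rss>
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if SEVERITY_PATTERN.search(title):
            results.append((title, link))
    return results
```

The word-boundary regex avoids false matches such as "highlights" that a plain substring test would catch.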

Once the fixes for Pivotal Cloud Foundry are ready to go, we post mitigation steps, including new tile versions, at pivotal.io/security. We post new stemcells on Pivotal Network. Customers download the new stemcell. From there they kick off a new Pivotal Cloud Foundry deployment based on the new bits.
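One small part of that workflow is working out which deployments still run a stemcell older than the patched release. Plain string comparison gets this wrong (as strings, "10.9" sorts after "10.11"), so the sketch below compares dotted versions numerically. It assumes purely numeric, dot-separated version strings. The deployment names, version numbers, and helper names are hypothetical.

```python
"""Hypothetical helper: find deployments whose stemcell predates a patched version."""

from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as "10.11" into (10, 11) for comparison."""
    return tuple(int(part) for part in version.split("."))


def needs_repave(deployed: Dict[str, str], patched: str) -> List[str]:
    """Return deployment names, sorted, whose stemcell version is below `patched`."""
    minimum = parse_version(patched)
    return sorted(
        name for name, version in deployed.items()
        if parse_version(version) < minimum
    )


print(needs_repave({"cf": "10.9", "mysql": "10.11"}, "10.11"))  # ['cf']
```

Comparing integer tuples relies on Python's element-by-element tuple ordering, which matches how numeric version segments are meant to be read.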

The deployment is usually done within a few hours, without downtime.

With some configuration management tools, you can probably get patches online fairly quickly. But you’re very likely to take some downtime, disrupting the business. And you probably won’t be able to patch your Windows systems as fast as your Linux systems.

If you still use ITIL and traditional infrastructure management, you are going to be vulnerable for weeks while you update systems.

Speaking of Windows, the tooling for Windows stemcells was just open-sourced. What kind of experience can the Cloud Foundry community expect?

The developer and operator experiences on Windows are identical to Linux. cf push doesn’t change. Rolling updates without downtime “just work” with the Windows stemcells. The BOSH experience is the same. The platform abstracts these differences away so you don’t need to care or think about it.

Want to learn more about security at Pivotal? Check out pivotal.io/security and the cloud-native security topic page. Review the stemcell hardening specifications in more depth. Peruse all of Pivotal’s security-related content.

About the Author

Jared Ruckle

Jared works in product marketing at VMware.
