In the days following the public announcement of the CVE involving containers and runC (CVE-2019-5736), there were a lot of interesting discussions within Pivotal. Our team of talented engineers discussed our own processes for patching and releasing software. While reading through the threads, I happened upon a message that piqued my interest:
This comment inspired me to dig deeper into how we achieved this remarkable result.
At Pivotal, we love to talk about continuous integration (CI). In a nutshell, CI is the automation of software builds and unit testing as commits get added to a code repository. When you include testing as part of the build process, it’s easier to validate functionality with every change and squash bugs before they make it to a release. This process is a critical step to boost the velocity of the “idea to production software cycle” and is coded into our DNA. It’s just one of the ways we enable our customers to develop software faster and more securely.
In the case of Cloud Foundry, I have always been a consumer of the tech and never gave much thought to what was happening “behind the curtain” to improve the project over time. For me, continuous integration has always been about platform or software deployments within other companies.
Real World: CI
I started to explore the CI process for the component of Cloud Foundry that uses runC for container creation and management; it was the piece that was potentially affected by this exploit. (Spoiler alert: it wasn’t.) Even so, the actions of the team in the days that followed the exploit are informative as you seek to improve your own security practices.
The component development team is highly distributed, with engineers from Pivotal, IBM, and SAP located in England and Bulgaria. Automation and continuous integration are essential and support their aim of building highly reliable software with velocity at scale.
What follows is the summary of what I learned about that team and its process.
Let’s start with a nifty visual of the CI pipeline for the open source project component:
The Cloud Foundry CI environment relies on Concourse, an open-source CI/CD tool. Concourse serves as a CI tool for software development, and we also use it for the continuous integration of the Cloud Foundry platform itself. As a result, the platform can be treated as a product: it can be patched, upgraded, and tested in an automated and repeatable way.
In this case, we are looking specifically at the pre-release pipeline that integrates and tests runC. (It’s just one pipeline in a large set used to build Cloud Foundry, but it is a useful example.)
The workflow goes something like this:
1. Unit testing occurs as every commit flows through the entire pipeline.
2. The pipeline builds a binary testing candidate (pre-release) and a BOSH release.
3. These releases are deployed onto independent BOSH directors, and a full suite of outside-in tests is applied.
4. If all these tests pass, the pre-release is promoted to a release.
5. From there, the release pipeline can be triggered manually to build the final release artifacts.
The tests don't stop there, however. Steps 1-5 run every 20 minutes every day—continuously.
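To make the gating idea concrete, here is a minimal Python sketch of a pipeline in which each stage must pass before the next one runs, and promotion only happens when every stage is green. This is not the team's Concourse configuration. The stage functions, the `PipelineRun` type, and the commit names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PipelineRun:
    """Record of one commit's trip through the (hypothetical) pipeline."""
    commit: str
    log: List[str] = field(default_factory=list)
    promoted: bool = False


# A stage is a (name, check) pair; the check returns True on success.
Stage = Tuple[str, Callable[[str], bool]]


def run_gated_pipeline(commit: str, stages: List[Stage]) -> PipelineRun:
    """Run stages in order; stop at the first failure, promote only if all pass."""
    run = PipelineRun(commit=commit)
    for name, check in stages:
        ok = check(commit)
        run.log.append(f"{name}: {'passed' if ok else 'failed'}")
        if not ok:
            return run  # a failed gate blocks everything downstream
    run.promoted = True
    return run


# Hypothetical stand-ins for the real jobs. Only the last one can fail here,
# and only for commits whose id starts with "bad".
def unit_tests(commit: str) -> bool:
    return True


def build_prerelease(commit: str) -> bool:
    return True


def deploy_to_directors(commit: str) -> bool:
    return True


def outside_in_tests(commit: str) -> bool:
    return not commit.startswith("bad")


STAGES: List[Stage] = [
    ("unit tests", unit_tests),
    ("build pre-release", build_prerelease),
    ("deploy to BOSH directors", deploy_to_directors),
    ("outside-in tests", outside_in_tests),
]

if __name__ == "__main__":
    for commit in ("abc123", "bad456"):
        result = run_gated_pipeline(commit, STAGES)
        print(commit, "promoted" if result.promoted else "blocked", result.log)
```

In this sketch the final, manually triggered release step is left out on purpose: the promoted flag only marks a candidate as eligible for it.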
The release pipeline also contains a set of non-gating tests that attempt to uncover issues not caught in pre-release unit tests. These tests run every 30 minutes all day, every day, to test for any sporadic issues that might occur over time.
How about scale testing? The scale test spins up 30,000 containers to simulate a production load, and then it monitors the speed at which new containers can be added to the environment. Yes, you read that correctly: 30,000 containers!
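The shape of such a scale test can be sketched in a few lines of Python: fill an environment with a baseline number of containers, then time how long each additional creation takes. The `FakeContainerHost` class below is a hypothetical in-memory stand-in, not the real container backend, so the timings it produces say nothing about actual runC performance. It only illustrates the measurement loop.

```python
import time
from typing import Dict, List


class FakeContainerHost:
    """Hypothetical in-memory stand-in for a real container backend."""

    def __init__(self) -> None:
        self.containers: Dict[str, dict] = {}

    def create(self, handle: str) -> None:
        if handle in self.containers:
            raise ValueError(f"container {handle!r} already exists")
        self.containers[handle] = {"state": "running"}


def measure_creation_under_load(
    host: FakeContainerHost, baseline: int, probes: int
) -> List[float]:
    """Create `baseline` containers, then time `probes` further creations.

    Returns the elapsed seconds for each probe creation.
    """
    for i in range(baseline):
        host.create(f"load-{i}")

    timings: List[float] = []
    for i in range(probes):
        start = time.perf_counter()
        host.create(f"probe-{i}")
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    host = FakeContainerHost()
    timings = measure_creation_under_load(host, baseline=30_000, probes=100)
    print(f"containers running: {len(host.containers)}")
    print(f"slowest probe creation: {max(timings):.6f}s")
```

A real test would replace the fake host with calls to the actual container API and would likely track these timings across runs to spot regressions.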
The Fix is In
Now, with that background on CI, let’s get back to the CVE story. CVE-2019-5736 identified an exploit that enabled the attacker to overwrite the runC binary on the host and execute code with root privileges. This bug was a huge deal, considering the hype and popularity of containers.
Because Cloud Foundry leverages runC, the team received information about the CVE a few weeks before the public announcement. They looked into the exploit and saw that Cloud Foundry’s secure-by-default implementation around runC meant that it wasn’t vulnerable. That was great news! However, they still wanted to make an update to the component in case a similar vulnerability arose.
The team agreed that the best course of action would be to bump the runC version to as high as possible before the patch was released to the public. Pre-integrating the most recent runC version was vital. Why? Because most of the code and the dependency changes could be integrated and tested before patch day. Applying the patch itself would then be only a minimal change for the community.
The team believed that this would reduce the standard 45-minute runtime of the pipeline. Pre-integrating the version of runC closest to the newly patched version would also reduce the risk of any blocking issues that would delay the patch from making it into a release. The day before the patch was released, the team executed this plan and integrated the most current version of runC into the build. Then they sat back and waited.
Around lunchtime the next day, the patch was released. The platform operator put down his sandwich and launched the pipeline to integrate the patch. He picked up his sandwich and took about 5 minutes to finish off the last few bites as the pipeline went green to signal completion. He wiped his hands, clicked “ship,” and kicked off the release pipeline. The entire process from patch to release took less than an hour.
A few hours later, a CF Deployment was published!
The Value of a Platform
This story was just a quick peek behind the curtain to reveal a small part of the process around building, integrating, and testing one component in Cloud Foundry and Pivotal Cloud Foundry. Sometimes people hear “platform” and think, “Oh, we can just take pieces X, Y, and Z and stitch them together to do the same things.”
However, a platform like Pivotal Cloud Foundry is much more than the sum of the parts. Remember, you get all the testing and support behind the scenes that eliminates stress and worry. With trust in your open-source software support chain, your development teams focus on the important things: building great software that differentiates your business.
To learn more about using CI/CD in your organization, please take a look at the Speed Thrills—How to Harness the Power of CI/CD for your Development Team whitepaper. You can also read more about using Concourse or another continuous integration product for software build automation and testing on our website.
About the Author: Dan Baskette