How Two Operators are Continuously Patching Tanzu Application Service

February 4, 2019 Jim Thomson

This post was co-written with David Laing.

In a breakout room at SpringOne Platform 2018, the scene played out more like a rock concert than a tech talk about upgrading software. Bryan Kelly, an engineer at Cerner, had just presented on continuously consuming security patches for Pivotal Cloud Foundry (PCF). When he announced that his team had consumed 73 stemcell patches since May, the audience broke out in applause. An eager cadre of fellow operators jockeyed for a chance to ask Bryan questions, and the group quickly moved en masse to the hotel bar to keep the conversation going. The reaction was similar at the talk given by Lance Rochelle, who leads the PCF operations team at Wells Fargo.

If you’re an operator, you can achieve this level of glory! I recommend watching both of these talks in full: they’re excellent success stories, and they provide guidance on how to achieve the security outcomes the Cerner and Wells Fargo teams have realized. By following their lead, you’ll be able to show your CISO and your team the value these practices can bring to your organization.

Here are the talks:

Both organizations share the following traits, which I’ll describe in more detail below:

  • They understand the importance of constant security patching, and take it seriously

  • They’ve built an upgrade machine using automation

  • They make small, constant updates to reduce risk and make it easier to debug when something goes wrong

  • They communicate their success to leadership, advocating both for the value that they and PCF provide and for the practices that allow them to realize that value

 

Why Patch?

So why did these talks elicit such a reaction? At a time when security breaches make the news with alarming regularity, both Pivotal and our customers’ platform operations teams recognize the value of fixing security vulnerabilities as soon as possible. Threats are everywhere: Lance from Wells Fargo estimates we’ll see more than 19,000 common vulnerabilities and exposures (CVEs) this year across all software.

At Pivotal, we take security seriously: on average, our teams release security patches every six days. That’s just for a single minor line for Pivotal Application Service (PAS) and Operations Manager; it doesn’t count stemcells or patches to additional tiles.

And Pivotal makes it easy to consume those patches. Take stemcells: a stemcell is a bare-bones operating system deployed on every virtual machine (VM) in a PCF foundation. Patching operating systems used to take months (if it was done at all). With stemcells, you can upgrade every VM’s operating system in hours, with nearly no risk of downtime or other impact.

Successful platform operations teams understand how valuable it is to keep their software up to date on security patches. Cerner and Wells Fargo both took Pivotal’s security-patching cadence as a challenge: if Pivotal was going to make it possible to keep their foundations secure, their teams were going to take advantage.

Legacy practices and organizational constraints can make it difficult for many operations teams to realize the value of constant patching, but once it is embraced, the results can be great. Cerner currently doesn’t let any installation, whether it’s PAS, Ops Manager, or a stemcell, outlive its welcome and become a security liability. Now we’ll go into detail about how you can keep up with Cerner’s success.

 

How to Patch: Build an Upgrade Machine

In his talk, Lance asks "What’s keeping you from repaving your servers once a day? If you’re doing it manually, you’re gonna struggle with once a month." Automation is key. Both of these teams invested in building an upgrade machine: a way to configure tooling to consume patches automatically. Once the machine is built, the team just needs to monitor and maintain it.

In Cerner’s case, they use Concourse to automate their upgrades, and GitHub to monitor and report changes. Their upgrades roll through their environments in a set weekly pattern:

  • Every patch is automatically applied to their sandbox.

  • If that patch is successful, as measured by automated tests, the deployment is automatically promoted to their dev environment. This flips a typical interaction on its head: each evening, the team receives an email digest alerting them to updates that have already been applied to their foundations.

  • Each Friday at 9:30am, the team reviews the updates made to their lower environments throughout the week and does a few sanity checks. If everything checks out, they flip a switch, and those updates are applied to production. They use Selective Deploy to keep these Friday deployments incremental.
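The promotion flow above can be sketched in a few lines of Python. This is only an illustration of the gating logic, not Cerner’s actual Concourse pipeline: the `Patch` class, the `promote` function, and the `tests_pass` callback are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """A single update to roll out (hypothetical type for this sketch)."""
    product: str                      # e.g. "stemcell", "PAS", "Ops Manager"
    version: str                      # placeholder string, not a real release
    applied_to: set = field(default_factory=set)


def promote(patch, tests_pass, friday_approved=False):
    """Move one patch as far along sandbox -> dev -> production as its gates allow.

    tests_pass is a callable(patch, environment) -> bool that stands in for
    the automated test suite; friday_approved stands in for the weekly review.
    """
    # Every patch is applied to the sandbox automatically.
    patch.applied_to.add("sandbox")

    # Promotion to dev is gated only on the automated tests in the sandbox.
    if not tests_pass(patch, "sandbox"):
        return patch
    patch.applied_to.add("dev")

    # Production waits for a human to review the week's updates and flip the switch.
    if friday_approved:
        patch.applied_to.add("production")
    return patch
```

The only manual step in this sketch is the `friday_approved` flag; everything before it runs without a person deciding whether to take the update.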

Wells Fargo has a similar model—with a bonus. If Pivotal hasn’t released a stemcell patch since they last ran their pipelines, they run them again anyway. That means they repave constantly, removing any persistent threat that might have made its way to their servers.
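As a minimal sketch of that decision, assuming a scheduled job that knows the latest published stemcell version and the one currently deployed (the function name and return values are hypothetical, not part of any Pivotal tooling):

```python
def pipeline_action(latest_stemcell, deployed_stemcell):
    """Decide what a scheduled pipeline run should do (hypothetical helper)."""
    if latest_stemcell != deployed_stemcell:
        # A new stemcell patch is available: roll it out.
        return "upgrade"
    # Nothing new was released: rebuild the VMs from the current stemcell anyway.
    return "repave"
```

Note that there is no "do nothing" branch: every scheduled run rebuilds the VMs one way or the other.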

As a third example, the Pivotal team that runs the PCF foundation hosting Pivotal Tracker has a similar automated patching machine, although they apply patches directly to production as long as they haven’t exceeded their availability error budget.
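An error-budget gate of this kind might look like the following sketch. The post does not say how the Tracker team measures availability, so the request-based calculation, the function names, and the parameters here are all assumptions made for illustration.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent; negative means overspent.

    slo_target is the availability objective as a fraction, e.g. 0.999.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        # No budget at all (no traffic yet, or a 100% target).
        return 1.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / allowed_failures


def may_patch_production(slo_target, total_requests, failed_requests):
    """Allow a direct-to-production patch only while the budget is not exceeded."""
    return error_budget_remaining(slo_target, total_requests, failed_requests) >= 0
```

For example, with a 99.9% target over 1,000,000 requests, roughly 1,000 failures are allowed: 250 failures leaves the gate open, while 1,500 failures closes it until the budget recovers.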

 

Inch-by-Inch, Life’s a Cinch

Both of these teams apply every update separately. In doing so, they reduce the risk of any one deployment failing and, importantly, when something does fail, they can debug quickly as they know exactly what has changed. When they’re ready for a minor upgrade, they’ve moved incrementally patch-by-patch and built up their upgrade muscles along the way. As a result, their minor upgrades are also typically quick and painless.

By staying consistently updated on the latest version of PAS or Ops Manager, thanks in large part to how they’ve set up their sandbox to automatically pull in the latest update from PivNet, the team at Cerner can stay comfortably within the platform’s support window. As Bryan said, “that's a really cool feeling because you can review the release notes against something that's running, instead of having to review the release notes and then decide to take the update.”

 

Communicate Success

Finally, both of these teams are able to brag about their success, so leadership knows about their value too. As a healthcare company, Cerner is required to track its upgrade activities, but the team also shares this data throughout the organization: it shows how successful they’ve been in keeping their foundations and workloads secure. As a result, Cerner has increased its investment in the team and in PCF, and plans to continue to expand and realize value.

 

Don’t Worry and Trust the Platform

If you’re an operations team, I encourage you to follow Wells Fargo and Cerner’s lead. Take advantage of Pivotal’s security posture, automate, make small changes, and brag about your success. You’ll have a secure platform, easier upgrades, happy teams, and an even happier CISO. Your customers will be safer thanks to these efforts, and you’ll be the next success story.

 


Learn more about why infrastructure automation is a critical path to improving your organization’s security by downloading this report from Forrester Research.

About the Author

Jim Thomson

Jim is a Product Lead at Pivotal Cloud R&D, responsible for the upgrade experience. He loves to bring product-thinking into an operations world.
