How the Toolsmiths Team in Pivotal R&D cut over half a million dollars from its IaaS bill without impacting development or users.
The Pivotal R&D organization has grown rapidly over the last several years and with it, our IaaS costs. If your business is growing, then you’re adding new customers and services every day and there’s a good chance your R&D organization is in the same boat.
This isn’t necessarily a bad thing, of course. Elastic infrastructure helps us build and test great products! But thrift is a virtue, so we decided to look for ways to cut wasteful usage and reduce spending with our internal cloud provider, Google Cloud Platform (GCP). Our goal was to save money without impacting our pace of delivery.
Here’s a look at some simple tactics that led to big savings. We hope our experience gives you some ideas for trimming your IaaS spending without limiting innovation. You should be able to apply these ideas to any public cloud you use—none are specific to GCP.
First Things First: Understand Your Current Bill
The first step? Getting a better understanding of where our money was going. It turns out, 90% of our spending with GCP went to virtual machines. We asked ourselves, “What are all these VMs doing, and why are they so expensive?”
Keep CI Test Environments for a Single Day
To answer these questions, we looked at how long R&D teams used environments for our myriad continuous integration (CI) tests. We discovered that some environments are used for a few hours, while others are used for days or weeks. After interviewing teams, we decided that a test cycle shouldn’t be longer than one day. It’s just CI, right? There’s no need for infrastructure to run longer than that!
So we started deleting environments after one day. Our user research showed that individual R&D teams actually liked this new rigor; they view it as a garbage collection feature. This simple action reduced the average “time-to-live” for environments by five hours. That adds up to $100,000 in savings per year!
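A one-day TTL can be implemented as a small garbage-collection pass over your environment inventory. Here’s a minimal sketch, assuming a list of (name, created_at) pairs; the function name and inventory format are illustrative, not our actual tooling:

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # the one-day TTL policy

def environments_to_delete(envs, now=None):
    """Return names of CI environments older than the one-day TTL.

    `envs` is a list of (name, created_at) pairs, where created_at is a
    timezone-aware datetime parsed from your IaaS inventory.
    """
    now = now or datetime.now(timezone.utc)
    return [name for name, created_at in envs if now - created_at > MAX_AGE]
```

A scheduled CI job can run this hourly and feed the returned names to whatever tears your environments down (e.g. `gcloud` or BOSH).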
Nix the Extra Load Balancer
Next, we discovered that we deployed an HAProxy VM for each Pivotal Application Service (PAS) tile. This surprised us, since we use GCP’s built-in load balancers. But someone, sometime, figured we might want to test load balancing with HAProxy. After interviewing a few teams, we found no one was using that VM. So we killed it. There’s another $70,000 per year saved just by eliminating all the HAProxy VMs from our PAS tiles and modifying a few .yml files. Not bad for a Tuesday!
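If your environments are deployed from BOSH-style manifests, the change can be as small as an ops-file like this one (the instance-group name `ha_proxy` is an assumption; in Ops Manager you would instead set the HAProxy job’s instance count to zero in the tile’s resource config):

```yaml
# Hypothetical BOSH ops-file: drop the unused HAProxy instance group.
- type: remove
  path: /instance_groups/name=ha_proxy
```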
Right Size Instances for Your Workload
GCP gives recommendations for VM instance sizes based on the characteristics of your workloads. So many suggestions, so little time! We picked the biggest instances for the components that really need them: Diego Cells, UAA, Cloud Controller, and a few others. Then we adopted the recommended smaller instances for the rest of PCF. From there, we set each job to use its new, right-sized instance type and used the Ops Manager API to roll out the lean-and-mean configuration. The result is a reduction of about $250,000 per year from our total spend. Google makes this optimization very easy; if you use AWS or Azure, you’ll have to do some number crunching to run these optimizations yourself.
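To sanity-check a right-sizing recommendation before applying it, you can estimate the yearly delta per job. This is a sketch with illustrative, approximate us-central1 on-demand prices; check GCP’s pricing page for real numbers:

```python
# Approximate n1 on-demand hourly prices in us-central1; illustrative only.
HOURLY_PRICE = {
    "n1-standard-8": 0.380,
    "n1-standard-4": 0.190,
    "n1-standard-2": 0.095,
}

HOURS_PER_YEAR = 24 * 365

def annual_savings(resizes):
    """Estimate yearly savings for (vm_count, current_type, recommended_type) resizes."""
    total = 0.0
    for count, current, recommended in resizes:
        delta = HOURLY_PRICE[current] - HOURLY_PRICE[recommended]
        total += count * delta * HOURS_PER_YEAR
    return total
```

Downsizing ten n1-standard-4 VMs to n1-standard-2, for example, saves roughly $8,300 a year at these rates.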
Last One Out on Friday? Delete the Environments Before You Go Home
Then it was Friday, and we went home. It turns out that pretty much everybody goes home for the weekend. Bizarre, I know, especially given Pivotal’s rapid product release schedule. Our PCF environments are pre-created and left running so R&D teams can start using them whenever they need them. No one really wants our PCF environments on the weekends, but they do want them to be available Monday morning. So we decided to take all of our PCF environments offline on weekends (except for one, just in case). We’re also spinning up new environments in the morning, before the teams get into the office. The result is another $130,000 in cost reduction.
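The weekend policy boils down to one predicate per environment. A minimal sketch, where the always-on escape-hatch environment’s name is made up:

```python
from datetime import datetime

ALWAYS_ON = {"escape-hatch"}  # the one environment we keep up, just in case

def should_be_running(env_name, now):
    """Stop environments on Saturday and Sunday; keep them up on weekdays."""
    if env_name in ALWAYS_ON:
        return True
    return now.weekday() < 5  # Monday is 0, Saturday is 5, Sunday is 6
```

A scheduled job evaluates this for each environment and stops or starts the underlying VMs accordingly; schedule the Monday run before the first team arrives.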
Use Blob Storage, Not File Servers
Did you know that a typical FileServer VM costs $1.14 per day on GCP, while 100GB of Blob Storage costs only $0.07 per day? Unless you’re a GCP pricing nerd like me, you probably didn’t. That’s another $125,000 a year in savings, just by turning off VMs and switching to built-in IaaS storage. (This savings calculation assumes the default configuration for the VM in the US-Central location, where prices are the lowest.)
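The arithmetic behind that figure, using the per-day prices quoted above:

```python
FILE_SERVER_VM_PER_DAY = 1.14  # default-config FileServer VM, US-Central
BLOB_100GB_PER_DAY = 0.07      # 100GB of Blob Storage per day

def annual_savings_per_server(days=365):
    """Yearly savings from replacing one FileServer VM with blob storage."""
    return (FILE_SERVER_VM_PER_DAY - BLOB_100GB_PER_DAY) * days
```

That works out to about $390 per server per year, so the ~$125,000 figure implies a fleet of roughly 320 such file servers.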
It’s been a week since we made these configuration changes, and we haven’t impacted our users. The net cost savings is around $675,000 per year, and our user stories aren’t any harder than normal to implement.
Your most difficult task may be building the cost model, and determining the value of an individual change. As long as the same assumptions are made for each scenario, though, the relative values will be correct. You’ll soon notice that a few of the most important stories will result in savings that are orders of magnitude larger than the others.
We hope we’ve inspired you to take a look for ways that you can trim IaaS costs! The savings can be put towards building even more great products and services, or jump-starting innovation at your organization. Take a stab at it and let us know how it goes! If you have any other tips on how to reduce IaaS spending, please leave them as a response to this post.
Change is the only constant, so individuals, institutions, and businesses must be Built to Adapt. At Pivotal, we believe change should be expected, embraced, and incorporated continuously through development and innovation, because good software is never finished.
How to Save a Fortune On Cloud Infrastructure was originally published in Built to Adapt on Medium.