This post continues our CI pipeline series. In this edition, we describe how we cut the pipeline’s Cloud Foundry deployment step (into bosh-lite on AWS) from over one hour to about 20 minutes, giving us shorter feedback cycles and therefore faster development.
To prepare the integration tests, the pipeline first launched a bosh-lite VM in AWS (using the Vagrantfile created for this purpose). Next, it would create a Bosh release from the current version of cf-release and the new loggregator version to be tested, upload it to bosh-lite, deploy, and then run the tests. Finally, it would destroy the bosh-lite VM.
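The sequence of steps above can be sketched as a small Python driver. This is only an illustration under stated assumptions, not our actual pipeline scripts: the function names are hypothetical, the test command is a placeholder supplied by the caller, and working directories, manifest generation, and targeting the Bosh director are left out. The commands follow the Vagrant and Bosh CLI usage of the time.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def plan_from_scratch_run(test_command: Sequence[str]) -> List[Command]:
    """Return the commands for one from-scratch run, in execution order."""
    return [
        ["vagrant", "up", "--provider=aws"],       # launch the bosh-lite VM in AWS
        ["bosh", "create", "release", "--force"],  # build a dev release from the working tree
        ["bosh", "upload", "release"],             # upload it to the bosh-lite director
        ["bosh", "-n", "deploy"],                  # deploy without interactive prompts
        list(test_command),                        # placeholder for the integration tests
        ["vagrant", "destroy", "-f"],              # tear the VM down again
    ]


def run_plan(plan: List[Command],
             execute: Callable[[Command], object] = subprocess.check_call) -> None:
    """Run every step in order; always attempt the final teardown step."""
    steps, teardown = plan[:-1], plan[-1]
    try:
        for command in steps:
            execute(command)
    finally:
        # Destroy the VM even if an earlier step failed.
        execute(teardown)
```

Separating the plan from its execution keeps the step order easy to inspect without launching anything; passing a different `execute` callable gives a dry run.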
Running the tests themselves took less than five minutes, but because the pipeline did a complete Cloud Foundry deployment from scratch for every test run, the entire run took more than one hour.
Reusing the Bosh-Lite VM
Our first optimization was to re-use the same bosh-lite VM for multiple test runs. This was effective because Bosh automatically detects the differences between a new release and the currently deployed one, and only uploads and deploys those differences.
We still wanted to periodically do a from-scratch deployment on a new VM, so that our pipeline wouldn’t degrade over time or show false results due to bad state of the bosh-lite VM.
To achieve this, we set up a separate pipeline that runs on a schedule and automatically recreates the bosh-lite VM every 24 hours at midnight. The first (automatically triggered) integration test run after midnight still takes over an hour, but subsequent runs (triggered during the day by developer changes) complete in 25 to 30 minutes.
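To make the schedule concrete, here is a minimal sketch of the date arithmetic involved. It is not how our scheduler is implemented; both function names are hypothetical, and the timestamps are assumed to be naive datetimes in the scheduler's local time.

```python
from datetime import datetime, time, timedelta


def next_recreation(now: datetime) -> datetime:
    """Return the next midnight strictly after `now`."""
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, time.min)


def uses_fresh_vm(run_started: datetime, previous_run: datetime) -> bool:
    """True if a midnight recreation fell between the previous run and this one."""
    return previous_run.date() < run_started.date()
```

A run for which `uses_fresh_vm` returns true is the slow from-scratch run of the day; all other runs reuse the existing VM.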
Caching create-release Artifacts
Bosh can also cache the results of create-release operations locally, so that subsequent releases are created faster. We had not been able to take advantage of this capability, because the build agents clean up their state between runs, and each integration test run could potentially land on a different build agent.
We were able to achieve another optimization by writing our own scripts to cache those create-release artifacts in S3: After creating the release, we upload Bosh’s .dev-builds directory into S3, and before creating the next release, we download and restore this directory.
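The save and restore halves of this idea can be sketched in Python as follows. Treat it as a sketch under assumptions rather than our actual scripts: packing the directory into a single tarball is an assumption, the function names are hypothetical, and the transfer to and from S3 is deliberately left out (the archive could be copied with whatever S3 client the build agents have, for example `aws s3 cp`).

```python
import tarfile
from pathlib import Path

CACHE_DIR_NAME = ".dev-builds"  # Bosh's local cache inside the release directory


def pack_cache(release_dir: Path, archive: Path) -> bool:
    """Archive release_dir/.dev-builds; return False if there is nothing to pack."""
    cache = release_dir / CACHE_DIR_NAME
    if not cache.is_dir():
        return False
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the release directory so the archive
        # can be unpacked into any checkout.
        tar.add(str(cache), arcname=CACHE_DIR_NAME)
    return True


def restore_cache(archive: Path, release_dir: Path) -> bool:
    """Unpack a previously packed archive; return False if no archive exists."""
    if not archive.is_file():
        return False
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(release_dir)
    return True
```

In this sketch, `pack_cache` would run right after creating the release and `restore_cache` right before creating the next one, with the S3 upload and download in between.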
As with the first optimization, we wanted to avoid problems caused by a stale cache over time; therefore, we start fresh by not restoring the cache for the first run of the day, after the bosh-lite VM has been recreated.
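The rule for when to restore the cache is small enough to write down directly. The sketch below is illustrative only: the function name and the returned labels are hypothetical, and it assumes the pipeline is told whether the VM was just recreated.

```python
from typing import Callable


def prepare_release_cache(fresh_vm: bool, restore: Callable[[], bool]) -> str:
    """Decide whether to restore the create-release cache before a run.

    `restore` is any callable that tries to restore the cache and returns
    True on success. It is never called for the first run on a fresh VM.
    """
    if fresh_vm:
        return "skipped"  # start from an empty cache once per day
    return "restored" if restore() else "missing"
```

Because a skipped restore is followed by a normal upload after the release is created, the first run of the day also replaces the stored cache with a freshly built one.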
The graph below shows the current runtime of our integration test pipeline. The spikes indicate the daily >1h run at midnight. (Note that the horizontal axis shows the number of each run and is therefore not linear with time.) Without the optimizations, each run would have taken as long as those spikes.