Authored by Dan Baskette, Gareth Smith and Claude Devarenne
In the world of Continuous Delivery, blue-green deployment is a popular and widely discussed technique. A blue-green deployment stands up a parallel, idle environment to enable rapid upgrades: switching application traffic to the idle version makes it the new live version. This technique can eliminate downtime for application upgrades, and it lowers risk by providing an environment for pre-upgrade testing and the ability to fall back immediately if something goes wrong. It is an important concept for agile organizations that deploy cloud-native applications and push updates frequently.
While blue-green deployments work well for cloud-native applications, what about the services those applications depend on? Databases and message brokers are typical components of these architectures, and both require a bit more work to manage in a blue-green deployment. This post covers how to handle blue-green deployment when RabbitMQ is being used as the message queue for the application.
Applications that can tolerate an interruption of service may be good candidates for a classic blue-green deployment. In this scenario, after a new RabbitMQ cluster (the green cluster) is set up, new producers and consumers are launched, targeting the green cluster. Producers on the blue RabbitMQ cluster are stopped. Consumers on the blue cluster are left active until all messages have been consumed. In such a sequence, messages may be duplicated as the green producers and consumers are activated, so applications need to be able to handle duplicates. It is also possible that the cutover from the blue cluster to the green cluster is not completely seamless:
Some messages may still be left in queues of the blue cluster and it may not be possible or practical to allow blue consumers to drain the queues. When that is the case, it is common to use the RabbitMQ shovel plugin to move messages from the blue cluster to the green cluster.
During the cutover there may be a short period of time when either producers or consumers in the green cluster are not immediately available. Applications that can tolerate a short outage may be able to handle this without issues.
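As a rough illustration of the shovel option mentioned above, the sketch below builds the request that would declare a dynamic shovel through the RabbitMQ management HTTP API. It only constructs the request and sends nothing. The function name, the shovel name `blue-to-green`, and the example URIs are hypothetical, and the sketch assumes the shovel and management plugins are enabled and that the broker is recent enough to accept the `src-delete-after` key.

```python
import json
from urllib.parse import quote


def shovel_request(blue_uri, green_uri, queue, vhost="/", name="blue-to-green"):
    """Build (method, path, body) for declaring a dynamic shovel.

    The shovel moves messages from `queue` on the blue cluster to a queue
    of the same name on the green cluster. Nothing is sent over the network.
    """
    # Virtual host and shovel names must be percent-encoded in the path,
    # so the default vhost "/" becomes "%2F".
    path = "/api/parameters/shovel/{}/{}".format(
        quote(vhost, safe=""), quote(name, safe="")
    )
    body = {
        "value": {
            "src-uri": blue_uri,
            "src-queue": queue,
            "dest-uri": green_uri,
            "dest-queue": queue,
            # Assumption: stop after moving the messages that were in the
            # queue when the shovel started, then remove the shovel.
            "src-delete-after": "queue-length",
        }
    }
    return "PUT", path, json.dumps(body, indent=2)


if __name__ == "__main__":
    method, path, body = shovel_request(
        "amqp://blue.example.internal", "amqp://green.example.internal", "orders"
    )
    print(method, path)
    print(body)
```

The printed request could then be sent with any HTTP client, using credentials for a management user on the broker that will run the shovel.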
A second option, sharing a message queue between the blue and green deployments, is workable, but it needs some configuration acrobatics to provide the full benefits. For instance, using the green environment for testing requires changing the exchange and queue that the green producer and consumer use, and that configuration then has to be changed back before the blue-green traffic swap. That is a sub-optimal solution: it introduces variables and new risk points that do not exist in a blue-green deployment of just the application. There may also be times when an application upgrade includes an upgrade to the message queue itself, and those upgrades require a parallel queue to be deployed anyway.
Fortunately, an easier and safer option is available for blue-green deployments. RabbitMQ has a feature known as Federated Queues that can help mitigate these issues. A federated queue links to other queues (called upstream queues). It retrieves messages from upstream queues to meet the demand for messages from local consumers. The upstream queues do not need to be reconfigured, and they do not have to be on the same broker or in the same cluster. In fact, this property of federated queues is an advantage here because it allows the green deployment to contain its own independent RabbitMQ instance.
For applications with high SLAs (e.g. above 99.9% availability), a blue-green deployment gets complicated very quickly. This is particularly true when the volume and velocity of messages are high. The combination of a high SLA with high traffic makes a typical blue-green approach more difficult to execute smoothly. In such scenarios, federated queues are an easier approach to minimize the loss or duplication of messages and to avoid application downtime.
Before we get into the details of making this work, a couple of constraints should be considered. If any of the apps involved are both producers and consumers (typically discouraged), this process will encounter problems with queued messages not being drained before cutover. Also, if the producers or consumers cache messages, cached messages may be lost before they are processed, because they live in the host cache, which disappears when the host is turned off.
Let’s walk through the steps required to use the Federated Queue functionality to enhance the blue-green deployment process of apps deployed with Pivotal Cloud Foundry.
In our example configuration, there is both a producer and a consumer application connected to a RabbitMQ instance. These initial copies of the applications are the blue apps in this deployment. The first step is to deploy a second RabbitMQ instance and then configure a federated queue between the two instances. Without a local consumer attached, this second instance of RabbitMQ will remain idle for the time being. If you are using the On-Demand functionality of the RabbitMQ tile within Pivotal Cloud Foundry, the Federated Queue plugins required are enabled by default.
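To make the federation step more concrete, here is a minimal sketch of the two management HTTP API requests that would typically be applied to the second (green) RabbitMQ instance: one defines the blue instance as an upstream, and one adds a policy that federates the matching queue. The helper name, the upstream name `blue`, and the queue name `orders` are hypothetical, and the code only builds the requests rather than sending them.

```python
import json
import re
from urllib.parse import quote


def federation_requests(blue_uri, queue, vhost="/", upstream="blue"):
    """Return [(method, path, body_dict), ...] to run against the GREEN broker."""
    v = quote(vhost, safe="")  # "/" becomes "%2F"

    # 1. Tell the green broker where the upstream (blue) broker lives.
    upstream_req = (
        "PUT",
        f"/api/parameters/federation-upstream/{v}/{quote(upstream, safe='')}",
        {"value": {"uri": blue_uri}},
    )

    # 2. Apply a policy so the queue with this exact name becomes federated
    #    and pulls from the upstream defined above.
    policy_req = (
        "PUT",
        f"/api/policies/{v}/federate-{quote(queue, safe='')}",
        {
            "pattern": "^" + re.escape(queue) + "$",
            "definition": {"federation-upstream": upstream},
            "apply-to": "queues",
        },
    )
    return [upstream_req, policy_req]


if __name__ == "__main__":
    for method, path, body in federation_requests(
        "amqp://blue.example.internal", "orders"
    ):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Because the policy lives on the green side only, the blue instance needs no configuration change, which matches the point made earlier that upstream queues do not need to be reconfigured.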
The next step is to begin the switch-over from the blue consumer to the new version of the consumer, known as the green consumer. The green consumer can be a new version of the application with new features, but the key difference is that it is configured to use the second RabbitMQ instance. The switch-over causes traffic to begin flowing between the federated queues, and the messages are then consumed by the newly active green consumer. In Pivotal Cloud Foundry, this is a cf push of the new application version with the manifest modified to reference the second RabbitMQ instance. For this deployment, the Autopilot plugin for the cf CLI is used to simplify the process. This description is from the README in the Autopilot GitHub repo:
Autopilot takes a different approach to other zero-downtime plugins. It doesn't perform any complex route re-mappings instead it leans on the manifest feature of the Cloud Foundry CLI. The method also has the advantage of treating a manifest as the source of truth and will converge the state of the system towards that. This makes the plugin ideal for continuous delivery environments.
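Since the green consumer differs from the blue one mainly in which RabbitMQ instance it is bound to, the application can look up its broker connection by service instance name at startup. The sketch below reads the `VCAP_SERVICES` environment variable that Cloud Foundry provides to bound applications. The instance name `rabbit-green` is hypothetical, and the code assumes the binding credentials expose a `uri` field, which should be checked against the actual binding.

```python
import json
import os


def rabbitmq_uri(instance_name, env=None):
    """Return the connection URI of the bound service instance `instance_name`.

    VCAP_SERVICES maps each service label to a list of bound instances.
    The label differs between offerings, so every label is searched.
    """
    env = os.environ if env is None else env
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == instance_name:
                # Assumption: the credentials block carries a "uri" key.
                return instance["credentials"]["uri"]
    raise LookupError(f"no bound service instance named {instance_name!r}")


if __name__ == "__main__":
    # Simulated environment, so the sketch runs outside Cloud Foundry.
    fake_env = {
        "VCAP_SERVICES": json.dumps(
            {
                "example-rabbitmq-label": [
                    {
                        "name": "rabbit-green",
                        "credentials": {"uri": "amqp://green.example.internal"},
                    }
                ]
            }
        )
    }
    print(rabbitmq_uri("rabbit-green", fake_env))
```

With a lookup like this, switching the consumer to the second instance becomes a matter of changing the service named in the manifest, with no broker address stored in the application code.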
Now the same process can be repeated on the producer side, switching over to the green producer. The Autopilot plugin replaces the original application with the new version rather than re-mapping traffic to a new instance. Traffic then flows to the green producer, which, like the green consumer, is connected to the second RabbitMQ instance.
At this point, the system is live and operational on the green versions of the apps and message queue. The first (blue) RabbitMQ instance can then be taken offline, or upgraded for use as the green queue during the next upgrade cycle.
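Before the blue instance is taken offline, it is worth confirming that its queues are actually empty. The sketch below is one way to do that check, assuming the queue statistics have already been fetched from the blue broker's management HTTP API (for example from `/api/queues`) and parsed into dictionaries. The function names are hypothetical. Management statistics are sampled periodically, so the numbers can lag slightly behind the real state.

```python
def is_drained(queue_stats):
    """True only if the queue reports no ready and no unacknowledged messages.

    Missing counters are treated as "not drained", so a queue whose
    statistics have not been collected yet is never reported as empty.
    """
    ready = queue_stats.get("messages_ready")
    unacked = queue_stats.get("messages_unacknowledged")
    if ready is None or unacked is None:
        return False
    return ready == 0 and unacked == 0


def undrained_queues(all_queue_stats):
    """Return the names of queues that still hold messages (or lack stats)."""
    return [q.get("name", "<unnamed>") for q in all_queue_stats if not is_drained(q)]


if __name__ == "__main__":
    sample = [
        {"name": "orders", "messages_ready": 0, "messages_unacknowledged": 0},
        {"name": "invoices", "messages_ready": 3, "messages_unacknowledged": 1},
    ]
    remaining = undrained_queues(sample)
    if remaining:
        print("Do not retire the blue instance yet:", ", ".join(remaining))
    else:
        print("Blue queues are drained.")
```

Running the check more than once, a few seconds apart, gives more confidence than a single reading.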
Blue-green deployments are an important piece of the agile software deployment workflow. It's important to research all the functionality available to you to minimize risk and downtime. This research will lead to discovery of capabilities and technologies that can help mitigate the risk in upgrades.
For a video demonstration of the process described here, you can take a look at the Pivotal YouTube Channel at the Blue-Green Deployment of Applications leveraging RabbitMQ video.
You can also find some great information on RabbitMQ usage and patterns in these two Pivotal Webinars. They demonstrate some of the newest features and give you good data points on how to leverage them.
For those who like to get their hands dirty and play with the technology, here are the download links for the software used for the blog and video demo.