Introducing VMware Tanzu RabbitMQ 4.0 - Built for Resilience and Performance

September 27, 2024 Howard Twine

It's been quite some time since the last major RabbitMQ release: version 3.0 launched back in November 2012. While that may seem like a long wait for the next big update, much has happened in the interim. Alongside several acquisitions, the VMware Tanzu® RabbitMQ® team has introduced significant features and improvements in point releases, and RabbitMQ is today a pivotal component of the VMware Tanzu® Data Solutions portfolio. Broadcom's continued commitment to the open-source community is also key to this: the core engineering team behind community RabbitMQ are all Broadcom staff members.

Not surprisingly, the most significant feature of RabbitMQ 4.0 is both a breaking change and largely invisible to users, yet it's crucial for the future resiliency of RabbitMQ. Like many products of this type, RabbitMQ has a metadata store at its heart. It holds the definitions of the messaging topology: users, vhosts, queues, exchanges, bindings, and runtime parameters, so it is vital to the operation of the message broker. The old metadata store, Mnesia, has been in use for as long as RabbitMQ has existed. As you would expect, RabbitMQ's functionality has changed many times over the past 17 years, and many features have grown up around it. The new metadata store in RabbitMQ 4.0, Khepri, uses the same proven Raft-based consensus algorithm that has underpinned data-safe Quorum Queues for several years. This makes it very resilient to network partitions, which in turn makes Tanzu RabbitMQ more robust.

Other significant benefits are that Khepri is more efficient at maintenance tasks such as deleting queues or exchanges, leading to an overall performance improvement in the core of Tanzu RabbitMQ. These gains also pave the way for future enhancements to Tanzu RabbitMQ's clustering capability. Khepri is enabled via an optional feature flag (khepri_db), so no one is forced to adopt the new technology before they are ready. No application code changes are required when enabling it, and RabbitMQ seamlessly handles the migration from Mnesia to Khepri.

On the subject of performance, this release sees the inclusion of AMQP 1.0 as a core protocol at the heart of RabbitMQ. This does not mean the older AMQP 0.9.1 protocol is no longer supported; it means there is now a better, more standardized protocol to choose from. Support for this newer, OASIS-published standard further strengthens RabbitMQ's multi-protocol approach and increases its versatility. In line with other 'nativizations', AMQP 1.0 support brings the same kind of performance improvements we saw with the introduction of 'native' MQTT in 3.12 (MQTTv3) and 3.13 (MQTTv5). In fact, AMQP 1.0 throughput has more than doubled between 3.13 and 4.0 for Classic Queues, Quorum Queues, and Streams. As with the native MQTT work, this native approach to protocols uses less memory per connection, which means RabbitMQ clusters can now handle more than twice as many AMQP 1.0 connections as previous releases. For more information on these performance improvements, please see this blog post.

Previously, users could not take advantage of topology management when using AMQP 1.0: queues, bindings, and exchanges all had to be predefined. This rigid setup limited one of RabbitMQ's key strengths, its flexible routing. With 4.0, client applications themselves can declare, delete, purge, bind, and unbind both queues and exchanges when using AMQP 1.0. It is even possible to query a queue, which returns helpful information such as the queue leader and its replicas in the case of quorum queues. The new client libraries are a massive help to developers in getting the most out of the protocol without much effort; capabilities like publisher confirms, for example, become very easy. As of today, both Java and .NET clients are available, with more to follow soon.
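To make the new management surface concrete, here is a minimal, broker-free sketch in Python. It is an in-memory stand-in, not a real RabbitMQ client API, and every name in it is illustrative; it simply shows the shape of the operations (declare, bind, unbind, purge, delete) that AMQP 1.0 clients can now perform against a live broker.

```python
# Illustrative in-memory stand-in (NOT a real RabbitMQ client API) for the
# topology operations that AMQP 1.0 client applications can now perform
# in RabbitMQ 4.0: declare/delete queues and exchanges, bind/unbind, purge.

class TopologySketch:
    def __init__(self) -> None:
        self.queues: dict[str, list] = {}                 # queue name -> messages
        self.exchanges: set[str] = set()
        self.bindings: set[tuple[str, str, str]] = set()  # (exchange, key, queue)

    def declare_queue(self, name: str) -> None:
        self.queues.setdefault(name, [])

    def declare_exchange(self, name: str) -> None:
        self.exchanges.add(name)

    def bind(self, exchange: str, routing_key: str, queue: str) -> None:
        self.bindings.add((exchange, routing_key, queue))

    def unbind(self, exchange: str, routing_key: str, queue: str) -> None:
        self.bindings.discard((exchange, routing_key, queue))

    def purge(self, queue: str) -> int:
        # Remove all messages from a queue, returning how many were dropped.
        count = len(self.queues[queue])
        self.queues[queue].clear()
        return count

    def delete_queue(self, name: str) -> None:
        # Deleting a queue also removes any bindings that point at it.
        self.queues.pop(name, None)
        self.bindings = {b for b in self.bindings if b[2] != name}
```

In a real application these calls would be made through the Java or .NET AMQP 1.0 client against a running broker; the sketch only mirrors their semantics.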

It is not often that the removal of a feature gets as much attention as the removal of Classic Mirrored Queues in RabbitMQ. Despite these old mirrored queues having been deprecated for more than three years, many are disappointed to see them finally go. The good news is that queues that were mirrored are not themselves removed in 4.0; only the mirroring is. It is also worth pointing out that mirroring was never required in order to publish to or consume from specific nodes: even after mirroring is removed, applications can publish to or consume from a queue via any node in the same RabbitMQ cluster, so they can keep using the same classic queues as before. Classic Queues remain supported without any breaking changes for client libraries. For continued data safety, the replicated queue types are Quorum Queues and Streams, both of which are very mature technologies. There is more information on how to migrate from Classic Mirrored Queues to Quorum Queues in this blog post.

The RabbitMQ engineering team continues to develop Quorum Queues so that they perform better and more safely than before. Previously, users could assign up to 255 priority levels with Classic Mirrored Queues, which is a recipe for priority misery and confusion. The JMS over AMQP 1.0 specification, by contrast, defines just two priorities: normal and high. This is what has been implemented in Quorum Queues for 4.0: any priority value between 0 and 4 (inclusive) is treated as normal priority, values above 4 are treated as high priority, and if the publisher does not specify a priority, a value of 4 (normal) is assumed. What if users need more control? It might be time to reassess your message priority definitions; priorities are often inherited from legacy systems and rarely reviewed. If that does not resolve it, one solution is to use multiple queues, which is now simpler thanks to RabbitMQ 4.0's faster Quorum Queue boot times and AMQP 1.0 programmatic declaration. Additionally, Quorum Queues now have a default delivery limit of 20 and support consumer priorities with Single Active Consumer.

A new exchange type has also been added: the Random Exchange. Designed mainly for request-reply use cases, it always delivers a message to a local queue (to reduce publisher latency). If multiple local queues are bound to the same exchange, one of them is picked at random to receive the message.

For Kubernetes users there is a new Helm chart that deploys both the open-source Kubernetes operators and the commercial Tanzu Cluster and Topology Operators. Additionally, there is now regular-expression support in the Kubernetes Custom Resource Definitions, which allows users to match a range of vhosts instead of declaring each one explicitly.

The resiliency features of Warm Standby Replication (WSR) also get some enhancements in this release. In 3.13 we added a config file to make WSR easier to set up and configure; in 4.0 this has been enhanced with support for lists and wildcards for vhosts and more. Secrets support is now available via the same config file, currently integrating with HashiCorp Vault, to further secure the replication link between upstream and downstream clusters. Users can also tag specific shovel and federation definitions to be synchronized as part of the Warm Standby Replication process. Finally, the Kubernetes Standby Replication Operator is replaced by this config file, making WSR easier to configure regardless of the environment.

In summary, RabbitMQ 4.0 introduces several key enhancements, particularly for interoperability and resiliency. The most visible to users is the adoption of AMQP 1.0 as a core protocol, with significant throughput and connection-capacity improvements. This release and its complementary client libraries pave the way for greater things, helping RabbitMQ remain the most widely used message broker in the world for many years to come.
