We’ve been on a journey towards increasing levels of enterprise readiness with our MySQL for Pivotal Platform service, and we’ve reached an exciting milestone. MySQL for Pivotal Platform v2.7 is now generally available, and it brings the general availability of high-availability clusters, as well as the beta release of multisite replication. Both features are hallmarks of an enterprise-ready product for running business-critical workloads in production.
I had the opportunity to discuss these new developments with Judy Wang, one of our Product Managers for MySQL, in the second half of a webinar we did together. The remainder of this post is a synopsis of that discussion. You’ll want to watch the entire webinar, though, because Judy covers some cloud-native patterns and practical strategies in the first half.
Defining Your Availability Requirements
In setting the context for our HA discussion, Judy explained that availability is addressed through multiple configuration options. Each option offers a different profile with respect to your recovery point objectives (RPO) and recovery time objectives (RTO).
Recovery point objective refers to the amount of data you’re willing to lose in the event of a failure. A recovery point objective of zero means you’re unwilling to lose any data, an important objective for many transactional workloads and systems of record.
Recovery time objective refers to how long a system may be unavailable while it is being recovered, i.e., the downtime users experience during recovery. A recovery time objective of zero means you don’t want users to experience any downtime during failure recovery.
Configuration Options for Availability
The key to selecting the best-fit MySQL topology for your workload is to choose the simplest configuration option that meets your availability requirements. Options with higher availability, though enticing, are more complex to manage and incur higher infrastructure costs. So, let’s dig into these configuration options to help you decide which one best fits your recovery point and recovery time objectives.
The single node topology is the simplest: a single MySQL server with one persistent disk mounted onto it. Even a single node configuration offers a simple form of availability and disaster recovery (DR), and it can satisfy the majority of app workloads.
Availability relies on the automation provided by Pivotal Platform and BOSH. If you experience a service interruption, such as a crashed process or a VM failure, BOSH will manage those interruptions for you and recover within minutes in most cases.
Disaster recovery for a single node is made possible by backups, which are built into the configuration (they are not optional) and taken at regular intervals. If the data in your MySQL server is lost, you can recover by restoring it from a backup. Any data written since the most recent backup can be lost.
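To put the single node RPO in concrete terms: when you restore from the most recent backup, everything written since that backup is lost, so the worst case approaches a full backup interval. The Python sketch below computes that window. The `data_loss_window` function, the six-hour schedule, and the timestamps are hypothetical values for illustration, not the product’s actual backup settings.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times: list[datetime], failure_time: datetime) -> timedelta:
    """Time span of writes lost when restoring from the latest backup taken before the failure."""
    completed = [t for t in backup_times if t <= failure_time]
    if not completed:
        raise ValueError("no backup completed before the failure")
    return failure_time - max(completed)


# Hypothetical schedule: a backup every 6 hours starting at midnight.
interval = timedelta(hours=6)
start = datetime(2019, 10, 1)
backups = [start + i * interval for i in range(4)]  # 00:00, 06:00, 12:00, 18:00

failure = datetime(2019, 10, 1, 17, 30)
print(data_loss_window(backups, failure))  # 5:30:00

# Worst case: a failure just before the next backup loses almost a full interval.
print(data_loss_window(backups, datetime(2019, 10, 1, 17, 59)))  # 5:59:00
```

Shortening the backup interval shrinks this window, but it never reaches zero; that is the gap the replicated topologies below are designed to close.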
This offers modest coverage for your recovery time and recovery point objectives.
Leader-Follower Satisfies More Stringent Recovery Point Objectives
The leader-follower topology is better suited to more stringent recovery point objective (RPO) requirements. This topology maintains a second, read-only copy of your data. Data is continuously replicated from the leader to the follower using binary log replication. When the leader-follower instance is deployed across two Availability Zones, it provides a disaster recovery strategy for partial data center outages. Unlike restoring from a backup, this configuration will not take you back in time: you can fail over to the follower node, whose dataset is in sync with the leader’s, give or take some minor network lag.
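Here is a minimal Python sketch of why the follower’s lag matters, assuming replication is asynchronous (as the “give or take some minor network lag” wording suggests). The `Leader`, `Follower`, and `lost_on_failover` names are hypothetical, and a plain list stands in for the binary log; this models the idea, not MySQL’s actual replication protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    binlog: list[str] = field(default_factory=list)  # ordered log of committed changes

    def commit(self, event: str) -> None:
        # Asynchronous: the leader commits and logs the change without waiting for the follower.
        self.binlog.append(event)


@dataclass
class Follower:
    applied: list[str] = field(default_factory=list)  # read-only copy of the leader's data

    def catch_up(self, leader: Leader, max_events: int) -> None:
        # Apply up to max_events log entries the follower has not seen yet,
        # standing in for a follower that trails the leader by some network lag.
        start = len(self.applied)
        self.applied.extend(leader.binlog[start:start + max_events])


def lost_on_failover(leader: Leader, follower: Follower) -> list[str]:
    # Changes the leader committed that had not reached the follower when the leader failed.
    return leader.binlog[len(follower.applied):]


leader, follower = Leader(), Follower()
for i in range(5):
    leader.commit(f"txn-{i}")
follower.catch_up(leader, max_events=4)  # the follower is one change behind
print(lost_on_failover(leader, follower))  # ['txn-4']
```

The exposure is limited to whatever is still in flight, typically far less than the interval between backups, which is why this topology suits tighter RPOs.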
High-Availability Clusters Satisfy More Stringent Recovery Time Objectives
If you have more stringent recovery time objectives, then high-availability (HA) clusters are your best option. This would be the case, for example, if you want five nines (99.999%) of uptime. Data is replicated across the cluster and kept synchronized through a form of synchronous replication. In the event of a failure, failover to another node in the cluster is fully automated, so HA clusters offer zero downtime and zero data loss.
You can also set up a cluster to meet these stringent objectives even when an Availability Zone fails. This simply requires you to place the nodes of your cluster in different Availability Zones.
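For contrast, here is a generic Python model of synchronous replication: a commit is acknowledged only after every live node has the write, so whichever node takes over already holds every committed change. The `Node` and `SyncCluster` classes are hypothetical, and the model deliberately omits details such as conflict handling and how a failed node resynchronizes when it rejoins; it is not the specific protocol MySQL for Pivotal Platform uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: list[str] = field(default_factory=list)


class SyncCluster:
    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes

    def commit(self, event: str) -> None:
        live = [n for n in self.nodes if n.up]
        if not live:
            raise RuntimeError("no live nodes")
        # Synchronous: every live node receives the write before the commit is acknowledged,
        # so there is no window in which only one node holds the change.
        for node in live:
            node.data.append(event)

    def fail_over(self, failed: Node) -> Node:
        # Automated failover: mark the node down and hand traffic to any surviving node.
        failed.up = False
        return next(n for n in self.nodes if n.up)


cluster = SyncCluster([Node("node-a"), Node("node-b"), Node("node-c")])
cluster.commit("txn-1")
cluster.commit("txn-2")
survivor = cluster.fail_over(cluster.nodes[0])
print(survivor.name, survivor.data)  # node-b ['txn-1', 'txn-2']
```

Compared with the leader-follower sketch, nothing is ever “in flight” only on the failed node, which is what makes zero data loss on failover possible.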
Putting it All Together
Here’s what it looks like when you put it all together.
Let your requirements dictate your choice, always choosing the simplest configuration possible.
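The “simplest option that meets your requirements” rule can be sketched in a few lines of Python. The two-level `Need` scale and the coverage assigned to each topology are a hypothetical paraphrase of this post, not product SLAs; the point is only that the options are checked from simplest to most complex and the first adequate one wins.

```python
from enum import IntEnum


class Need(IntEnum):
    MODEST = 0
    STRINGENT = 1


# Ordered simplest first. Coverage levels paraphrase this post; they are not SLAs.
TOPOLOGIES = [
    # (name, RPO coverage, RTO coverage)
    ("single node", Need.MODEST, Need.MODEST),
    ("leader-follower", Need.STRINGENT, Need.MODEST),
    ("HA cluster", Need.STRINGENT, Need.STRINGENT),
]


def simplest_topology(rpo: Need, rto: Need) -> str:
    # Return the first (simplest) topology whose coverage meets both requirements.
    return next(
        name
        for name, rpo_cov, rto_cov in TOPOLOGIES
        if rpo_cov >= rpo and rto_cov >= rto
    )


print(simplest_topology(Need.MODEST, Need.MODEST))     # single node
print(simplest_topology(Need.STRINGENT, Need.MODEST))  # leader-follower
print(simplest_topology(Need.MODEST, Need.STRINGENT))  # HA cluster
```

Ordering the list by complexity is the design choice that matters: it encodes the advice to avoid paying for availability you don’t need.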
Multi-Site Replication on the Horizon
If you need to recover from failures that impact an entire data center, there is good news on the horizon: multi-site replication is now in beta. It offers two configuration options: active-passive and application layer active-active.
Disaster Recovery with Active-Passive Configuration
This configuration takes leader-follower to the next level by allowing the leader and follower to be on different sites. Traffic is sent to a global load balancer, which forwards it to the leader site, i.e., the primary data center. In the event of a site failure, the global load balancer redirects traffic to the follower site, i.e., the secondary data center. As with leader-follower at a single site, data is replicated from the leader to the follower using the leader’s binary logs.
For this to work, you need to bind instances of your applications at both your primary and secondary sites to the MySQL instance at their own site, so that both the application logic and the data are available at both sites.
The downside of this configuration is the cost of the resources at the passive site, which are only useful in the event of a site failure.
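The global load balancer’s decision in an active-passive setup boils down to the following Python sketch. The `route_active_passive` function and the site names are hypothetical, and a real global load balancer would determine site health through its own health checks rather than a dictionary.

```python
def route_active_passive(healthy: dict[str, bool], leader_site: str, follower_site: str) -> str:
    """Pick the single site that receives all traffic from the global load balancer."""
    if healthy[leader_site]:
        return leader_site  # normal operation: everything goes to the primary data center
    if healthy[follower_site]:
        return follower_site  # site failure: redirect everything to the secondary data center
    raise RuntimeError("no healthy site")


sites = {"dc-east": True, "dc-west": True}  # hypothetical site names
print(route_active_passive(sites, "dc-east", "dc-west"))  # dc-east
sites["dc-east"] = False
print(route_active_passive(sites, "dc-east", "dc-west"))  # dc-west
```

Only one site ever receives traffic, which is exactly why the passive site’s resources sit idle until a failure.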
Application Layer Active-Active Configuration
The application layer active-active configuration makes use of the compute and application logic resources on both sites. We’re particular about labeling this as the ‘Application Layer’ active-active configuration because the activity across both data centers is managed at the application layer, not at the database layer.
Like the active-passive configuration, traffic is sent to a global load balancer. Unlike the active-passive configuration, the global load balancer sends traffic to both data centers simultaneously. However, the notion of a primary and secondary data center still exists: the application logic at the secondary site still accesses data at the primary site. The data is replicated from the primary site to the secondary using the same mechanism as the active-passive configuration. If the primary fails, the global load balancer routes all traffic to the secondary, which now accesses its local copy of the data. All of the logic to make this happen is handled for you, and you don’t need to configure anything in your application or do anything extra to handle the flow of traffic.
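To contrast with active-passive, this Python sketch separates the two decisions in the application layer active-active configuration: where the global load balancer sends traffic, and which site’s database the application instances use. The `Site` class, the function names, and the site names are hypothetical; as the paragraph above notes, this logic is handled for you, so the code only makes the routing explicit.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool = True


def traffic_targets(primary: Site, secondary: Site) -> list[str]:
    # The global load balancer spreads requests across every healthy site.
    return [s.name for s in (primary, secondary) if s.healthy]


def database_for(primary: Site, secondary: Site) -> str:
    # While the primary is up, app instances at BOTH sites use the primary's database;
    # the secondary's copy is only a replica.
    if primary.healthy:
        return primary.name
    # After a primary-site failure, the secondary serves from its local replicated copy.
    return secondary.name


east, west = Site("dc-east"), Site("dc-west")  # hypothetical site names
print(traffic_targets(east, west), database_for(east, west))  # ['dc-east', 'dc-west'] dc-east
east.healthy = False
print(traffic_targets(east, west), database_for(east, west))  # ['dc-west'] dc-west
```

Both sites do useful work in normal operation, but writes still converge on one database, which avoids conflicting updates across data centers.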
Want to Dive Deeper?
This post is focused more on the ‘what’ than the ‘how’. If you’re interested in how HA and multisite replication work, you really need to listen to my webinar with Judy. The second half is all about your available options and multisite replication. You’ll want to experience the entire webinar, though: the first half has very useful information on modern architectures and design patterns, and on how MySQL for Pivotal Platform supports them.
You can also learn about MySQL clusters from a blog post I published when this feature first went into beta in MySQL v2.5. Our journey towards increasing levels of enterprise readiness has also included transport layer security, which we added in MySQL v2.3; you can read about it in this post, which also introduced the leader-follower configuration.
Judy will join a customer on the main stage at our annual SpringOne Platform conference happening October 7-11 in Austin to talk about the customer’s experience with MySQL for Pivotal Platform! You can still register, and receive a $200 discount by using the following code: S1P200_JMirani. Can’t make it in person? Recordings will be posted in the following weeks.
These are high-value additions to the product, but there is more to come on this journey: our aspirations for enterprise readiness are even higher than what we’ve delivered so far. It will be my pleasure to tell you about these improvements as they arrive.
About the Author: Jagdish Mirani