How to Plan for a Crisis with Infrastructure-Agnostic Recovery of Kubernetes Applications

August 9, 2023 Pradeep Kumar Chaturvedi

Corey Dinkens and Carol Pereira contributed to this blog post.

As enterprises deploy modern containerized applications to their Kubernetes clusters, managing data protection centrally becomes necessary for running critical business applications, especially in distributed multi-cloud environments.

Today, we are glad to announce that VMware Tanzu Mission Control now supports Container Storage Interface (CSI) snapshots for backing up Kubernetes volumes, in addition to the File System Backup (FSB) it already provides. This addition improves application resiliency and gives platform operators a choice: CSI snapshots provide crash-consistent, point-in-time backups that can capture more recent data. Tanzu Mission Control continues to expand its enterprise-grade data protection capabilities, and because everything is managed as a service directly through the platform, platform teams get these benefits with minimal overhead.

Challenges of data protection for containers on Kubernetes

To avoid application downtime and data loss, platform engineers need an easy way to back up and restore their Kubernetes clusters and namespaces. But skilled Kubernetes operators for cloud native deployments are very hard to find, so providing the right tools and automating operational tasks is essential to help those professionals deliver value to their organizations.

To plan for application resiliency against failure, Kubernetes operators need flexible data protection solutions with features such as backup and restore, cluster migration, and disaster recovery. But how do they choose the best data protection solution for their organization? As with any important decision, they must first identify their non-negotiable needs, which might include minimal management overhead, recovery speed, or other factors.

Tanzu Mission Control is a hub for multi-cloud, multicluster Kubernetes management and leverages Velero, an open source solution, for the backup and recovery of Kubernetes clusters across environments and clouds. 

With enterprises running hundreds of clusters across public and private clouds, installing, configuring, and running Velero on each cluster is a daunting task; Tanzu Mission Control, however, centrally manages the entire lifecycle of Velero across cluster fleets, drastically reducing the toil involved.

The new CSI snapshot support, added alongside FSB, provides more options for backing up volumes, including one that is crash consistent. A crash-consistent snapshot matters for applications that persist data to disk rather than holding it only in memory (e.g., databases or data processing pipelines). If an application uses a volume type that does not support snapshots, FSB is the method of choice.

Backing up volumes with CSI snapshots gives customers a point-in-time copy of their data to restore in case of data corruption or other failures. CSI snapshot support in Tanzu Mission Control helps platform engineers become more agile when deploying and running stateful applications on Kubernetes.

Ensure stateful application backup and recovery with CSI snapshot option

In a Kubernetes context, a stateful application consists of two pieces: the configuration of the application (Kubernetes resources) and the application data (persistent volumes). Data protection solutions should provide backup and restore options for both Kubernetes cluster resources and persistent volumes.

CSI has become the ubiquitous choice for cloud native applications to provision persistent storage using Kubernetes primitives like StorageClass, PersistentVolumes (PVs), and PersistentVolumeClaims (PVCs). 

A CSI snapshot represents the state of a storage volume in a cluster at a particular point in time. It can be used either to provision a new volume or to bring a volume back to a prior state.
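To make the snapshot-and-restore flow concrete, the minimal Python sketch below builds the two standard Kubernetes objects involved as plain dictionaries: a `VolumeSnapshot` (API `snapshot.storage.k8s.io/v1`) that captures an existing PVC, and a new PVC that uses that snapshot as its `dataSource`. The object names, the `csi-snapclass` VolumeSnapshotClass, and the `csi-sc` StorageClass are hypothetical placeholders. The sketch also assumes that a CSI driver with snapshot support is installed. It shows the underlying Kubernetes primitives, not how Tanzu Mission Control creates snapshots internally.

```python
import json


def volume_snapshot(name: str, pvc_name: str, snapshot_class: str) -> dict:
    """Return a VolumeSnapshot that captures the existing PVC `pvc_name`."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,  # hypothetical class name
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def pvc_from_snapshot(name: str, snapshot_name: str, storage_class: str, size: str) -> dict:
    """Return a new PVC whose initial contents come from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,  # must use the same CSI driver
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},  # at least the snapshot size
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }


if __name__ == "__main__":
    snap = volume_snapshot("db-data-snap", "db-data", "csi-snapclass")
    restored = pvc_from_snapshot("db-data-restored", "db-data-snap", "csi-sc", "10Gi")
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps([snap, restored], indent=2))
```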

FSB backs up the volumes attached to pods by reading from the volumes' file systems. Because it copies data from the live file system, the result is less consistent than a CSI snapshot. Unlike a CSI snapshot, FSB must also copy the volume data to a different storage platform, which takes longer to complete. To capture a consistent backup with FSB, users may pause applications during the backup. However, pausing can cause application downtime. Platform engineers also need to consider data transfer implications such as egress costs, data transfer security, and available bandwidth.
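In upstream Velero (1.10 and later), one way to opt a pod's volumes in to File System Backup is the `backup.velero.io/backup-volumes` pod annotation, which lists volume names separated by commas. The hypothetical helper below adds this annotation to a pod manifest held as a Python dictionary. Tanzu Mission Control exposes the FSB choice through its own backup workflow, so this example only illustrates the Velero mechanism.

```python
import copy

# Velero's opt-in annotation for File System Backup.
FSB_ANNOTATION = "backup.velero.io/backup-volumes"


def annotate_for_fsb(pod: dict, volume_names: list) -> dict:
    """Return a copy of `pod` whose listed volumes are opted in to Velero FSB."""
    annotated = copy.deepcopy(pod)  # leave the caller's manifest untouched
    declared = {v["name"] for v in annotated.get("spec", {}).get("volumes", [])}
    missing = [name for name in volume_names if name not in declared]
    if missing:
        raise ValueError(f"pod does not declare volumes: {missing}")
    annotations = annotated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[FSB_ANNOTATION] = ",".join(volume_names)
    return annotated


if __name__ == "__main__":
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "worker"},
        "spec": {
            "containers": [{"name": "app", "image": "example/app:1.0"}],
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }
    print(annotate_for_fsb(pod, ["scratch"])["metadata"]["annotations"])
```

For workloads managed by a Deployment or StatefulSet, place the annotation on the pod template so that every replica carries it.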

The FSB approach does not work well for backing up large data volumes, whereas CSI snapshots use the storage system itself to quickly create a consistent backup instead of duplicating the data to different storage.
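A rough calculation shows why copying large volumes is slow. The hypothetical helper below estimates how long the first full FSB copy of a volume would take over a given network link. The 70% link efficiency is an assumed figure for illustration, not a measured Velero number. Later FSB runs with Velero's Restic or Kopia uploaders are incremental and typically move less data.

```python
def fsb_copy_hours(data_gib: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours to copy `data_gib` GiB over a `link_gbps` Gbit/s link.

    `efficiency` is an assumed fraction of the line rate actually achieved.
    """
    if data_gib < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, bandwidth, or efficiency")
    bits = data_gib * 2**30 * 8                 # GiB -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 3600        # seconds -> hours


if __name__ == "__main__":
    # A 2 TiB volume over a 1 Gbit/s link at the assumed 70% efficiency.
    print(f"{fsb_copy_hours(2048, 1.0):.1f} hours")
```

For a 2 TiB volume, this works out to roughly seven hours for the initial copy. A CSI snapshot, by contrast, is created inside the storage system without moving the data across the network.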

When platform teams evaluate data protection solutions, they should choose one that offers both volume snapshots (CSI) and backup from the volumes' file systems (FSB). This is especially important if applications use a volume type such as emptyDir, which has no concept of a snapshot.

Because most Kubernetes clusters mix several volume types, using both CSI snapshots and FSB in a single backup is essential for capturing all of the cluster's data in that one backup.
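The per-volume decision can be sketched as a simple rule. The hypothetical planner below applies three cases:

- PVC-backed volumes whose CSI driver supports snapshots are assigned to CSI snapshots.
- hostPath volumes are flagged, because Velero's File System Backup does not support them.
- Everything else, such as emptyDir, falls back to FSB.

The `PodVolume` type and the set of snapshot-capable drivers are assumptions for illustration. You might, for example, derive that set from the drivers that have a VolumeSnapshotClass. The driver name `legacy.csi.example.com` is made up. The FSB group could then populate Velero's `backup.velero.io/backup-volumes` pod annotation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class PodVolume:
    name: str
    kind: str                         # e.g. "pvc", "emptyDir", "hostPath"
    csi_driver: Optional[str] = None  # CSI driver behind a PVC, if any


def plan_volume_backups(volumes: Iterable[PodVolume],
                        snapshot_drivers: Set[str]) -> Dict[str, List[str]]:
    """Group volume names by backup method: CSI snapshot, FSB, or unsupported."""
    plan: Dict[str, List[str]] = {"csi-snapshot": [], "fsb": [], "unsupported": []}
    for vol in volumes:
        if vol.kind == "pvc" and vol.csi_driver in snapshot_drivers:
            plan["csi-snapshot"].append(vol.name)
        elif vol.kind == "hostPath":
            # Velero File System Backup cannot back up hostPath volumes.
            plan["unsupported"].append(vol.name)
        else:
            # emptyDir and PVCs without snapshot support fall back to FSB.
            plan["fsb"].append(vol.name)
    return plan


if __name__ == "__main__":
    volumes = [
        PodVolume("data", "pvc", "ebs.csi.aws.com"),
        PodVolume("cache", "emptyDir"),
        PodVolume("shared", "pvc", "legacy.csi.example.com"),
    ]
    plan = plan_volume_backups(volumes, snapshot_drivers={"ebs.csi.aws.com"})
    print(plan)
    # Candidate value for the backup.velero.io/backup-volumes pod annotation:
    print(",".join(plan["fsb"]))
```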

Create a backup with the flexibility to choose FSB and CSI snapshot for volume backup.

Back up entire clusters or a select namespace with total flexibility 

Most organizations use either cluster-level or namespace-level tenancy for applications in their Kubernetes environment. Best-of-breed data protection solutions scale to both models by letting users back up part of a cluster, or the entire cluster, using namespaces or label selectors.

Since August 2022, Tanzu Mission Control has allowed Kubernetes operators to back up an entire cluster without deciding on the application tenancy model in advance. Operators can then restore either a portion of the cluster (e.g., a single application) or the entire cluster using label selectors, while Velero manages the retention of backup files.
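Tanzu Mission Control manages Velero on your behalf. At the Velero level, this pattern roughly corresponds to a `Backup` that covers all namespaces and a `Restore` that selects only the resources carrying certain labels. The sketch below builds both `velero.io/v1` objects as Python dictionaries and assumes that Velero runs in its default `velero` namespace. The backup name, the `app: billing` label, and the 30-day `ttl` are illustrative choices. Velero deletes a backup after its `ttl` expires, which is how it manages retention.

```python
import json
from typing import Dict

VELERO_NS = "velero"  # assumed: Velero's default install namespace


def full_cluster_backup(name: str, ttl: str = "720h0m0s") -> dict:
    """Velero Backup of every namespace plus cluster-scoped resources."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": VELERO_NS},
        "spec": {
            "includedNamespaces": ["*"],      # all namespaces
            "includeClusterResources": True,  # e.g. CRDs, ClusterRoles
            "snapshotVolumes": True,          # snapshot volumes where supported
            "ttl": ttl,                       # Velero garbage-collects after this
        },
    }


def label_selected_restore(name: str, backup_name: str, labels: Dict[str, str]) -> dict:
    """Velero Restore of only the resources in `backup_name` matching `labels`."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": VELERO_NS},
        "spec": {
            "backupName": backup_name,
            "labelSelector": {"matchLabels": dict(labels)},
        },
    }


if __name__ == "__main__":
    backup = full_cluster_backup("nightly-full")
    restore = label_selected_restore("billing-only", "nightly-full", {"app": "billing"})
    print(json.dumps([backup, restore], indent=2))
```

A label selector matches each object individually. Related objects, such as an application's PVCs, must therefore carry the same label to be restored along with it.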

One of our customers, a technology company in the insurance industry, chooses to back up entire clusters even though their applications are mostly stateless. They prefer comprehensive backups because they have found that these bring their disaster recovery site up more quickly. In case of a disaster, they use Velero through Tanzu Mission Control to restore the clusters as needed.

Ensure application mobility across platforms for best-of-breed cloud services

In today’s cloud-first world, increasing infrastructure choices and improving application portability can offer cost benefits, avoid vendor lock-in, and ensure access to new capabilities, which may bring new value to your applications.

It is not easy to manually move application configurations and data to new clusters running on other infrastructure types, so cross-cluster backup and restore is a valuable capability that VMware introduced in August 2022.

Cross-cluster backup and restore make Kubernetes-based applications infrastructure-agnostic and distribution-agnostic. This supports the compliance requirements of organizations that must recover applications quickly in case of disaster. These functions also support innovation, because they allow teams to move applications between cloud providers and use the capabilities that bring the most value to their organization.

Select a backup from another cluster and specify what resources to restore.
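In upstream Velero, you can restore on another cluster when the target cluster has a backup storage location pointing at the same bucket or container as the source cluster. Velero then syncs the source cluster's backups into the target. A `Restore` can pick the namespaces to bring back and, optionally, rename them with `namespaceMapping`. The minimal sketch below builds such an object. The backup name and the `shop` to `shop-dr` mapping are hypothetical. Tanzu Mission Control handles the equivalent steps through its console, CLI, and API.

```python
from typing import Dict, Iterable, Optional


def cross_cluster_restore(name: str, backup_name: str, namespaces: Iterable[str],
                          rename: Optional[Dict[str, str]] = None) -> dict:
    """Velero Restore of selected namespaces, optionally renamed on the target."""
    namespaces = list(namespaces)
    spec = {
        "backupName": backup_name,
        "includedNamespaces": namespaces,
        "restorePVs": True,  # restore persistent volumes from the backup's snapshots
    }
    if rename:
        unknown = sorted(set(rename) - set(namespaces))
        if unknown:
            raise ValueError(f"cannot rename namespaces not being restored: {unknown}")
        spec["namespaceMapping"] = dict(rename)  # source namespace -> target namespace
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": "velero"},  # assumed default namespace
        "spec": spec,
    }


if __name__ == "__main__":
    print(cross_cluster_restore("shop-dr-restore", "prod-nightly", ["shop"], {"shop": "shop-dr"}))
```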

Reduce network costs with a truly multi-cloud solution

Backup storage location (BSL) is the target location where application backups are stored. In a multicluster environment, Kubernetes application backups should be stored centrally at alternate storage locations so that in case of a complete outage at the primary site, applications can be restored from there.

One critical consideration in choosing a BSL is reducing the network costs (i.e., egress charges) of data movement. Data protection solutions should support multiple location types, both on-premises and cloud-based (e.g., Azure Blob Storage, Amazon Simple Storage Service [Amazon S3]), to complement an organization's multi-cloud strategy.

When choosing a software as a service (SaaS)-based data protection solution, operators must ensure that it uses the high-speed, on-premises networks between clusters and storage instead of backing up over the internet. This lets them achieve optimal performance and cost across multiple clusters. For example, operators running clusters and apps in Amazon Web Services (AWS) should be able to back them up to Amazon S3 storage. Likewise, Azure clusters and apps can use Azure Blob Storage.
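This placement guidance can be expressed as a small ranking rule. The hypothetical helper below first prefers a backup location from the same provider as the cluster, to avoid cross-cloud egress. It then prefers either a different region or the same region, depending on a flag. A different region gives an alternate site that survives a regional outage. The same region is typically faster and cheaper. The `Site` type, names, and regions are illustrative assumptions. Actual transfer pricing depends on your provider.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Site:
    name: str
    provider: str  # e.g. "aws", "azure", "on-prem"
    region: str


def pick_backup_location(cluster: Site, candidates: Sequence[Site],
                         prefer_other_region: bool = True) -> Site:
    """Pick the candidate backup location that best matches the cluster."""
    if not candidates:
        raise ValueError("no backup storage locations configured")

    def rank(loc: Site) -> Tuple[bool, bool]:
        same_provider = loc.provider == cluster.provider  # avoids cross-cloud egress
        same_region = loc.region == cluster.region
        region_ok = (not same_region) if prefer_other_region else same_region
        return (same_provider, region_ok)

    # max() returns the first candidate with the highest rank.
    return max(candidates, key=rank)


if __name__ == "__main__":
    prod = Site("prod-cluster", "aws", "us-east-1")
    targets = [
        Site("azure-blob", "azure", "eastus"),
        Site("s3-east", "aws", "us-east-1"),
        Site("s3-west", "aws", "us-west-2"),
    ]
    print(pick_backup_location(prod, targets).name)
    print(pick_backup_location(prod, targets, prefer_other_region=False).name)
```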

Tanzu Mission Control allows platform engineers to specify a target location that points to a storage location in their cloud provider account, either AWS or Azure, and they can share target locations across clusters and cluster groups to use when performing cluster backups.
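This sketch assumes that a shared target location maps to a Velero `BackupStorageLocation` on each cluster that uses it. It also assumes Velero's AWS and Microsoft Azure object-storage plugins. The helpers build one location for an Amazon S3 bucket and one for an Azure Blob Storage container. Bucket, container, resource group, and storage account names are placeholders, and credentials are omitted. You can optionally set `accessMode` to `ReadOnly` for clusters that should only restore from a location, not write to it. The example illustrates the Velero objects, not how Tanzu Mission Control stores target locations.

```python
import json
from typing import Dict, Optional


def _bsl(name: str, provider: str, bucket: str, config: Dict[str, str],
         prefix: Optional[str] = None, read_only: bool = False) -> dict:
    """Build a velero.io/v1 BackupStorageLocation as a plain dictionary."""
    object_storage = {"bucket": bucket}
    if prefix:
        object_storage["prefix"] = prefix  # optional folder inside the bucket
    return {
        "apiVersion": "velero.io/v1",
        "kind": "BackupStorageLocation",
        "metadata": {"name": name, "namespace": "velero"},  # assumed default namespace
        "spec": {
            "provider": provider,
            "objectStorage": object_storage,
            "config": dict(config),
            "accessMode": "ReadOnly" if read_only else "ReadWrite",
        },
    }


def s3_location(name: str, bucket: str, region: str, **kwargs) -> dict:
    """Location for an Amazon S3 bucket (Velero AWS plugin)."""
    return _bsl(name, "aws", bucket, {"region": region}, **kwargs)


def azure_location(name: str, container: str, resource_group: str,
                   storage_account: str, **kwargs) -> dict:
    """Location for an Azure Blob Storage container (Velero Azure plugin)."""
    config = {"resourceGroup": resource_group, "storageAccount": storage_account}
    return _bsl(name, "azure", container, config, **kwargs)


if __name__ == "__main__":
    aws = s3_location("aws-backups", "example-velero-bucket", "us-east-1", prefix="prod")
    azure = azure_location("azure-backups", "velero", "rg-backups", "stbackups", read_only=True)
    print(json.dumps([aws, azure], indent=2))
```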

Why Tanzu Mission Control? 

As mentioned previously, Tanzu Mission Control offers platform teams the benefits of Velero, one of the most popular open source projects for Kubernetes data protection, without the toil of configuring it for each cluster.

Velero supports backup and restore of both Kubernetes cluster resources and persistent volumes, offering File System Backup for persistent volumes in addition to the volume snapshot approach.

Tanzu Mission Control allows administrators to provide data protection for their cluster fleet with role-based access to a central self-service console, CLI, and API. Its data protection features also enable backups from cloud clusters to on-premises storage, or vice versa, for improved resiliency against failures.

Tanzu Mission Control is never in possession of customers' application data, as backups go directly from clusters in their environment to storage controlled by them. The capability to restore a backup to the same or alternate namespace or cluster gives users increased flexibility. 

What's next?

Please review the documentation on how to back up the data resources in your cluster and watch the recording of our third webinar, part of a series covering the VMware view on multi-cloud, multicluster Kubernetes management.

Visit the Tanzu Mission Control product page, and check the release notes so you don't miss any new features.

Connect with us on social media (LinkedIn, Twitter, and Facebook), watch our YouTube videos, and follow the Tanzu blog for more news.

About the Author

Pradeep Kumar Chaturvedi is a product manager for Tanzu Mission Control focused on developing and expanding data protection capabilities for VMware Tanzu’s enterprise customers. Pradeep has 18+ years of experience delivering enterprise-scale IT management solutions that simplify the complexity of managing multi-cloud environments.
