The Kubernetes release lifecycle is rapid. It feels like yesterday that we deployed our first persistent volumes. The technology that made that possible was the VMware vSphere Cloud Provider (VCP), and many cloud providers used the same in-tree style of integration early on. Over time, the goal of the Kubernetes project became removing vendor code from the core and moving to an out-of-tree driver model. VMware was part of the Container Storage Interface (CSI) design and began developing the vSphere CSI driver long before the interface was stable. The time has come to update Kubernetes versions and begin migrating our production workloads from the VCP to the CSI driver.
As an admin of a VMware Tanzu Kubernetes Grid Integrated Edition deployment, you may know that the vSphere Cloud Provider driver has been deprecated in favor of the vSphere CSI driver, and there are many benefits to using the newer driver. For example, the vSphere CSI driver follows the current preferred out-of-tree plug-in design, which decouples driver patching from the Kubernetes version, and it gives the vSphere admin better visibility into container volumes (Cloud Native Storage). However, many environments still use the vSphere Cloud Provider driver, and the in-tree vSphere volume plug-in will be removed in future Kubernetes releases. So, for those who want to move over to the vSphere CSI driver, a new VCP-to-vSphere CSI migration feature introduced in VMware Tanzu Kubernetes Grid Integrated Edition 1.12 is here to help. Read on to learn more about this feature and how the migration works.
Focus on ease
There are already ways to migrate to the vSphere CSI driver. All you need to do is add a vSphere CSI storage class, scale the application out onto the new storage class so the data replicates, and then terminate the old pod instances. That sounds easy, and maybe in a perfect world or during a lab demonstration it is. In a real production environment, however, it rarely goes that smoothly. Technical issues occur when applications are not as cloud native as we would like. The human element, coordinating with multiple user groups and explaining what to do with their pods and storage classes, can also make the process time-consuming and error-prone.
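As a rough sketch of that manual path, a CSI-backed storage class might look like the following. The class name and storage policy name here are illustrative placeholders, not values from this article; only the provisioner name comes from the vSphere CSI driver itself:

```yaml
# Hypothetical storage class for the manual migration path.
# "vsphere-csi-sc" and "example-policy" are placeholder names.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-sc
provisioner: csi.vsphere.vmware.com   # the vSphere CSI driver
parameters:
  storagepolicyname: "example-policy" # assumes a matching vSphere storage policy exists
```

You would then point new replicas of the application at this class, let the data replicate, and retire the pods still bound to the old VCP-backed class.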
With the VCP-to-vSphere CSI migration feature, users don’t have to change anything they are doing. They can keep using their volumes as if they were still VCP volumes, and Tanzu Kubernetes Grid Integrated Edition 1.12 is smart enough to back them with the vSphere CSI driver behind the scenes. And because no data is moved from one disk to another when a vSphere volume migrates from the VCP, the process is easier still.
Enabling VCP-to-vSphere CSI migration
The integrated vSphere CSI driver needs to be installed, and any manually installed vSphere CSI driver should be removed. Before enabling VCP-to-vSphere CSI migration on a cluster, I recommend ensuring that any deployed application that cannot tolerate a service interruption has its replica count set to 2 or more. Each pod that has vSphere Cloud Provider volumes mounted will need to be restarted so the volumes can be unmounted and then mounted again under the vSphere CSI driver. And since the “tkgi update-cluster” workflow will restart all the pods on each worker, it is a good idea to run through a health check of the environment before making changes.
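One quick way to spot single-replica workloads before the update is to parse the READY column of the deployment listing; this is a sketch, and you may want to repeat it for statefulsets:

```shell
# List deployments whose desired replica count is below 2,
# so they can be scaled up before the cluster update.
# The READY column is "ready/desired"; a[2] is the desired count.
kubectl get deployments --all-namespaces --no-headers \
  | awk '{split($3, a, "/"); if (a[2] < 2) print $1, $2}'
```

Anything this prints is a candidate for scaling up, or at least for scheduling the update during a maintenance window.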
Once you are ready to go, make a “config.json” file with the following in it. You can format this in YAML if you prefer. If you do, make sure to change the file extension to .yaml, since the Tanzu Kubernetes Grid Integrated Edition CLI will look for it:
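The original post shows the file contents as a screenshot, so they are not reproduced here. As a purely illustrative sketch, the file would set a single migration flag; the key name enable_csi_migration below is an assumption on my part, so take the authoritative spelling from the TKGI 1.12 documentation:

```json
{
  "enable_csi_migration": "true"
}
```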
Once you are ready to start, run the following with your config file and the name of the cluster you want to enable VCP-to-vSphere CSI migration on: “tkgi update-cluster my-cluster --config config.json”
Contents of the config.json file and output after enabling VCP-to-vSphere CSI migration
Deeper into the weeds
So, what happens when the migration is enabled? When “tkgi update-cluster” runs, the master nodes are rebuilt by BOSH Director. When this happens, the scripts generate-signed-webhook-certs.sh and create-validation-webhook.sh, along with the manifest namespace.yaml, are added to the /var/vcap/packages/csi/bin directory. Additionally, the manifest
/var/vcap/jobs/csi-controller/config/feature-switch.yml is updated, setting “csi-migration” to “true” (the default is “false”). After this is done and the master nodes bootstrap, the webhooks are set up and VCP-to-vSphere CSI migration is enabled. At that point, all the current persistent volumes and persistent volume claims will be checked and identified by their annotations.
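After the update, the feature switch file described above ends up with the migration flag flipped on. As a sketch of the relevant excerpt (the real file may carry other feature flags that are omitted here):

```yaml
# Excerpt (sketch) of /var/vcap/jobs/csi-controller/config/feature-switch.yml
# after "tkgi update-cluster" enables the migration.
csi-migration: "true"   # default is "false"
```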
Each worker is then updated by draining it, which evicts its pods onto the other workers. When this happens, the vSphere Cloud Provider unmounts and detaches the vSphere volumes. When a pod starts on another worker, the vSphere CSI driver, rather than the vSphere Cloud Provider, attaches and mounts the volume to the worker and the pod. Once a volume has been migrated, the annotation “pv.kubernetes.io/migrated-to: csi.vsphere.vmware.com” is added. This is useful to keep in mind, since old volumes that were created by the vSphere Cloud Provider currently won’t get all the capabilities that a new volume created with the vSphere CSI driver would.
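To see which persistent volumes have already picked up that annotation, you can filter on it; this sketch assumes jq is available on the machine running kubectl:

```shell
# List PVs whose annotations carry the CSI migration marker.
kubectl get pv -o json \
  | jq -r '.items[]
      | select(.metadata.annotations["pv.kubernetes.io/migrated-to"] == "csi.vsphere.vmware.com")
      | .metadata.name'
```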
If you want to verify that CSI migration is enabled on a cluster, you can look at the configmap. For example: “kubectl describe configmaps internal-feature-states.csi.vsphere.vmware.com -n vmware-system-csi”
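If you only want the migration flag rather than the full describe output, a jsonpath query narrows it down; this assumes the flag is stored under a csi-migration data key, matching the feature name used earlier:

```shell
# Print just the csi-migration entry from the feature-states configmap.
kubectl get configmap internal-feature-states.csi.vsphere.vmware.com \
  -n vmware-system-csi -o jsonpath='{.data.csi-migration}'
```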
Output when describing the internal-feature-states.csi.vsphere.vmware.com configmap
Before making the leap, be sure to review the documentation for the full list of things to consider before turning on migration. If you want to learn about more of the new features available in Tanzu Kubernetes Grid Integrated Edition 1.12 and how to update, check out the release notes.