A Closer Look at VMware Tanzu Kubernetes Grid Multitenant Setup

May 28, 2020 Tom Schwaller

VMware Tanzu Kubernetes Grid (TKG) uses the Kubernetes Cluster API to create Kubernetes clusters on vSphere 6.7U3 and AWS, and, in the future, on other cloud environments as well. All you need (on vSphere) is the TKG CLI and two OVA templates: one for the Kubernetes control plane/worker nodes and one for HAProxy, which is used to load balance the control plane nodes. If you want a fast start with TKG, use William Lam’s TKG Demo Appliance, which is available as a VMware Fling, and read Kendrick Coleman’s introduction to TKG.

In this post, we will use only command-line tools to install a multitenant TKG 1.0 setup. We will also share a few tips and tricks, and discuss best practices for the installation on vSphere 6.7U3. In our next post, we will upgrade this setup to TKG 1.1, which was released a few days ago.

Automating the download of TKG binaries

First, download the TKG binaries and OVAs available at my.vmware.com.

The easiest way to automate this is with the vmw-cli tool. Since we have to use Docker for the TKG installation anyway, we use the vmw-cli Docker image instead of the npm-based installation method. Let’s pull the image first:

$ docker pull apnex/vmw-cli

Next, add the following two environment variables to your ~/.bash_profile and reload it (source ~/.bash_profile). If you are using another shell (e.g., zsh, tcsh, etc.) on your Linux system, adapt this step accordingly. I am using an Ubuntu 18.04.4 LTS system as a jump host, but any other Linux (or macOS) system works equally well.

$ cat >> ~/.bash_profile << EOF
export VMWUSER='<username>'
export VMWPASS='<password>'
EOF
 
$ source ~/.bash_profile

Don’t forget to use your own my.vmware.com username and password! The following command downloads the index files fileIndex.json and mainIndex.json into our working directory ~/tkg, which we create first.

$ mkdir ~/tkg
$ cd ~/tkg
$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli index vmware-tanzu-kubernetes-grid

And finally, let’s download the TKG CLI and two OVAs.

$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get tkg-linux-amd64-v1.0.0_vmware.1.gz
$ gunzip tkg-linux-amd64-v1.0.0_vmware.1.gz
$ chmod +x tkg-linux-amd64-v1.0.0_vmware.1
$ sudo mv tkg-linux-amd64-v1.0.0_vmware.1 /usr/local/bin/tkg
$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get photon-3-v1.17.3_vmware.2.ova
$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get photon-3-capv-haproxy-v0.6.3_vmware.1.ova
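
A quick sanity check that the TKG CLI ended up in your PATH (the exact output depends on the build you downloaded):

$ tkg version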

Uploading OVA templates 

To upload our OVAs, we will use govc, a very powerful command-line tool for working with the vSphere API. Let’s install it first; point URL at the govc release binary for your platform (the version below is just an example, pick a current one):

$ URL=https://github.com/vmware/govmomi/releases/download/v0.22.1/govc_linux_amd64.gz
$ curl -L $URL | gunzip | sudo tee /usr/local/bin/govc > /dev/null
$ sudo chmod a+x /usr/local/bin/govc

Then configure it with a few environment variables.
 
$ cat >> ~/.bash_profile << EOF
export GOVC_URL='vcsa-01a.corp.local'
export GOVC_USERNAME='administrator@corp.local'
export GOVC_PASSWORD='VMware1!'
export GOVC_DATACENTER='RegionA01'
export GOVC_NETWORK='VM-RegionA01-vDS-COMP'
export GOVC_DATASTORE='RegionA01-ISCSI02-COMP01'
export GOVC_RESOURCE_POOL='/RegionA01/host/RegionA01-COMP01/Resources/tkg'
export GOVC_INSECURE=1
EOF

$ source ~/.bash_profile
$ govc ls
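
If the connection details are correct, govc ls lists the top-level inventory paths of the datacenter, which should look roughly like this:

/RegionA01/vm
/RegionA01/network
/RegionA01/host
/RegionA01/datastore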

If you are a user of the VMware Hands-on Labs, this should look familiar to you, since I am using a very similar environment for this post. Let’s first create a new VM and template folder, tkg.

$ govc folder.create /RegionA01/vm/tkg

This folder will hold our OVA templates as well as all the control plane and worker VMs. We will create separate resource pools for all of our clusters in this multitenant setup, and with many clusters belonging to different tenants it is also preferable to create a per-tenant folder hierarchy (you will see later how to pass folders to the TKG CLI). For this demo, however, we will work with a single folder.
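
Just for reference, such a hierarchy could be created the same way; the folder names below are only an illustration, and you would then point VSPHERE_FOLDER at the tenant (or cluster) folder when creating that tenant’s clusters:

$ govc folder.create /RegionA01/vm/tkg/tenant-1
$ govc folder.create /RegionA01/vm/tkg/tenant-1/tkg-cluster-1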

The TKG management cluster will be deployed in a resource pool on the vSphere management cluster. Similarly, the tenant clusters will be deployed to separate cluster pools on the vSphere compute cluster, with each cluster pool being a sub-pool of the corresponding tenant pool. With this hierarchy, the infrastructure admin has full control over the resources consumed by each cluster and each tenant.

$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-1
$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-1/tkg-cluster-1
$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-2
$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-2/tkg-cluster-2
$ govc pool.create /RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster
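
To double-check the resulting hierarchy, you can list all resource pools under the two vSphere clusters, for example:

$ govc find /RegionA01/host/RegionA01-COMP01 -type p
$ govc find /RegionA01/host/RegionA01-MGMT01 -type p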

After creating the basic setup, it’s time to upload our OVAs. govc can generate the JSON-based OVA configuration spec automatically, but we want to change a few of its parameters with the jq CLI tool, so let’s install jq first.

$ sudo apt install jq

Here is the shell script to use for the OVA upload:

#!/bin/bash

set -euo pipefail

NETWORK=VM-RegionA01-vDS-COMP
PHOTON=photon-3-v1.17.3_vmware.2
HAPROXY=photon-3-capv-haproxy-v0.6.3_vmware.1
export GOVC_RESOURCE_POOL='/RegionA01/host/RegionA01-COMP01/Resources/tkg'

cd ~/tkg

# HAProxy OVA: generate the import spec, set the VM name and target network,
# import the OVA, snapshot it (for linked clones), mark it as a template, and move it to the tkg folder
govc import.spec ${HAPROXY}.ova | jq ".Name=\"$HAPROXY\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${HAPROXY}.json
govc import.ova -options=${HAPROXY}.json ${HAPROXY}.ova
govc snapshot.create -vm ${HAPROXY} root
govc vm.markastemplate ${HAPROXY}
govc object.mv /RegionA01/vm/${HAPROXY} /RegionA01/vm/tkg

# Same steps for the Photon OS Kubernetes node OVA
govc import.spec ${PHOTON}.ova | jq ".Name=\"$PHOTON\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${PHOTON}.json
govc import.ova -options=${PHOTON}.json ${PHOTON}.ova
govc snapshot.create -vm ${PHOTON} root
govc vm.markastemplate ${PHOTON}
govc object.mv /RegionA01/vm/${PHOTON} /RegionA01/vm/tkg

Keep in mind that we are using a shared iSCSI datastore across both vSphere clusters (RegionA01-MGMT01 and RegionA01-COMP01), but two different vSphere Distributed Switches, one for each cluster. The TKG CLI can remap the network when deploying the template, but not the storage. If you use separate datastores for the management and compute clusters, be sure to deploy the OVAs to both datastores!
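
In that case, a second import pass against the management datastore could look roughly like the sketch below (reusing the variables from the script above). The datastore name is a placeholder for your environment, and the template gets a distinct name so it does not clash with the copy already imported; VSPHERE_TEMPLATE/VSPHERE_HAPROXY_TEMPLATE would then point at whichever copy a given cluster should use.

# Hypothetical datastore and pool for the management cluster; adjust to your environment
$ govc import.ova -options=${PHOTON}.json -name=${PHOTON}-mgmt -ds=RegionA01-ISCSI01-MGMT01 -pool=/RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster ${PHOTON}.ova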

As you can see in the shell script, we also create a snapshot before marking each OVA as a template. That way we can use linked clones for the deployment of the TKG clusters, which is highly recommended in nested environments (like the one we use here). Note that VMware only supports full clones for production deployments, since it is not possible to increase disk sizes with linked clones.

Installing the TKG management cluster 

So far, we have downloaded, installed, and configured all needed tools, created the folder and resource pool structure for our multitenant setup, and uploaded the OVAs. Now it’s time to create the TKG management cluster. You can of course do this using the TKG web UI by executing the command

$ tkg init --ui

But since we want to automate everything, we have to create a configuration file first. Using the UI would be an easy way to create it, but it is not yet possible to save the configuration before starting the deployment (which uses fullClone instead of linkedClone by default). As a workaround for this chicken-and-egg problem, you can kick off the deployment and immediately stop it, or you can create the configuration by hand.

Let’s assume you created the following configuration file and saved it outside the ~/.tkg directory:

$ cat tkg/vsphere.yaml

VSPHERE_SERVER: vcsa-01a.corp.local
VSPHERE_USERNAME: administrator@corp.local
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_DATACENTER: /RegionA01
VSPHERE_NETWORK: "VM-RegionA01-vDS-COMP"
VSPHERE_DATASTORE: /RegionA01/datastore/RegionA01-ISCSI02-COMP01
VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-COMP01/Resources/tkg
VSPHERE_TEMPLATE: /RegionA01/vm/tkg/photon-3-v1.17.3+vmware.2
VSPHERE_HAPROXY_TEMPLATE: /RegionA01/vm/tkg/photon-3-capv-haproxy-v0.6.3+vmware.1
VSPHERE_NUM_CPUS: "2"
VSPHERE_MEM_MIB: "4096"
VSPHERE_DISK_GIB: "40"
VSPHERE_FOLDER: /RegionA01/vm/tkg
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa ... ubuntu@cli-vm
SERVICE_CIDR: 100.64.0.0/13
CLUSTER_CIDR: 100.96.0.0/11

Then you can create the ~/.tkg folder and its content by executing

$ tkg get management-cluster

Now add this vSphere-specific configuration to the TKG configuration file:

$ cat tkg/vsphere.yaml >> ~/.tkg/config.yaml

Inside the ~/.tkg folder you will find the providers directory with the different development/production cluster templates for vSphere and AWS. The templates use variables set in the TKG configuration file, but they can be overwritten by environment variables. And this is exactly the approach we will use to put our clusters in different resource pools.

In our setup, we only overwrite the resource pool variable (VSPHERE_RESOURCE_POOL) and the network variable (VSPHERE_NETWORK), using one network for the management cluster and a different one (e.g., VLAN-based distributed port groups or NSX-T segments) for each workload cluster. You could also use different VM sizes for the clusters, different datastores (e.g., gold, silver, bronze), or just one network per tenant. By overwriting predefined variables you are not breaking the TKG support model, just adapting the installation to your own needs.
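
For example, to give one tenant’s clusters bigger worker nodes on a different datastore, you could export the corresponding variables before creating that tenant’s cluster. The “gold” datastore path below is purely hypothetical:

# Hypothetical per-tenant overrides (the datastore name is made up)
$ export VSPHERE_DATASTORE="/RegionA01/datastore/RegionA01-ISCSI03-GOLD"
$ export VSPHERE_NUM_CPUS="4"
$ export VSPHERE_MEM_MIB="8192"
$ export VSPHERE_DISK_GIB="80"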

If you need further customizations, you could add your own variables, pre/post scripts (e.g., to use a separate DNS server per tenant), or automatically deployed Kubernetes YAML snippets to your own cluster templates. But don’t do that right now, because it is very error-prone (especially when you upgrade clusters) and not supported. We will come back to this topic when upstream Cluster API and TKG improvements simplify these activities. TKG 1.1 already offers some more variables you can play with.

Before we create the management cluster, let’s change the cloning mechanism from fullClone to linkedClone in the templates.

$ sed -i 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.3/cluster-template-dev.yaml
$ sed -i 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.3/cluster-template-prod.yaml

Showtime! Let’s finally create our management cluster:

$ export VSPHERE_NETWORK="VM-RegionA01-vDS-MGMT"
$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster"
$ tkg init -i vsphere -p dev --name tkg-mgmt-cluster

As you can see, all the pre-work we have done can easily be put into the automation or CI/CD tools of your choice. In case you are wondering why a development cluster with only one control plane node still gets a load balancer VM in front of it: during a seamless upgrade (e.g., from TKG 1.0 to TKG 1.1), a new control plane node is added, the load balancer is reconfigured, and the old control plane node is deleted. For the availability of the load balancer VM itself, we rely on vSphere HA. ;-)
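
Once tkg init has finished, you can quickly verify that the management cluster is up. The kubectl context name below assumes the usual <cluster-name>-admin@<cluster-name> naming; adjust it if your context is called differently:

$ tkg get management-cluster
$ kubectl config use-context tkg-mgmt-cluster-admin@tkg-mgmt-cluster
$ kubectl get nodes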

Installing the TKG workload cluster

Creating two workload clusters for two different tenants is now really easy.

$ export VSPHERE_NETWORK="tkg-cluster-1-RegionA01-vDS-COMP"
$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/tenant-1/tkg-cluster-1"
$ tkg create cluster tkg-cluster-1 -p dev -k v1.17.3+vmware.2 -c 1 -w 3
$ export VSPHERE_NETWORK="tkg-cluster-2-RegionA01-vDS-COMP"
$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/tenant-2/tkg-cluster-2"
$ tkg create cluster tkg-cluster-2 -p dev-antrea -k v1.17.3+vmware.2 -c 1 -w 3
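
To start working with the new clusters, fetch their credentials and switch kubectl contexts (again assuming the <cluster-name>-admin@<cluster-name> context naming):

$ tkg get credentials tkg-cluster-1
$ kubectl config use-context tkg-cluster-1-admin@tkg-cluster-1
$ kubectl get nodes -o wide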

Final thoughts

One of the TKG prerequisites is a DHCP server on your management/workload cluster networks. Make sure your DHCP options include a default route, a DNS server, and an NTP server so the clusters can reach the internet (assuming you are not doing an air-gapped installation) and have correct time settings. NTP often gets overlooked, but with an incorrect clock the public registries will not deliver the container images needed for the installation. And last but not least, make sure you have Docker installed on your jump host, since TKG initially creates a local Kubernetes kind cluster for the Cluster API-based installation of the TKG management cluster, transfers the Cluster API configuration to the new management cluster, and deletes the kind cluster afterwards (since it is no longer needed).
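
If you are curious, you can watch this temporary bootstrap cluster while tkg init is running; the name filter below is an assumption based on the tkg-kind prefix TKG usually gives the kind container:

$ docker ps --filter "name=tkg-kind"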

That’s all for today, but there will be more to come! There are some exciting TKG releases ahead of us! ☺

This article may contain hyperlinks to non-VMware websites that are created and maintained by third parties who are solely responsible for the content on such websites.

 

 

About the Author

Tom Schwaller (@tom_schwaller) is a Technical Product Line Manager at VMware (MAPBU) and has been involved with IT for more than two decades. Tom specializes in technologies like Kubernetes, Tanzu Kubernetes Grid, NSX-T, automation, and deep learning. He is a former NSX, VIO, and Cloud Native Systems Engineer at VMware, and a Linux/Python user since 1993! His latest passion is the Lean proof assistant (and functional programming language).
