VMware Tanzu Kubernetes Grid (TKG) uses the Kubernetes Cluster API to create Kubernetes clusters on vSphere 6.7U3, AWS, and, in the future, other cloud environments as well. All you need (on vSphere) is the TKG CLI and two OVA templates: one for the Kubernetes control plane/worker nodes and one for HAProxy, which is used to load balance the control plane nodes. If you want a fast start with TKG, use William Lam's TKG Demo Appliance, which is available as a VMware Fling, and read Kendrick Coleman's introduction to TKG.
In this post, we will only use command line tools to install a multitenant TKG 1.0 setup. We will also present a few tips and tricks, and discuss best practices for the installation on vSphere 6.7U3. In our next post, we will upgrade this setup using TKG 1.1, which was released a few days ago.
Automating the download of TKG binaries
The easiest way to automate this is to use the vmw-cli Docker image rather than the NPM-based installation method. Let's pull the image first:
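A minimal sketch of the pull; `apnex/vmw-cli` is the image name used here — check Docker Hub for the current image and tag:

```shell
# Pull the vmw-cli container image (image name/tag may change over time)
docker pull apnex/vmw-cli
```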
Next, add the following two environment variables to your ~/.bash_profile and reload it (source ~/.bash_profile). If you are using another shell (e.g., tcsh) on your Linux system, adapt this step accordingly. I am using an Ubuntu 18.04.4 LTS system as jump host, but any other Linux (or macOS) system works equally well.
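A sketch of the two variables, assuming `VMWUSER`/`VMWPASS` are the names vmw-cli reads for my.vmware.com authentication; the values are placeholders:

```shell
# Add to ~/.bash_profile -- replace the placeholder credentials with your own
export VMWUSER='your-email@example.com'
export VMWPASS='your-password'
```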
Don't forget to use your own my.vmware.com username and password! The following command downloads the index file mainIndex.json into our working directory ~/tkg, which we create first.
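A sketch of the index download, assuming the typical vmw-cli container invocation (credentials passed through as environment variables, working directory mounted at `/files`):

```shell
# Create the working directory and fetch the my.vmware.com product index;
# vmw-cli writes its output (including mainIndex.json) into the mounted directory
mkdir -p ~/tkg && cd ~/tkg
docker run -t --rm -e VMWUSER -e VMWPASS -v ${PWD}:/files apnex/vmw-cli ls
```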
And finally, let’s download the TKG CLI and two OVAs.
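A hedged sketch of the downloads. The product slug and file names below are those of the TKG 1.0 release as I recall them and may have changed — list the product index first to confirm the exact names before copying them:

```shell
# List the TKG product files (slug is an assumption -- verify against the index)
docker run -t --rm -e VMWUSER -e VMWPASS -v ${PWD}:/files apnex/vmw-cli ls vmware_tanzu_kubernetes_grid

# Download the CLI and the two OVAs (file names from the TKG 1.0 release)
docker run -t --rm -e VMWUSER -e VMWPASS -v ${PWD}:/files apnex/vmw-cli cp tkg-linux-amd64-v1.0.0_vmware.1.gz
docker run -t --rm -e VMWUSER -e VMWPASS -v ${PWD}:/files apnex/vmw-cli cp photon-3-v1.17.3_vmware.2.ova
docker run -t --rm -e VMWUSER -e VMWPASS -v ${PWD}:/files apnex/vmw-cli cp photon-3-capv-haproxy-v0.6.3_vmware.1.ova

# Unpack the CLI and put it on the PATH
gunzip tkg-linux-amd64-v1.0.0_vmware.1.gz
chmod +x tkg-linux-amd64-v1.0.0_vmware.1
sudo mv tkg-linux-amd64-v1.0.0_vmware.1 /usr/local/bin/tkg
```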
Uploading OVA templates
To upload our OVAs, we will use govc, a very powerful command-line tool for working with the vSphere API. Let's install it and configure it with a few environment variables.
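A sketch of the installation and configuration. The release version is pinned for illustration, and the connection values below are placeholders matching a typical Hands-on-Labs-style environment — adjust them to yours:

```shell
# Install govc (release asset naming as used by govmomi releases of that era)
curl -L https://github.com/vmware/govmomi/releases/download/v0.22.1/govc_linux_amd64.gz | gunzip > govc
chmod +x govc && sudo mv govc /usr/local/bin/

# Connection settings -- all values are lab placeholders
export GOVC_URL='vcsa-01a.corp.local'
export GOVC_USERNAME='administrator@corp.local'
export GOVC_PASSWORD='VMware1!'
export GOVC_INSECURE=1
export GOVC_DATACENTER='RegionA01'
```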
If you are a user of the VMware Hands-on Labs, this should look familiar to you, since I am using a very similar environment for this post. Let's create a new VM and template folder first.
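A one-line sketch, assuming the datacenter is called `RegionA01` and the folder `tkg` (both names are assumptions from this lab setup):

```shell
# Single folder for the OVA templates and all cluster VMs in this demo
govc folder.create /RegionA01/vm/tkg
```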
This is the place for our OVAs as well as all the control plane/worker VMs. With many clusters belonging to different tenants, a folder hierarchy is preferable (you will see later how to use one with the TKG CLI), but for this demo we will work with a single folder. And as you can see, we will create different resource pools for all our clusters in a multitenant setup.
The TKG management cluster will be deployed in a resource pool on the vSphere management cluster. Similarly, the tenant clusters will be deployed to separate cluster pools on the vSphere compute cluster. The cluster pools themselves are sub-pools of the corresponding tenant pool. As the image above makes clear, the infrastructure admin has full control over the resources consumed by each cluster and each tenant.
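The pool hierarchy described above can be sketched as follows; the cluster, tenant, and pool names are assumptions for this lab:

```shell
# Management cluster pool on the vSphere management cluster
govc pool.create /RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt

# Per-tenant parent pools and per-cluster sub-pools on the compute cluster
govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant1
govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant1/tenant1-cluster1
govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant2
govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant2/tenant2-cluster1
```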
After creating the basic setup, it's time to upload our OVAs. While govc can create the JSON-based configuration file automatically, we'll start by changing a few of its parameters with the jq CLI tool, which we install first.
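On the Ubuntu jump host, jq comes straight from the distribution repositories:

```shell
# Install the jq JSON processor
sudo apt-get update && sudo apt-get install -y jq
```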
Here is the shell script to use for the OVA upload:
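A sketch of the upload flow. The OVA file names, the target portgroup (`K8s-MGMT`), and the exact shape of the import spec emitted by `govc import.spec` are assumptions — inspect the generated JSON for your OVAs before patching it:

```shell
#!/bin/bash
set -e
for OVA in photon-3-v1.17.3_vmware.2 photon-3-capv-haproxy-v0.6.3_vmware.1; do
  # Generate the import spec and patch the target network with jq
  govc import.spec ${OVA}.ova | jq '.NetworkMapping[0].Network = "K8s-MGMT"' > ${OVA}.json

  # Deploy to the shared datastore and our VM/template folder
  govc import.ova -ds=REGIONA-01-COMP01 -folder=/RegionA01/vm/tkg \
    -options=${OVA}.json ${OVA}.ova

  # Snapshot first so linked clones are possible, then mark as template
  govc snapshot.create -vm ${OVA} root
  govc vm.markastemplate ${OVA}
done
```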
Keep in mind that we are using a shared iSCSI datastore across both vSphere clusters (REGIONA-01-COMP01), but two different vSphere Distributed Switches, one for each cluster. The TKG CLI can remap the network when deploying the template, but not the storage. If you use separate datastores for the management and compute clusters, be sure to deploy the OVAs to both datastores!
As you can see in the shell script, we also create a snapshot before marking the OVA as a template. That way we can use linked clones for the deployment of the TKG clusters, which is highly recommended in nested environments (like the one we use here). VMware only supports full clones for production deployments as it is not possible to increase disk sizes with linked clones.
Installing the TKG management cluster
So far, we have downloaded, installed, and configured all needed tools, created the folder and resource pool structure for our multitenant setup, and uploaded the OVAs. Now it’s time to create the TKG management cluster. You can of course do this using the TKG web UI by executing the command
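The UI-based installer is started like this:

```shell
# Launches the TKG installer web UI in your browser
tkg init --ui
```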
But since we want to automate everything, we have to create a configuration file first. The UI would be an easy way to generate one, but it is not yet possible to save the configuration before starting the deployment (which uses fullClone instead of linkedClone by default). As a workaround for this chicken-and-egg problem, you can kick off the deployment and immediately stop it, or you can create the configuration by hand.
Let's assume you created the following configuration file and saved it outside the ~/.tkg folder.
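A hypothetical example, written via a heredoc. The variable names follow the TKG 1.0 vSphere provider; the file path and all values are placeholders for this lab:

```shell
# Hypothetical vSphere settings file, kept outside ~/.tkg for now
mkdir -p ~/tkg
cat > ~/tkg/vsphere.yaml <<'EOF'
VSPHERE_SERVER: vcsa-01a.corp.local
VSPHERE_USERNAME: administrator@corp.local
VSPHERE_PASSWORD: VMware1!
VSPHERE_DATACENTER: /RegionA01
VSPHERE_DATASTORE: /RegionA01/datastore/REGIONA-01-COMP01
VSPHERE_NETWORK: K8s-MGMT
VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt
VSPHERE_FOLDER: /RegionA01/vm/tkg
VSPHERE_SSH_AUTHORIZED_KEY: "ssh-rsa AAAA... user@jumphost"
EOF
```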
Then you can create the ~/.tkg folder and its content by executing
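Any tkg command populates ~/.tkg on first run; a read-only one such as the following (as used with the TKG 1.0 CLI) does the job without side effects:

```shell
# First invocation creates ~/.tkg with config.yaml and the providers directory
tkg get management-cluster
```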
Now add this vSphere-specific configuration to the TKG configuration file:
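A sketch of the append, assuming you saved your vSphere settings in a file such as ~/tkg/vsphere.yaml (a hypothetical path for this lab):

```shell
# Merge the vSphere-specific variables into the generated TKG configuration
cat ~/tkg/vsphere.yaml >> ~/.tkg/config.yaml
```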
In the ~/.tkg folder you will find the providers directory with the different development/production cluster templates for vSphere and AWS. The templates use variables set in the TKG configuration file, but these can be overridden by environment variables. And this is exactly the approach we will use to put our clusters in different resource pools.
Notably, we only override the resource pool variable (VSPHERE_RESOURCE_POOL) and use one network (VSPHERE_NETWORK) for the management cluster and a different one (e.g., VLAN-based distributed port groups or NSX-T segments) for each workload cluster. You could also use separate VM sizes for the clusters, different datastores (e.g., gold, silver, bronze), or just one network per tenant. By overriding predefined variables you are not breaking the TKG support model, just adapting the installation to your own needs.
If you need further customizations, you could add your own variables, pre/post scripts (e.g., using a separate DNS server per tenant), or Kubernetes YAML snippets (which are automatically deployed) to your own cluster templates. But don't do that right now: it is very error-prone (especially when you upgrade clusters) and not supported. We will come back to this topic when upstream Cluster API and TKG improvements simplify these activities. TKG 1.1 already offers some more variables you can play with.
Before we create the management cluster, let's change the cloning mechanism from fullClone to linkedClone in the templates.
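A sketch of an in-place substitution over the provider templates; `fullClone`/`linkedClone` are the clone-mode values mentioned above, and the directory variable is parameterized so you can point it elsewhere for testing:

```shell
# Switch every vSphere cluster template from full clones to linked clones
TKG_PROVIDERS=${TKG_PROVIDERS:-$HOME/.tkg/providers}
if [ -d "$TKG_PROVIDERS" ]; then
  grep -rl fullClone "$TKG_PROVIDERS" | xargs -r sed -i 's/fullClone/linkedClone/g'
fi
```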
Showtime! Let’s finally create our management cluster:
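A sketch of the creation, with the resource pool and network overridden via environment variables as discussed above; flag names follow the TKG 1.0 CLI, and all paths/names are lab assumptions:

```shell
# Deploy the management cluster into its own resource pool
export VSPHERE_RESOURCE_POOL=/RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt
export VSPHERE_NETWORK=K8s-MGMT
tkg init --infrastructure=vsphere --name=tkg-mgmt --plan=dev
```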
As you can see, all the pre-work we have done can easily be put into the automation or CI/CD tools of your choice. Why does a development cluster with only one control plane node sit behind a load balancer VM? Because a seamless upgrade (e.g., from TKG 1.0 to TKG 1.1) adds a new control plane node, reconfigures the load balancer, and then deletes the old control plane node. And for the load balancer VM itself we rely on vSphere HA, in case you were wondering. ;-)
Installing the TKG workload cluster
Creating two workload clusters for two different tenants is now really easy.
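A sketch using per-tenant overrides; pool paths, portgroup names, and cluster names are assumptions matching the hierarchy created earlier:

```shell
# Tenant 1: own sub-pool and own network
export VSPHERE_RESOURCE_POOL=/RegionA01/host/RegionA01-COMP01/Resources/tenant1/tenant1-cluster1
export VSPHERE_NETWORK=K8s-Tenant1
tkg create cluster tenant1-cluster1 --plan=dev

# Tenant 2: same pattern, different pool and network
export VSPHERE_RESOURCE_POOL=/RegionA01/host/RegionA01-COMP01/Resources/tenant2/tenant2-cluster1
export VSPHERE_NETWORK=K8s-Tenant2
tkg create cluster tenant2-cluster1 --plan=dev
```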
One of the TKG prerequisites is a DHCP server on your management/workload cluster networks. Make sure your DHCP options include a default route, DNS, and NTP server, so the clusters can reach the internet (assuming you are not doing an air-gapped installation) and have correct time settings. NTP often gets overlooked, but with wrong time settings the public registries will not deliver the container images needed for the installation. And last but not least, make sure you have Docker installed on your system: TKG initially creates a local Kubernetes kind cluster for the Cluster API-based installation of the TKG management cluster, transfers the Cluster API configuration to the new management cluster, and deletes the kind cluster afterwards (since it is no longer needed).
That’s all for today, but there will be more to come! There are some exciting TKG releases ahead of us! ☺
This article may contain hyperlinks to non-VMware websites that are created and maintained by third parties who are solely responsible for the content on such websites.