A Closer Look at VMware Tanzu Kubernetes Grid Multitenant Setup

May 28, 2020 Tom Schwaller

VMware Tanzu Kubernetes Grid (TKG) uses the Kubernetes Cluster API to create Kubernetes clusters on vSphere 6.7U3 and AWS, and, in the future, on other cloud environments as well. All you need (on vSphere) is the TKG CLI and two OVA templates: one for the Kubernetes control plane/worker nodes and one for HAProxy, which is used to load balance the control plane nodes. If you want a fast start with TKG, use William Lam’s TKG Demo Appliance, which is available as a VMware Fling, and read Kendrick Coleman’s introduction to TKG.

In this post, we will use only command-line tools to install a multitenant TKG 1.0 setup. We will also share a few tips and tricks, and discuss best practices for the installation on vSphere 6.7U3. In our next post, we will upgrade this setup to TKG 1.1, which was released a few days ago.

Automating the download of TKG binaries

First, download the TKG binaries and OVAs available at my.vmware.com.

The easiest way to automate this is with the vmw-cli tool. Since we have to use Docker for the TKG installation anyway, we use the vmw-cli Docker image instead of the npm-based installation method. Let’s pull the image first:

$ docker pull apnex/vmw-cli

Next, add the following two environment variables to your ~/.bash_profile and reload it (source ~/.bash_profile). If you are using another shell (e.g., zsh, tcsh, etc.) on your Linux system, adapt this step accordingly. I am using an Ubuntu 18.04.4 LTS system as a jump host, but any other Linux (or macOS) system works equally well.

$ cat >> ~/.bash_profile << EOF
export VMWUSER='<username>'
export VMWPASS='<password>'
EOF
 
$ source ~/.bash_profile

Don’t forget to use your own my.vmware.com username and password! The following command downloads the index files fileIndex.json and mainIndex.json into our working directory ~/tkg, which we create first.

$ mkdir ~/tkg
$ cd ~/tkg
$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli index vmware-tanzu-kubernetes-grid

And finally, let’s download the TKG CLI and two OVAs.

$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get tkg-linux-amd64-v1.0.0_vmware.1.gz
$ gunzip tkg-linux-amd64-v1.0.0_vmware.1.gz
$ chmod +x tkg-linux-amd64-v1.0.0_vmware.1
$ sudo mv tkg-linux-amd64-v1.0.0_vmware.1 /usr/local/bin/tkg
$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get photon-3-v1.17.3_vmware.2.ova
$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get photon-3-capv-haproxy-v0.6.3_vmware.1.ova
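
A quick sanity check that the TKG CLI ended up in your PATH (the exact output depends on the build you downloaded):

$ tkg version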

Uploading OVA templates 

To upload our OVAs, we will use govc, a very powerful command-line tool for working with the vSphere API. Let’s install it first; point URL at the govc release binary for your platform (the version below is just an example, pick a current one):

$ URL=https://github.com/vmware/govmomi/releases/download/v0.22.1/govc_linux_amd64.gz
$ curl -L $URL | gunzip | sudo tee /usr/local/bin/govc > /dev/null
$ sudo chmod a+x /usr/local/bin/govc

Then configure it with a few environment variables.
 
$ cat >> ~/.bash_profile << EOF
export GOVC_URL='vcsa-01a.corp.local'
export GOVC_USERNAME='administrator@corp.local'
export GOVC_PASSWORD='VMware1!'
export GOVC_DATACENTER='RegionA01'
export GOVC_NETWORK='VM-RegionA01-vDS-COMP'
export GOVC_DATASTORE='RegionA01-ISCSI02-COMP01'
export GOVC_RESOURCE_POOL='/RegionA01/host/RegionA01-COMP01/Resources/tkg'
export GOVC_INSECURE=1
EOF

$ source ~/.bash_profile
$ govc ls
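
If the connection details are correct, govc ls lists the top-level inventory paths of the datacenter, which should look roughly like this:

/RegionA01/vm
/RegionA01/network
/RegionA01/host
/RegionA01/datastore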

If you are a user of the VMware Hands-on Labs, this should look familiar to you, since I am using a very similar environment for this post. Let’s first create a new VM and template folder, tkg.

$ govc folder.create /RegionA01/vm/tkg

This folder will hold our OVA templates as well as all the control plane and worker VMs. We will create separate resource pools for all of our clusters in this multitenant setup, and with many clusters belonging to different tenants it is also preferable to create a per-tenant folder hierarchy (you will see later how to pass folders to the TKG CLI). For this demo, however, we will work with a single folder.
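
Just for reference, such a hierarchy could be created the same way; the folder names below are only an illustration, and you would then point VSPHERE_FOLDER at the tenant (or cluster) folder when creating that tenant’s clusters:

$ govc folder.create /RegionA01/vm/tkg/tenant-1
$ govc folder.create /RegionA01/vm/tkg/tenant-1/tkg-cluster-1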

The TKG management cluster will be deployed in a resource pool on the vSphere management cluster. Similarly, the tenant clusters will be deployed to separate cluster pools on the vSphere compute cluster, with each cluster pool being a sub-pool of the corresponding tenant pool. With this hierarchy, the infrastructure admin has full control over the resources consumed by each cluster and each tenant.

$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-1
$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-1/tkg-cluster-1
$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-2
$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-2/tkg-cluster-2
$ govc pool.create /RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster
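
To double-check the resulting hierarchy, you can list all resource pools under the two vSphere clusters, for example:

$ govc find /RegionA01/host/RegionA01-COMP01 -type p
$ govc find /RegionA01/host/RegionA01-MGMT01 -type p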

After creating the basic setup, it’s time to upload our OVAs. govc can generate the JSON-based OVA configuration spec automatically, but we want to change a few of its parameters with the jq CLI tool, so let’s install jq first.

$ sudo apt install jq

Here is the shell script to use for the OVA upload:

#!/bin/bash

set -euo pipefail

NETWORK=VM-RegionA01-vDS-COMP
PHOTON=photon-3-v1.17.3_vmware.2
HAPROXY=photon-3-capv-haproxy-v0.6.3_vmware.1
export GOVC_RESOURCE_POOL='/RegionA01/host/RegionA01-COMP01/Resources/tkg'

cd ~/tkg

# HAProxy OVA: generate the import spec, set the VM name and target network,
# import the OVA, snapshot it (for linked clones), mark it as a template, and move it to the tkg folder
govc import.spec ${HAPROXY}.ova | jq ".Name=\"$HAPROXY\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${HAPROXY}.json
govc import.ova -options=${HAPROXY}.json ${HAPROXY}.ova
govc snapshot.create -vm ${HAPROXY} root
govc vm.markastemplate ${HAPROXY}
govc object.mv /RegionA01/vm/${HAPROXY} /RegionA01/vm/tkg

# Same steps for the Photon OS Kubernetes node OVA
govc import.spec ${PHOTON}.ova | jq ".Name=\"$PHOTON\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${PHOTON}.json
govc import.ova -options=${PHOTON}.json ${PHOTON}.ova
govc snapshot.create -vm ${PHOTON} root
govc vm.markastemplate ${PHOTON}
govc object.mv /RegionA01/vm/${PHOTON} /RegionA01/vm/tkg

Keep in mind that we are using a shared iSCSI datastore across both vSphere clusters (RegionA01-MGMT01 and RegionA01-COMP01), but two different vSphere Distributed Switches, one for each cluster. The TKG CLI can remap the network when deploying the template, but not the storage. If you use separate datastores for the management and compute clusters, be sure to deploy the OVAs to both datastores!
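
In that case, a second import pass against the management datastore could look roughly like the sketch below (reusing the variables from the script above). The datastore name is a placeholder for your environment, and the template gets a distinct name so it does not clash with the copy already imported; VSPHERE_TEMPLATE/VSPHERE_HAPROXY_TEMPLATE would then point at whichever copy a given cluster should use.

# Hypothetical datastore and pool for the management cluster; adjust to your environment
$ govc import.ova -options=${PHOTON}.json -name=${PHOTON}-mgmt -ds=RegionA01-ISCSI01-MGMT01 -pool=/RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster ${PHOTON}.ova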

As you can see in the shell script, we also create a snapshot before marking each OVA as a template. That way we can use linked clones for the deployment of the TKG clusters, which is highly recommended in nested environments (like the one we use here). Note that VMware only supports full clones for production deployments, since it is not possible to increase disk sizes with linked clones.

Installing the TKG management cluster 

So far, we have downloaded, installed, and configured all needed tools, created the folder and resource pool structure for our multitenant setup, and uploaded the OVAs. Now it’s time to create the TKG management cluster. You can of course do this using the TKG web UI by executing the command

$ tkg init --ui

But since we want to automate everything, we have to create a configuration file first. Using the UI would be an easy way to create it, but it is not yet possible to save the configuration before starting the deployment (which uses fullClone instead of linkedClone by default). As a workaround for this chicken-and-egg problem, you can kick off the deployment and immediately stop it, or you can create the configuration by hand.

Let’s assume you created the following configuration file and saved it outside the ~/.tkg directory:

$ cat tkg/vsphere.yaml

VSPHERE_SERVER: vcsa-01a.corp.local
VSPHERE_USERNAME: administrator@corp.local
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_DATACENTER: /RegionA01
VSPHERE_NETWORK: "VM-RegionA01-vDS-COMP"
VSPHERE_DATASTORE: /RegionA01/datastore/RegionA01-ISCSI02-COMP01
VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-COMP01/Resources/tkg
VSPHERE_TEMPLATE: /RegionA01/vm/tkg/photon-3-v1.17.3+vmware.2
VSPHERE_HAPROXY_TEMPLATE: /RegionA01/vm/tkg/photon-3-capv-haproxy-v0.6.3+vmware.1
VSPHERE_NUM_CPUS: "2"
VSPHERE_MEM_MIB: "4096"
VSPHERE_DISK_GIB: "40"
VSPHERE_FOLDER: /RegionA01/vm/tkg
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa ... ubuntu@cli-vm
SERVICE_CIDR: 100.64.0.0/13
CLUSTER_CIDR: 100.96.0.0/11

Then you can create the ~/.tkg folder and its content by executing

$ tkg get management-cluster

Now add this vSphere-specific configuration to the TKG configuration file:

$ cat tkg/vsphere.yaml >> ~/.tkg/config.yaml

Inside the ~/.tkg folder you will find the providers directory with the different development/production cluster templates for vSphere and AWS. The templates use variables set in the TKG configuration file, but they can be overwritten by environment variables. And this is exactly the approach we will use to put our clusters in different resource pools.

In our setup, we only overwrite the resource pool variable (VSPHERE_RESOURCE_POOL) and the network variable (VSPHERE_NETWORK), using one network for the management cluster and a different one (e.g., VLAN-based distributed port groups or NSX-T segments) for each workload cluster. You could also use different VM sizes for the clusters, different datastores (e.g., gold, silver, bronze), or just one network per tenant. By overwriting predefined variables you are not breaking the TKG support model, just adapting the installation to your own needs.
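
For example, to give one tenant’s clusters bigger worker nodes on a different datastore, you could export the corresponding variables before creating that tenant’s cluster. The “gold” datastore path below is purely hypothetical:

# Hypothetical per-tenant overrides (the datastore name is made up)
$ export VSPHERE_DATASTORE="/RegionA01/datastore/RegionA01-ISCSI03-GOLD"
$ export VSPHERE_NUM_CPUS="4"
$ export VSPHERE_MEM_MIB="8192"
$ export VSPHERE_DISK_GIB="80"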

If you need further customizations, you could add your own variables, pre/post scripts (e.g., to use a separate DNS server per tenant), or automatically deployed Kubernetes YAML snippets to your own cluster templates. But don’t do that right now, because it is very error-prone (especially when you upgrade clusters) and not supported. We will come back to this topic when upstream Cluster API and TKG improvements simplify these activities. TKG 1.1 already offers some more variables you can play with.

Before we create the management cluster, let’s change the cloning mechanism from fullClone to linkedClone in the templates.

$ sed -i 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.3/cluster-template-dev.yaml
$ sed -i 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.3/cluster-template-prod.yaml

Showtime! Let’s finally create our management cluster:

$ export VSPHERE_NETWORK="VM-RegionA01-vDS-MGMT"
$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster"
$ tkg init -i vsphere -p dev --name tkg-mgmt-cluster

As you can see, all the pre-work we have done can easily be put into the automation or CI/CD tools of your choice. In case you are wondering why a development cluster with only one control plane node still gets a load balancer VM in front of it: during a seamless upgrade (e.g., from TKG 1.0 to TKG 1.1), a new control plane node is added, the load balancer is reconfigured, and the old control plane node is deleted. For the availability of the load balancer VM itself, we rely on vSphere HA. ;-)
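
Once tkg init has finished, you can quickly verify that the management cluster is up. The kubectl context name below assumes the usual <cluster-name>-admin@<cluster-name> naming; adjust it if your context is called differently:

$ tkg get management-cluster
$ kubectl config use-context tkg-mgmt-cluster-admin@tkg-mgmt-cluster
$ kubectl get nodes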

Installing the TKG workload cluster

Creating two workload clusters for two different tenants is now really easy.

$ export VSPHERE_NETWORK="tkg-cluster-1-RegionA01-vDS-COMP"
$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/tenant-1/tkg-cluster-1"
$ tkg create cluster tkg-cluster-1 -p dev -k v1.17.3+vmware.2 -c 1 -w 3
$ export VSPHERE_NETWORK="tkg-cluster-2-RegionA01-vDS-COMP"
$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/tenant-2/tkg-cluster-2"
$ tkg create cluster tkg-cluster-2 -p dev-antrea -k v1.17.3+vmware.2 -c 1 -w 3
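
To start working with the new clusters, fetch their credentials and switch kubectl contexts (again assuming the <cluster-name>-admin@<cluster-name> context naming):

$ tkg get credentials tkg-cluster-1
$ kubectl config use-context tkg-cluster-1-admin@tkg-cluster-1
$ kubectl get nodes -o wide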

Final thoughts

One of the TKG prerequisites is a DHCP server on your management/workload cluster networks. Make sure your DHCP options include a default route, a DNS server, and an NTP server so the clusters can reach the internet (assuming you are not doing an air-gapped installation) and have correct time settings. NTP often gets overlooked, but with an incorrect clock the public registries will not deliver the container images needed for the installation. And last but not least, make sure you have Docker installed on your jump host, since TKG initially creates a local Kubernetes kind cluster for the Cluster API-based installation of the TKG management cluster, transfers the Cluster API configuration to the new management cluster, and deletes the kind cluster afterwards (since it is no longer needed).
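
If you are curious, you can watch this temporary bootstrap cluster while tkg init is running; the name filter below is an assumption based on the tkg-kind prefix TKG usually gives the kind container:

$ docker ps --filter "name=tkg-kind"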

That’s all for today, but there will be more to come! There are some exciting TKG releases ahead of us! ☺

This article may contain hyperlinks to non-VMware websites that are created and maintained by third parties who are solely responsible for the content on such websites.

 

 

About the Author

Tom Schwaller (@tom_schwaller) is a Technical Product Line Manager at VMware (MAPBU) and has been involved with IT for more than two decades. Tom specializes in technologies like Kubernetes, Tanzu Kubernetes Grid, NSX-T, automation, and deep learning. He is a former NSX, VIO, and Cloud Native Systems Engineer at VMware, and a Linux/Python user since 1993! His latest passion is the Lean proof assistant (and functional programming language).
