A Closer Look at VMware Tanzu Kubernetes Grid Multitenant Setup

May 28, 2020 Tom Schwaller

VMware Tanzu Kubernetes Grid (TKG) uses the Kubernetes Cluster API to create Kubernetes clusters on vSphere 6.7U3, AWS, and, in the future, other cloud environments as well. All you need (on vSphere) is the TKG CLI and two OVA templates: one for the Kubernetes control plane/worker nodes and one for HAProxy, which load balances the control plane nodes. If you want a fast start with TKG, use William Lam’s TKG Demo Appliance, which is available as a VMware Fling, and read Kendrick Coleman’s introduction to TKG.

In this post, we will only use command line tools to install a multitenant TKG 1.0 setup. We will also present a few tips and tricks, and discuss best practices for the installation on vSphere 6.7U3. In our next post, we will upgrade this setup using TKG 1.1, which was released a few days ago.

Automating the download of TKG binaries

First, download the TKG binaries and OVAs available at my.vmware.com.

The easiest way to automate this is the vmw-cli tool. Since we have to use Docker for the TKG installation anyway, we use the vmw-cli Docker image, not the NPM-based installation method. Let’s pull the image first:

$ docker pull apnex/vmw-cli

Next, add the following two environment variables to your ~/.bash_profile and reload it (source ~/.bash_profile). If you are using another shell (e.g., zsh or tcsh) on your Linux system, adapt this step accordingly. I am using an Ubuntu 18.04.4 LTS system as jump host, but any other Linux (or macOS) system works equally well.

$ cat >> ~/.bash_profile << EOF

export VMWUSER='<username>'

export VMWPASS='<password>'

EOF

$ source ~/.bash_profile

Don’t forget to use your own my.vmware.com username and password! The following command downloads the index files fileIndex.json and mainIndex.json into our working directory ~/tkg, which we create first.

$ mkdir ~/tkg

$ cd ~/tkg

$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli index vmware-tanzu-kubernetes-grid

And finally, let’s download the TKG CLI and two OVAs.

$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get tkg-linux-amd64-v1.0.0_vmware.1.gz

$ gunzip tkg-linux-amd64-v1.0.0_vmware.1.gz

$ chmod +x tkg-linux-amd64-v1.0.0_vmware.1

$ sudo mv tkg-linux-amd64-v1.0.0_vmware.1 /usr/local/bin/tkg

$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get photon-3-v1.17.3_vmware.2.ova

$ docker run -t --rm -e VMWUSER=$VMWUSER -e VMWPASS=$VMWPASS -v ${PWD}:/files apnex/vmw-cli get photon-3-capv-haproxy-v0.6.3_vmware.1.ova
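
The three downloads above repeat the same long docker run invocation. As a minimal sketch, the call could be wrapped in a small shell function. The names vmw and vmw_get_all are hypothetical helpers introduced here, not part of vmw-cli, and the sketch assumes VMWUSER and VMWPASS are exported and that you run it from ~/tkg.

```shell
# Hypothetical helper (not part of vmw-cli): wraps the container call used above.
# Assumes VMWUSER and VMWPASS are exported and the current directory is ~/tkg.
vmw() {
    docker run -t --rm \
        -e VMWUSER="$VMWUSER" -e VMWPASS="$VMWPASS" \
        -v "${PWD}:/files" apnex/vmw-cli "$@"
}

# Download each named file in turn and stop at the first failure.
vmw_get_all() {
    for file in "$@"; do
        vmw get "$file" || return 1
    done
}

# Usage:
#   vmw index vmware-tanzu-kubernetes-grid
#   vmw_get_all tkg-linux-amd64-v1.0.0_vmware.1.gz \
#       photon-3-v1.17.3_vmware.2.ova \
#       photon-3-capv-haproxy-v0.6.3_vmware.1.ova
```

Quoting the variables also keeps passwords with spaces or shell metacharacters intact, which the unquoted form above would split.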

Uploading OVA templates

To upload our OVAs, we will use govc, a very powerful command-line tool for working with the vSphere API. Let’s install it first and then configure it with some environment variables.

$ URL=https://github.com/vmware/govmomi/releases/download/v0.22.1/govc_linux_amd64.gz

$ curl -L $URL | gunzip | sudo tee /usr/local/bin/govc > /dev/null

$ sudo chmod a+x /usr/local/bin/govc

$ cat >> ~/.bash_profile << EOF

export GOVC_URL='vcsa-01a.corp.local'

export GOVC_USERNAME='[email protected]'

export GOVC_PASSWORD='VMware1!'

export GOVC_DATACENTER='RegionA01'

export GOVC_NETWORK='VM-RegionA01-vDS-COMP'

export GOVC_DATASTORE='RegionA01-ISCSI02-COMP01'

export GOVC_RESOURCE_POOL='/RegionA01/host/RegionA01-COMP01/Resources/tkg'

export GOVC_INSECURE=1

EOF

$ source ~/.bash_profile

$ govc ls
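
If govc ls fails, a missing or misspelled variable is a common cause. The following sketch checks the variables set above before talking to vCenter. The function name check_govc_env is a hypothetical helper, not a govc feature.

```shell
# Hypothetical helper: print the name of every govc variable from the
# profile above that is unset or empty. Returns non-zero if any is missing.
check_govc_env() {
    rc=0
    for name in GOVC_URL GOVC_USERNAME GOVC_PASSWORD GOVC_DATACENTER \
                GOVC_NETWORK GOVC_DATASTORE GOVC_RESOURCE_POOL; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_govc_env && govc about
```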

If you are a user of the VMware Hands-on Labs, this should look familiar to you, since I am using a very similar environment for this post. Let’s first create a new VM and template folder named tkg.

$ govc folder.create /RegionA01/vm/tkg

This is the place for our OVAs as well as all the control plane/worker VMs. With many clusters belonging to different tenants, a folder hierarchy is preferable, and you will see later how to use one with the TKG CLI, but for this demo we will work with a single folder. We will, however, create different resource pools for all our clusters in a multitenant setup.

The TKG management cluster will be deployed in a resource pool on the vSphere management cluster. Similarly, the tenant clusters will be deployed to separate cluster pools on the vSphere compute cluster. The cluster pools themselves are sub-pools of the corresponding tenant pool. As the image above makes clear, the infrastructure admin has full control over the resources consumed by each cluster and each tenant.

$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-1

$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-1/tkg-cluster-1

$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-2

$ govc pool.create /RegionA01/host/RegionA01-COMP01/Resources/tenant-2/tkg-cluster-2

$ govc pool.create /RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster
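
With more tenants, typing each path by hand gets error-prone. As a sketch, the tenant and cluster pool paths can be generated from tenant:cluster pairs and piped to govc pool.create. The function name tenant_pool_paths and the pair format are assumptions made for this example. The paths follow the layout used later when the workload clusters are created (one cluster pool below each tenant pool).

```shell
# Hypothetical helper: emit one resource pool path per line, the tenant
# pool first and then its cluster sub-pool. Arguments are "tenant:cluster".
tenant_pool_paths() {
    parent=/RegionA01/host/RegionA01-COMP01/Resources
    for pair in "$@"; do
        tenant=${pair%%:*}    # text before the first colon
        cluster=${pair#*:}    # text after the first colon
        echo "$parent/$tenant"
        echo "$parent/$tenant/$cluster"
    done
}

# Usage (needs govc and the GOVC_* variables):
#   tenant_pool_paths tenant-1:tkg-cluster-1 tenant-2:tkg-cluster-2 |
#       while read -r pool; do govc pool.create "$pool"; done
```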

After creating the basic setup, it’s time to upload our OVAs. govc can generate the JSON-based import options file automatically; we then change a few parameters in it with the jq CLI tool, which we install first.

$ sudo apt install jq

Here is the shell script to use for the OVA upload:

#!/bin/bash

set -euo pipefail

NETWORK=VM-RegionA01-vDS-COMP

PHOTON=photon-3-v1.17.3_vmware.2

HAPROXY=photon-3-capv-haproxy-v0.6.3_vmware.1

export GOVC_RESOURCE_POOL='/RegionA01/host/RegionA01-COMP01/Resources/tkg'

cd ~/tkg

govc import.spec ${HAPROXY}.ova | jq ".Name=\"$HAPROXY\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${HAPROXY}.json

govc import.ova -options=${HAPROXY}.json ${HAPROXY}.ova

govc snapshot.create -vm ${HAPROXY} root

govc vm.markastemplate ${HAPROXY}

govc object.mv /RegionA01/vm/${HAPROXY} /RegionA01/vm/tkg

govc import.spec ${PHOTON}.ova | jq ".Name=\"$PHOTON\"" | jq ".NetworkMapping[0].Network=\"$NETWORK\"" > ${PHOTON}.json

govc import.ova -options=${PHOTON}.json ${PHOTON}.ova

govc snapshot.create -vm ${PHOTON} root

govc vm.markastemplate ${PHOTON}

govc object.mv /RegionA01/vm/${PHOTON} /RegionA01/vm/tkg
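
The script runs the same five steps for each OVA, so they could be folded into one function. This is only a sketch of such a refactoring. The function name import_template is hypothetical, and it assumes govc, jq, and the GOVC_* variables are available and that the OVA files are in the current directory.

```shell
# Hypothetical refactoring of the upload script: one call per OVA.
# Arguments: template name (OVA file name without .ova), network, target folder.
import_template() {
    name=$1
    network=$2
    folder=$3
    # Generate the import options and set the VM name and network in one jq pass.
    govc import.spec "${name}.ova" |
        jq --arg name "$name" --arg net "$network" \
            '.Name = $name | .NetworkMapping[0].Network = $net' > "${name}.json" &&
    govc import.ova -options="${name}.json" "${name}.ova" &&
    # The snapshot is what makes linked clones possible later on.
    govc snapshot.create -vm "$name" root &&
    govc vm.markastemplate "$name" &&
    govc object.mv "/RegionA01/vm/${name}" "$folder"
}

# Usage:
#   import_template photon-3-capv-haproxy-v0.6.3_vmware.1 VM-RegionA01-vDS-COMP /RegionA01/vm/tkg
#   import_template photon-3-v1.17.3_vmware.2 VM-RegionA01-vDS-COMP /RegionA01/vm/tkg
```

Passing the values with --arg lets jq do the quoting, so names containing quotes or backslashes cannot break the filter.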

Keep in mind that we are using a shared iSCSI datastore across both vSphere clusters (RegionA01-MGMT01 and RegionA01-COMP01), but two different vSphere Distributed Switches, one for each cluster. The TKG CLI can remap the network when deploying the template, but not the storage. If you use separate datastores for the management and compute cluster, be sure to deploy the OVAs to both datastores!

As you can see in the shell script, we also create a snapshot before marking the imported VM as a template. That way we can use linked clones for the deployment of the TKG clusters, which is highly recommended in nested environments (like the one we use here). Note, however, that VMware only supports full clones for production deployments, as it is not possible to increase disk sizes with linked clones.

Installing the TKG management cluster

So far, we have downloaded, installed, and configured all needed tools, created the folder and resource pool structure for our multitenant setup, and uploaded the OVAs. Now it’s time to create the TKG management cluster. You can, of course, do this with the TKG web UI by executing the following command:

$ tkg init --ui

But since we want to automate everything, we have to create a configuration file first. Using the UI would be an easy way to create it, but it is not yet possible to save the configuration before starting the deployment (which uses fullClone instead of linkedClone by default). As a workaround for this chicken-and-egg problem, you can kick off the deployment and immediately stop it, or you can create the configuration by hand.

Let’s assume you created the following configuration file and saved it outside the ~/.tkg directory:

$ cat ~/tkg/vsphere.yaml

VSPHERE_SERVER: vcsa-01a.corp.local

VSPHERE_USERNAME: [email protected]

VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>

VSPHERE_DATACENTER: /RegionA01

VSPHERE_NETWORK: "VM-RegionA01-vDS-COMP"

VSPHERE_DATASTORE: /RegionA01/datastore/RegionA01-ISCSI02-COMP01

VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-COMP01/Resources/tkg

VSPHERE_TEMPLATE: /RegionA01/vm/tkg/photon-3-v1.17.3_vmware.2

VSPHERE_HAPROXY_TEMPLATE: /RegionA01/vm/tkg/photon-3-capv-haproxy-v0.6.3_vmware.1

VSPHERE_NUM_CPUS: "2"

VSPHERE_MEM_MIB: "4096"

VSPHERE_DISK_GIB: "40"

VSPHERE_FOLDER: /RegionA01/vm/tkg

VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa ... ubuntu@cli-vm

SERVICE_CIDR: 100.64.0.0/13

CLUSTER_CIDR: 100.96.0.0/11
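
The VSPHERE_PASSWORD value above is the Base64 form of the lab password VMware1! wrapped in an encoded marker. If you write the file by hand, a sketch like the following produces that form. The function names are hypothetical helpers, and the assumption that TKG accepts a hand-encoded value in exactly this form is mine. Keep in mind that Base64 is an encoding, not encryption, so the file deserves the same protection as a plain-text password.

```shell
# Hypothetical helper: print a password in the <encoded:...> form.
# printf '%s' avoids the trailing newline that echo would add to the input.
encode_password() {
    printf '<encoded:%s>\n' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

# Reverse direction, handy for checking an existing config entry.
decode_password() {
    value=${1#'<encoded:'}   # strip the leading marker
    value=${value%'>'}       # strip the closing bracket
    printf '%s' "$value" | base64 -d
}

# Usage: encode_password 'VMware1!'   prints <encoded:Vk13YXJlMSE=>
```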

Then you can create the ~/.tkg folder and its content by executing

$ tkg get management-cluster

Now add this vSphere-specific configuration to the TKG configuration file:

$ cat ~/tkg/vsphere.yaml >> ~/.tkg/config.yaml

Inside the ~/.tkg folder you will find the providers directory with the different development/production cluster templates for vSphere and AWS. The templates use variables set in the TKG configuration file, but they can be overwritten by environment variables. And this is exactly the approach we will use to put our clusters in different resource pools.

Notably, we only overwrite two variables: the resource pool (VSPHERE_RESOURCE_POOL) and the network (VSPHERE_NETWORK). We use one network for the management cluster and a different one (e.g., VLAN-based distributed port groups or NSX-T segments) for each workload cluster. You can also use separate VM sizes for the clusters, different datastores (e.g., gold, silver, bronze), or just one network for each tenant. By overwriting predefined variables you are not breaking the TKG support model, just adapting the installation to your own needs.

If you need further customizations, you could add your own variables, pre/post scripts (e.g., using a separate DNS server per tenant), or Kubernetes YAML snippets, which are automatically deployed to your own cluster templates. But don’t do that right now, because doing so is very error-prone (especially when you upgrade clusters) and not supported. We will come back to this topic when upstream Cluster API and TKG improvements simplify these activities. TKG 1.1 already has some more variables you can play with.

Before we create the management cluster, let’s change the cloning mechanism from fullClone to linkedClone in the templates.

$ sed -i 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.3/cluster-template-dev.yaml

$ sed -i 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.3/cluster-template-prod.yaml
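
Since sed exits successfully even when nothing matched, it is worth confirming that the templates really changed. The sketch below applies the same substitution to every cluster template in a directory and then checks for the new value. The function name use_linked_clones is a hypothetical helper. It uses sed -i.bak and removes the backup afterwards, which I chose only because that form works with both GNU and BSD sed.

```shell
# Hypothetical helper: switch all cluster templates in a provider
# directory to linked clones and verify that each file was changed.
use_linked_clones() {
    dir=$1
    for template in "$dir"/cluster-template-*.yaml; do
        # An unmatched glob stays literal, so this also catches a wrong directory.
        [ -f "$template" ] || { echo "no templates in $dir" >&2; return 1; }
        sed -i.bak 's/fullClone/linkedClone/g' "$template" || return 1
        rm -f "$template.bak"
        grep -q 'linkedClone' "$template" || {
            echo "no clone mode found in $template" >&2
            return 1
        }
    done
}

# Usage: use_linked_clones ~/.tkg/providers/infrastructure-vsphere/v0.6.3
```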

Showtime! Let’s finally create our management cluster:

$ export VSPHERE_NETWORK="VM-RegionA01-vDS-MGMT"

$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-MGMT01/Resources/tkg-mgmt-cluster"

$ tkg init -i vsphere -p dev --name tkg-mgmt-cluster

As you can see, all the pre-work we have done can easily be put into the automation or CI/CD tools of your choice. You may wonder why a development cluster with only one control plane node has a load balancer VM in front of it. The reason is seamless upgrades (e.g., from TKG 1.0 to TKG 1.1): a new control plane node is added, the load balancer is reconfigured, and the old control plane node is deleted. And for the load balancer VM we are relying on vSphere HA, in case you were wondering ;-)

Installing the TKG workload cluster

Creating two workload clusters for two different tenants is now really easy.

$ export VSPHERE_NETWORK="tkg-cluster-1-RegionA01-vDS-COMP"

$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/tenant-1/tkg-cluster-1"

$ tkg create cluster tkg-cluster-1 -p dev -k v1.17.3+vmware.2 -c 1 -w 3

$ export VSPHERE_NETWORK="tkg-cluster-2-RegionA01-vDS-COMP"

$ export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/tenant-2/tkg-cluster-2"

$ tkg create cluster tkg-cluster-2 -p dev-antrea -k v1.17.3+vmware.2 -c 1 -w 3
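
One caveat with the exports above: they stay set in your shell, so a later tkg command silently inherits the last tenant’s network and resource pool. A minimal sketch that scopes the overrides to a single call follows. The function name create_tenant_cluster is hypothetical, and it assumes the network naming pattern used in this post (cluster name followed by -RegionA01-vDS-COMP).

```shell
# Hypothetical wrapper: create one workload cluster in its tenant's pool.
# The exports happen in a subshell, so they do not leak into later commands.
create_tenant_cluster() {
    tenant=$1
    cluster=$2
    plan=${3:-dev}    # plan name, defaults to dev
    (
        export VSPHERE_NETWORK="${cluster}-RegionA01-vDS-COMP"
        export VSPHERE_RESOURCE_POOL="/RegionA01/host/RegionA01-COMP01/Resources/${tenant}/${cluster}"
        tkg create cluster "$cluster" -p "$plan" -k v1.17.3+vmware.2 -c 1 -w 3
    )
}

# Usage:
#   create_tenant_cluster tenant-1 tkg-cluster-1
#   create_tenant_cluster tenant-2 tkg-cluster-2
```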

Final thoughts

One of the TKG prerequisites is a DHCP server on your management/workload cluster networks. Make sure your DHCP options include a default route, DNS, and NTP server so the clusters can reach the internet (assuming you are not doing an air-gapped installation) and have correct time settings. NTP often gets overlooked, but with a wrong clock the public registries used will not deliver the container images needed for the installation (TLS certificate validation typically fails when the time is off). And last but not least, make sure you have Docker installed on your system, since TKG initially creates a local kind cluster for a Cluster API-based installation of the TKG management cluster, transfers its configuration to the management cluster, and deletes the kind cluster afterwards (since it is no longer needed).
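
Before starting an installation, a quick check of the jump host can save a failed run. This sketch only verifies that the tools used in this post are on the PATH. The function name require_commands is a hypothetical helper, and it does not check DHCP, DNS, or NTP on the cluster networks.

```shell
# Hypothetical helper: report every required command that is not on the PATH.
# Returns non-zero if at least one command is missing.
require_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage on the jump host: require_commands docker tkg govc jq
```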

That’s all for today, but there will be more to come! There are some exciting TKG releases ahead of us! ☺


About the Author

Tom Schwaller (@tom_schwaller) is a Technical Product Line Manager at VMware (MAPBU) and has been involved with IT for more than two decades. Tom specializes in technologies like Kubernetes, Tanzu Kubernetes Grid, NSX-T, automation, and deep learning. He is a former NSX, VIO, and Cloud Native Systems Engineer at VMware, and a Linux/Python user since 1993! His latest passion is the Lean proof assistant (and functional programming language).