Fasten Your Seatbelts with Tanzu Kubernetes Grid 1.3

April 1, 2021 Tom Schwaller

In this post, I am going to take Tanzu Kubernetes Grid, VMware’s Cluster-API based Kubernetes solution for on-prem and multi-cloud environments, for a test drive. 

Tanzu Kubernetes Grid has come a long way since its first release in April 2020. The latest version, 1.3, adds a lot of new features and improvements, many of which I will discuss in detail.

Release dates and feature list

With version 1.3, Tanzu Kubernetes Grid is on its sixth iteration.

Take a look at this overwhelming list of new Tanzu Kubernetes Grid 1.3 features first (I will be focusing on the topics in bold).

  • Updated to Kubernetes 1.20.4
  • Ubuntu 20.04 node images distributed for all supported infrastructure providers
  • Tanzu CLI replaces Tanzu Kubernetes Grid CLI
  • Ability to update vCenter credentials post-creation
  • K8s Service of type LoadBalancer with NSX Advanced Load-Balancer Essentials for vSphere
  • Pinniped/Dex for OIDC/LDAPS integration
  • image-builder and dependencies packaged as a container
    • Create your own Photon OS, Ubuntu 20.04, Amazon Linux 2, RHEL 7 images
  • Automatic installation of core add-ons with new add-ons management
    • Includes Antrea, Dex, Pinniped, (vSphere) CPI, (vSphere) CSI, metrics-server
    • New clusters support automatic upgrades for core add-ons
  • HTTP/S proxy support
  • Disaster recovery of workload clusters using Velero
  • Register management cluster in Tanzu Mission Control
  • Audit logging enabled
    • Kubernetes API server audit logs
    • Virtual machine (VM)-level audit logs (via auditd)
    • Audit logs can be forwarded via the Fluent Bit log forwarder extension
  • Fluent Bit Syslog output plugin (enables forwarding logs to vRLI on-prem)
  • Metrics Server installed by default on management and workload clusters
    • Enables kubectl top nodes
    • Enables kubectl top pods
  • New CRD TanzuKubernetesRelease
  • external-dns as an in-cluster extension

More details and a quick product snapshot can be found in the Release Notes.

Tanzu CLI

A major change in this release is the introduction of the Tanzu command-line interface (CLI), which will unify operations across solutions such as Tanzu Kubernetes Grid and Tanzu Mission Control. Unpacking the Tanzu Kubernetes Grid tar file creates the cli directory, which includes the Tanzu CLI binary, Tanzu Kubernetes Grid plugins, and other executables (i.e., kapp, imgpkg, kbld, ytt). 

Let’s start by installing the Tanzu CLI and plugins on an Ubuntu 20.04 Linux system using the following commands:

$ tar xvf tanzu-cli-bundle-linux-amd64.tar
$ sudo mv cli/core/v1.3.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu
$ tanzu plugin install --local cli all
$ ls -al  ~/.local/share/tanzu-cli/
$ source <(tanzu completion bash)
$ tanzu plugin list

It is highly recommended to add the Tanzu completion command to your ~/.bashrc or ~/.bash_profile file for a better user experience. The new tanzu CLI command uses a NOUN VERB syntax and is organized around plugins, but otherwise it is very similar to the old Tanzu Kubernetes Grid CLI. The documentation includes an exhaustive command reference and a comparison table.

As a first step, we will create a Tanzu Kubernetes Grid management cluster. The following sample command starts the installer on a Linux jump box and exposes the configuration UI on port 9080 of the jump box's IP address:

$ tanzu management-cluster create --ui --bind <jump-box-ip>:9080 --browser none

The configuration workflow is the same as with older Tanzu Kubernetes Grid versions; you just get some additional configuration options. And in cases where you’ve already configured a management cluster, you can now restore the cached data.

NSX Advanced Load Balancer integration

The NSX Advanced Load Balancer (ALB), VMware’s enterprise-grade, software-defined load balancing and web application firewall (WAF) solution, can be used for L4 load balancing in Tanzu Kubernetes Grid workload clusters (i.e., you can create Kubernetes services of type LoadBalancer). To enable this integration, you have to configure the controller IP/DNS name, username, and password first. Once the connection is verified, you can choose the vSphere Cloud you configured in the controller and a Service Engine (SE) group in that cloud. Next, enter the VIP network name and CIDR you configured in the Infrastructure -> Networks section of the NSX ALB. 

Since the default self-signed certificate of the controller is not using a Subject Alternative Name (SAN), you have to replace it before you can add it in the Tanzu Kubernetes Grid configuration UI. This can easily be done with the NSX ALB web interface. Go to Templates -> Security -> SSL/TLS Certificates and choose Create Controller Certificate. After you create the new certificate, click the download icon, copy the certificate from the export dialog, and paste it in the Controller Certificate Authority box of the Tanzu Kubernetes Grid GUI.

Last but not least, you have to tell the NSX ALB to use this new certificate. Go to Administration -> Settings -> Access Settings, edit System Access Settings, delete all configured (default) SSL/TLS certificates, and add the custom certificate you created above. Log in to the controller again and check that the new certificate is active.

There is one last piece of information missing in the NSX ALB configuration: the cluster labels. If you do not add any, all Tanzu Kubernetes Grid workload clusters will use the NSX ALB. If you add labels, only clusters labelled correctly, for example with

$ kubectl label cluster tkg-cluster-1 team=tkg-networking

will use it. The Avi Kubernetes Operator (AKO) on the workload clusters

$ kubectl -n avi-system get pods
ako-0   1/1     Running   0          3d18h

translates Kubernetes resources (e.g., services of type LoadBalancer) into virtual services on the NSX ALB via its REST API. It is installed and configured by the AKO Operator (a meta-controller).

$ kubectl config use-context avi-mgmt-admin@avi-mgmt

$ kubectl -n tkg-system-networking get deployments
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
ako-operator-controller-manager   1/1     1            1           3d20h

The AKO Operator

  • runs on the management cluster

  • orchestrates the installation of AKO in a selective group of workload clusters

  • manages user credentials for AKO

  • instructs AKO to clean up resources when a workload cluster is being deleted

The custom resource definition (CRD) reconciled by the AKO Operator, AKODeploymentConfig, defines a label selector that is used to match workload clusters. When a workload cluster has labels that match the selector of an AKODeploymentConfig, it will have the NSX ALB enabled. When a workload cluster's labels match multiple AKODeploymentConfig objects, only the first one takes effect.
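The matching rule described above can be illustrated with a small Python sketch. It is not the AKO Operator's code: the function names and the two configuration names are hypothetical, and it assumes that "first" means the order in which the configurations are listed, which the text does not specify.

```python
from typing import Dict, List, Optional


def matches(match_labels: Dict[str, str], cluster_labels: Dict[str, str]) -> bool:
    """matchLabels semantics: every selector pair must be present on the cluster.

    An empty selector matches every cluster.
    """
    return all(cluster_labels.get(key) == value for key, value in match_labels.items())


def select_config(configs: List[dict], cluster_labels: Dict[str, str]) -> Optional[str]:
    """Return the name of the first config whose selector matches, else None.

    Assumption: configs are evaluated in list order.
    """
    for config in configs:
        if matches(config.get("matchLabels", {}), cluster_labels):
            return config["name"]
    return None


# Hypothetical AKODeploymentConfig objects, reduced to name and selector.
configs = [
    {"name": "ako-team-networking", "matchLabels": {"team": "tkg-networking"}},
    {"name": "ako-env-prod", "matchLabels": {"env": "prod"}},
]

# This cluster matches both selectors; only the first config is used.
print(select_config(configs, {"team": "tkg-networking", "env": "prod"}))
```

A cluster labelled only with env=prod would get the second configuration, and a cluster without matching labels would get none.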

Tanzu Kubernetes Grid configures one AKODeploymentConfig, called install-ako-for-all, so you can only use one SE group of the NSX ALB by default. And with a Tanzu Advanced or NSX ALB enterprise license, you can go one step further and create multiple instances of AKODeploymentConfig using other SE groups and labels. This allows you to connect different workload clusters to specific SE groups. You can even enable L7 Ingress, but that’s a different story.

The following commands show the default AKODeploymentConfig configuration without any label settings.

$ kubectl get akodeploymentconfig
NAME                  AGE
install-ako-for-all   3d23h

$ kubectl get akodeploymentconfig install-ako-for-all -o yaml | tail -n 21
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: Default-Cloud
  controller: avi-controller.corp.local
  dataNetwork:
    name: VM-RegionA01-vDS-COMP
  extraConfigs:
    image:
      pullPolicy: IfNotPresent
      version: v1.3.2_vmware.1
    ingress:
      defaultIngressController: false
      disableIngressClass: true
  serviceEngineGroup: Default-Group

$ kubectl get akodeploymentconfig -o json | jq '.items[].spec.clusterSelector'

With labels, the last output looks different:

$ kubectl get akodeploymentconfig -o json | jq '.items[].spec.clusterSelector'
{
  "matchLabels": {
    "team": "tkg-networking"
  }
}

Let’s check if we can create a service of type LoadBalancer using the following YAML file:

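$ cat tanzu/lb.yaml
apiVersion: v1
kind: Service
metadata:
  name: lb-svc
spec:
  selector:
    app: lb-svc
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lb-svc
spec:
  replicas: 2
  selector:
    matchLabels:
      app: lb-svc
  template:
    metadata:
      labels:
        app: lb-svc
    spec:
      serviceAccountName: default
      containers:
        - name: nginx
          image: nginx  # assumption: the image reference is missing from the listing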

$ kubectl get pods 
NAME                      READY   STATUS    RESTARTS   AGE 
lb-svc-64f9549557-446zx   1/1     Running   0          10s 
lb-svc-64f9549557-fnfpr   1/1     Running   0          10s   

$ kubectl get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
kubernetes   ClusterIP                     <none>           443/TCP        1d1h
lb-svc       LoadBalancer                                   80:32100/TCP   15s

In the NSX ALB UI, we can see the new virtual service, as expected.

More networking features

Other important networking improvements of Tanzu Kubernetes Grid 1.3, which I noted in the introduction, are:

  • HTTP/S proxy configuration

  • Experimental support for routed pods (NO-NAT)

  • external-dns as an in-cluster Tanzu Kubernetes Grid extension

The configuration of the HTTP/S proxy settings in the UI is straightforward; just be careful with the NO PROXY entries.

NSX-T is a mandatory requirement for the experimental routed pods (NO-NAT) feature. All you have to do is add a few variables to the workload cluster definition:

SERVICE_DOMAIN: "corp.tanzu"
NSXT_PASSWORD: "VMware1!VMware1!"
NSXT_ROUTER_PATH: "/infra/tier-1s/t1-tkg"

IMPORTANT: With Tanzu Kubernetes Grid 1.3 you should define each cluster in a separate YAML file and create it with a command like:

$ tanzu cluster create tkg-cluster-1 -f tkg-cluster-1.yaml -v 6

For a successful routed pod implementation, it is mandatory to advertise All Static Routes on the T1 gateway and also on the T0 gateway, if you are using, for example, BGP (which is the case in my setup).

Why is that? The CLUSTER_CIDR variable of the cluster configuration defines the routable pod network, and each cluster VM gets a /24 chunk of it. The host IP of each VM is configured as the next hop for the network range it owns, as you can see in the Static Routes configuration of the T1 gateway. In addition, Antrea is configured in NoEncap mode when you enable routed pods (i.e., it assumes the node network can handle routing of pod traffic across nodes).
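To make the relationship between the pod network, the per-node /24 chunks, and the static routes more concrete, here is a minimal Python sketch. The CIDR 100.96.0.0/11 and the node IPs are assumptions chosen for illustration (the actual values of my setup are not shown here), the function name is hypothetical, and the real assignment is done by Kubernetes and NSX-T, not necessarily in this order.

```python
import ipaddress
from typing import Dict, List


def static_routes(cluster_cidr: str, node_ips: List[str], prefix: int = 24) -> Dict[str, str]:
    """Map one /prefix pod subnet per node to that node's IP as next hop."""
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix)
    # zip stops after the last node, so only as many subnets as nodes are used.
    return {str(subnet): node_ip for subnet, node_ip in zip(subnets, node_ips)}


# Assumed pod network and hypothetical node (VM) IP addresses.
routes = static_routes(
    "100.96.0.0/11",
    ["192.168.10.11", "192.168.10.12", "192.168.10.13"],
)
for destination, next_hop in routes.items():
    print(f"{destination} via {next_hop}")
```

Each printed line corresponds to one static route on the T1 gateway: the pod subnet owned by a node, reachable via that node's host IP.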

Let’s check if our configuration was deployed successfully.

$ kubectl -n kube-system get pod metrics-server-7c6765f9df-f8jcg -o json | jq '.status.podIP'

$ ping -c 3 <pod-ip>
PING <pod-ip> (<pod-ip>) 56(84) bytes of data.
64 bytes from <pod-ip>: icmp_seq=1 ttl=60 time=2.47 ms
64 bytes from <pod-ip>: icmp_seq=2 ttl=60 time=2.17 ms
64 bytes from <pod-ip>: icmp_seq=3 ttl=60 time=1.64 ms

--- <pod-ip> ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.637/2.092/2.470/0.344 ms

To retrieve and view all of the available NSX-T configuration options, you can use the following command:

$ grep NSXT ~/.tanzu/tkg/providers/config_default.yaml
NSXT_SECRET_NAME: "cloud-provider-vsphere-nsxt-credentials"

Authentication & authorization

The next major improvement is the OIDC/LDAPS configuration in the UI, which uses Pinniped/Dex as the back end. As with the NSX ALB integration, the deployment of the whole setup is managed by components running on the management cluster, so you don’t have to worry about configuring it each time for the workload clusters. The following output shows a simple Active Directory configuration for a Tanzu CLI-based deployment.

$ grep LDAP tkg-mgmt-cluster.yaml
LDAP_BIND_DN: cn=Administrator,cn=Users,dc=corp,dc=tanzu
LDAP_GROUP_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_HOST: controlcenter.corp.tanzu:636
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_USER_SEARCH_FILTER: (objectClass=person)

If you want to know all configuration options, search for LDAP or OIDC in the Tanzu config_default.yaml file as usual. 

IMPORTANT: Do not forget to provide the callback URI to the OIDC provider. Also, LDAPS will not work with older TLS versions; you need at least TLS 1.2. Check the official Tanzu Kubernetes Grid 1.3 documentation for additional information.
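One way to check the TLS requirement before deploying is to connect to the LDAPS port with a client that refuses anything older than TLS 1.2. The following Python sketch does this with the standard library only. The function names are made up for this illustration, and the verify=False option is an assumption for lab setups with self-signed certificates.

```python
import socket
import ssl
from typing import Optional


def make_tls12_context(verify: bool = True) -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if not verify:
        # Lab use only: skip certificate checks for self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def negotiated_tls_version(host: str, port: int = 636, timeout: float = 5.0,
                           verify: bool = True) -> Optional[str]:
    """Open a TLS connection and return the negotiated protocol version.

    Raises ssl.SSLError if the server only offers versions older than TLS 1.2.
    """
    context = make_tls12_context(verify)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling negotiated_tls_version("controlcenter.corp.tanzu") against the LDAP_HOST shown above either returns the negotiated version or fails with an SSL error, which tells you whether the directory server is ready for the Pinniped/Dex integration.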

When everything is set up correctly, a developer just needs to install the Tanzu CLI and connect to the Kubernetes API server to get the correct kubeconfig.

$ tanzu login --endpoint <management-cluster-api-server-url> --name tkg-mgmt-cluster

The Tanzu CLI now opens a browser window in which the developer logs in to their account.

After completing this process, Kubernetes resources can be accessed/created according to the role-based access control rules the cluster administrator has configured. A Tanzu Kubernetes Grid 1.3 cluster has approximately 80 roles available to choose from.

$ kubectl create clusterrolebinding test-rb --clusterrole edit --user test

This is the end, my friend

To wrap up, here is a command that shows the kind of information you get from the metrics-server, which is installed by default on the management and workload clusters.

$ kubectl top pods -n kube-system
NAME                                                         CPU(cores)   MEMORY(bytes)
antrea-controller-577bf7c894-8ldtv                           12m          38Mi
etcd-nsxt-cluster-1-control-plane-rgg2j                      100m         62Mi
kube-apiserver-nsxt-cluster-1-control-plane-rgg2j            205m         408Mi
kube-controller-manager-nsxt-cluster-1-control-plane-rgg2j   33m          68Mi
kube-proxy-6lw8k                                             1m           23Mi

And finally, some commands that explain how to use the new TanzuKubernetesRelease CRD.

$ tanzu kubernetes-release get
  NAME                       VERSION                  COMPATIBLE  UPGRADEAVAILABLE
  v1.17.16---vmware.2-tkg.1  v1.17.16+vmware.2-tkg.1  True        True
  v1.18.16---vmware.1-tkg.1  v1.18.16+vmware.1-tkg.1  True        True
  v1.19.8---vmware.1-tkg.1   v1.19.8+vmware.1-tkg.1   True        True
  v1.20.4---vmware.1-tkg.1   v1.20.4+vmware.1-tkg.1   True        False

$ tanzu kubernetes-release available-upgrades get  v1.19.8---vmware.1-tkg.1
  NAME                      VERSION
  v1.20.4---vmware.1-tkg.1  v1.20.4+vmware.1-tkg.1

$ tanzu cluster upgrade tkg-cluster-1
$ tanzu cluster upgrade tkg-cluster-1 --tkr v1.20.1---vmware.1-tkg.2
$ tanzu cluster upgrade tkg-cluster-1 --os-name ubuntu --os-version 20.04

The version string v1.17.16+vmware.2-tkg.1 used for installations or updates has the following meaning:

  • v1.17.16 – upstream version of Kubernetes used

  • vmware.2 – binaries compiled and signed by VMware

  • tkg.1 – Tanzu Kubernetes Grid software added on top of it
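The structure of the version string, and its relation to the NAME column of the tanzu kubernetes-release get output (where + is replaced by ---), can be sketched in a few lines of Python. The class and function names are hypothetical, and the regular expression only covers the version format shown in this post.

```python
import re
from typing import NamedTuple


class TkrVersion(NamedTuple):
    kubernetes: str    # upstream Kubernetes version, e.g. v1.17.16
    vmware_build: str  # VMware build of the binaries, e.g. vmware.2
    tkg_build: str     # Tanzu Kubernetes Grid revision, e.g. tkg.1


_PATTERN = re.compile(r"^(v\d+\.\d+\.\d+)\+(vmware\.\d+)-(tkg\.\d+)$")


def parse_version(version: str) -> TkrVersion:
    """Split a version string like v1.17.16+vmware.2-tkg.1 into its parts."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"unexpected version string: {version!r}")
    return TkrVersion(*match.groups())


def tkr_name(version: str) -> str:
    """Return the name as listed in the NAME column: '+' becomes '---'."""
    parse_version(version)  # validate the format first
    return version.replace("+", "---")


print(parse_version("v1.17.16+vmware.2-tkg.1"))
print(tkr_name("v1.20.4+vmware.1-tkg.1"))
```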

That’s all for today, but you’ll most likely agree that Tanzu Kubernetes Grid 1.3 is a pretty amazing release.

About the Author

Tom Schwaller (@tom_schwaller) is a Technical Product Line Manager at VMware (MAPBU) and has been involved with IT for more than two decades. Tom specializes in technologies like Kubernetes, Tanzu Kubernetes Grid, NSX-T, automation, and deep learning. He is a former NSX, VIO, and Cloud Native Systems Engineer at VMware, and a Linux/Python user since 1993! His latest passion is the Lean proof assistant (and functional programming language).
