Taking Kubernetes to the People: How Cluster API Promotes Self-Service Infrastructure

December 12, 2019 Chris Milsted

Two key goals of Cluster API are to manage the full lifecycle of a Kubernetes cluster, including scaling up and scaling down the cluster, and to give infrastructure providers a common framework to build against so that everyone can have a common workflow irrespective of where they intend to consume CPUs, memory, or network resources. That’s easier said than done, of course, so here comes the first thing to get your head around—with Cluster API, Kubernetes clusters are provisioned, scaled, and de-provisioned by an external Kubernetes cluster. It is, in effect, turtles all the way down.

Tools to create and delete clusters are not new; kops and Kubespray are two examples. You can find Cluster API as well as these existing tools under the Cluster Lifecycle Special Interest Group (sig-cluster-lifecycle) within the Kubernetes project. I prefer the Cluster API approach for a couple of reasons:
 
First, it is mimicking the out-of-tree approach that Kubernetes took once the project got too big. Cluster API has been designed in a modular way with an implementation for each infrastructure provider separate from the main code tree and in its own repository. This approach means that the development pace for each provider is decoupled and new providers are easy to add. 
 
Second, a lot of the other cluster management projects are command-line driven or describe themselves as something like “kubectl for clusters.” The Cluster API project took a different approach—an API-based approach.

Democratizing Access to Kubernetes

I think most people will agree that exposing this functionality through an API will democratize access to Kubernetes. By this, I mean that if there is an API to create, scale, upgrade, and destroy Kubernetes clusters, teams can move to more of a self-service approach. By adopting such a self-service approach, teams can avoid getting bogged down in questions about the right size of clusters or debates over whether to have big multi-tenant clusters or smaller single-tenant clusters.

Since Kubernetes 1.7, you have been able to extend the API to define custom resources. Coupling this API-centric approach with a custom controller (Kubernetes operator) that can upgrade versions and provide security and errata patches allows everyone to use Kubernetes in a safe, secure, and scalable way.
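To make that concrete, here is a minimal sketch of a CustomResourceDefinition, the mechanism behind this extensibility (the group example.com and kind Widget are placeholders for illustration, not Cluster API types). Once applied, the API server serves the new type like any built-in resource, and a custom controller can watch and reconcile it.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # The name must be <plural>.<group>
  name: widgets.example.com
spec:
  group: example.com
  versions:
  - name: v1alpha1
    served: true
    storage: true
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget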

So we come full circle: you use a Kubernetes cluster to manage Kubernetes clusters. Cluster API extends the base Kubernetes API to understand concepts such as Clusters, Machines, and MachineDeployments, and it includes controllers to do the work, profiting from all the existing Kubernetes machinery.

Examining the Cluster API Components

The easiest way to visualize Cluster API is as a set of layers.

At the core is the Cluster API project itself, which also contains the default bootstrap provider, based on kubeadm, although there is no reason another out-of-tree bootstrap provider could not be used. Around the core sit the infrastructure providers, which live in separate projects, such as the provider for AWS and the provider for VMware vSphere. These providers extend the default cluster object provided by Cluster API so that the specifics of each underlying platform are developed out of tree. You will sometimes see the following abbreviations used; for additional abbreviations, see the Cluster API Glossary:

  • CABPK - Cluster API bootstrap provider kubeadm
  • CAPA - Cluster API Provider for AWS
  • CAPD - Cluster API Provider for Docker (Kind)
  • CAPV - Cluster API Provider for vSphere
  • CAPZ - Cluster API Provider for Azure

Keep in mind that the Cluster API project is currently at v1alpha2 and not all of the functionality has been implemented in every provider yet. The full list of provider documentation is here.

Building a Cluster

Of course, the first step is to build a Kubernetes cluster and then extend its API so that it can understand all these new objects. Taking the Cluster API Provider for vSphere as an example, you need to install the core Cluster API components, the kubeadm bootstrap provider (CABPK), and the vSphere infrastructure provider (CAPV) into that cluster.

Great, you now have a Kubernetes cluster that knows how to define other Kubernetes clusters.

Now that you are ready to build a managed Kubernetes cluster (or many clusters, because you can now spin up clusters on demand), I am going to break this down into a couple of phases, based on how the kubeadm bootstrap provider works. The first thing you need to do is create a new cluster, and the easiest way to do that is by bringing up a single master node.

Assuming you also need fault tolerance, you would then want to scale out the control plane to three and add two or more worker nodes, and this part will be phase two of the process. To show how to scale the worker nodes up and down, I am going to look at MachineDeployments, which can have a replica count and scale factor based on a given template for the machine and bootstrap objects. Those familiar with Kubernetes should draw the analogy here to Pods and Deployments; the behaviour here is identical for Machines and MachineDeployments.

Defining the Cluster

Let’s look in more detail at the first step above, where you need to create a single node and have the bootstrap provider initialize it as a new Kubernetes master. The following information is from the Cluster API quick start guide using the vSphere provider as an example.

At a high level, you are going to define the following:

  • A Cluster object, plus its infrastructure-specific counterpart, a VSphereCluster
  • A Machine object for the first control plane node, plus its infrastructure-specific counterpart, a VSphereMachine
  • A KubeadmConfig bootstrap object that tells kubeadm how to initialize that first machine

You start by defining a new object of type Cluster. You also want to define the infrastructure provider-specific parameters, which are the extensions that the vSphere infrastructure provider supports over and above the base cluster definition. Below you can see some of the YAML, which defines a Cluster, links it to the infrastructure provider object VSphereCluster, and then defines provider-specific parameters such as the VMware vCenter server IP address.

apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: capi-quickstart
spec:
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: VSphereCluster
    name: capi-quickstart
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: VSphereCluster
metadata:
  name: capi-quickstart
spec:
  cloudProviderConfiguration: {}  # cloud provider settings elided from this snippet
  server: 10.0.0.1                # the vCenter server address

Now that you have defined the cluster, the next step is to define the machine that kubeadm will bootstrap into a new Kubernetes cluster. This machine is referred to in the documentation as the first control plane machine.

Next, you need to link the machine to the vSphere infrastructure provider and kubeadm bootstrap provider so that Cluster API knows how to instantiate the machine and then bootstrap it into a new Kubernetes cluster.

In the snippet below, you can see that you are defining a new machine, associating it with the bootstrap and infrastructure providers defined above, and passing values to those providers. In the infrastructure provider, you need to supply the vSphere-specific values, such as the template, network, and CPU size. For the bootstrap provider, you define a clusterConfiguration stanza. This stanza is critical because it tells kubeadm to perform a kubeadm init using the default values.

apiVersion: cluster.x-k8s.io/v1alpha2
kind: Machine
metadata:
  name: capi-quickstart-controlplane-0
  labels:
    cluster.x-k8s.io/cluster-name: capi-quickstart
    cluster.x-k8s.io/control-plane: "true"
spec:
  version: v1.16.2
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
      kind: KubeadmConfig
      name: capi-quickstart-controlplane-0
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
    kind: VSphereMachine
    name: capi-quickstart-controlplane-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: VSphereMachine
metadata:
  name: capi-quickstart-controlplane-0
  labels:
    cluster.x-k8s.io/cluster-name: capi-quickstart
    cluster.x-k8s.io/control-plane: "true"
spec:
  datacenter: SDDC-Datacenter
  diskGiB: 50
  memoryMiB: 2048
  network:
    devices:
    - dhcp4: true
      dhcp6: false
      networkName: vm-network-1
  numCPUs: 2
  template: ubuntu-1804-kube-v1.16.2

---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  name: capi-quickstart-controlplane-0
  namespace: default
spec:
  clusterConfiguration:
    apiServer:
      extraArgs:
        cloud-provider: external
    controllerManager:
      extraArgs:
        cloud-provider: external

Scaling Out the Control Plane

Once you have applied the YAML, you will have the initial machine up and running. If you want to create an HA deployment, you can now scale out the control plane. To do so, you specify additional machines, and for the control plane, it makes the most sense to create these individually. This is because some infrastructure providers support annotations to steer specific master machines into defined availability zones; here’s an example from the Cluster API Provider for AWS:

spec:
  providerSpec:
    value:
      availabilityZone: "eu-west-2a"
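Back on the vSphere example, here is a minimal sketch of the bootstrap config for a second control plane machine (the name is illustrative, and the matching Machine and VSphereMachine objects, similar to the ones shown earlier, are omitted). The key difference from the first machine is that it uses joinConfiguration with a controlPlane stanza, so kubeadm joins the existing cluster as an additional master instead of initializing a new one.

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfig
metadata:
  # Illustrative name for the second control plane machine
  name: capi-quickstart-controlplane-1
  namespace: default
spec:
  joinConfiguration:
    # The controlPlane stanza marks this node as an additional master
    controlPlane: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: external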

For the worker nodes, where end-user workloads will run, you potentially want to scale to a large number of machines. Assuming that the machines are identical, you can use MachineDeployments, against which you can define a replica count, similar to the way a Deployment defines a replica count for Pods.

In the quick start example, you can see that the MachineDeployment has a replicas: 1 value set below. It references a VSphereMachineTemplate and a KubeadmConfigTemplate, which you also define. Note that for the KubeadmConfigTemplate, you define a joinConfiguration stanza this time, which tells each new node to perform a kubeadm join against the existing cluster. The same joinConfiguration approach can also be used to join additional masters to the control plane by adding a controlPlane stanza to the joinConfiguration block.


apiVersion: cluster.x-k8s.io/v1alpha2
kind: MachineDeployment
metadata:
  name: capi-quickstart-worker
  labels:
    cluster.x-k8s.io/cluster-name: capi-quickstart
    # Labels beyond this point are for example purposes,
    # feel free to add more or change with something more meaningful.
    # Sync these values with spec.selector.matchLabels and spec.template.metadata.labels.
    nodepool: nodepool-0
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: capi-quickstart
      nodepool: nodepool-0
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: capi-quickstart
        nodepool: nodepool-0
    spec:
      version: v1.16.2
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
          kind: KubeadmConfigTemplate
          name: capi-quickstart-worker
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
        kind: VSphereMachineTemplate
        name: capi-quickstart-worker
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: VSphereMachineTemplate
metadata:
  name: capi-quickstart-worker
spec:
  template:
    spec:
      datacenter: SDDC-Datacenter
      diskGiB: 50
      memoryMiB: 2048
      numCPUs: 2
      template: ubuntu-1804-kube-v1.16.2

---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
kind: KubeadmConfigTemplate
metadata:
  name: capi-quickstart-worker
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          kubeletExtraArgs:
            cloud-provider: external
          name: '{{ ds.meta_data.hostname }}'
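Scaling the worker pool afterwards is just an edit to the declared state. As a minimal sketch, assuming the MachineDeployment above, you change spec.replicas and re-apply the manifest:

spec:
  # Changing replicas from 1 to 3 asks the MachineDeployment controller to
  # create two more Machines from the same machine and bootstrap templates;
  # lowering the count later scales the pool back down.
  replicas: 3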

These steps provide a simple way to bootstrap a new Kubernetes cluster and then scale it up. Once finished, you can also delete the “cluster” object, which (depending on the provider today) will delete the cluster and all the machines. As the providers mature, it will also be possible to do things like upgrade the Kubernetes version tag, triggering a reconciliation where the controller will upgrade the cluster in a standard fashion. Assuming your application teams are deploying workloads using quotas, requests, limits, and pod disruption budgets, the goal is to have no disruption visible to the end user.
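As a rough sketch of what such an upgrade could look like for the worker pool above (the version number is illustrative, and it assumes your provider supports rolling upgrades and that an image template built for the newer version exists), it is again just an edit to the declared state:

spec:
  template:
    spec:
      # Bumping the desired Kubernetes version (and pointing the
      # VSphereMachineTemplate at an image built for it) asks the controllers
      # to replace the MachineDeployment's machines with ones running the
      # new version.
      version: v1.16.3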

Since all the providers are developing at different speeds in separate repositories, you should take a look at the release notes for what is possible based on the infrastructure provider you select.

Wrapping Up

There are a number of pain points that the Cluster API project solves:

  • The workflow for cluster creation, deletion, scaling, and upgrading is standardized for on-premises and off-premises deployments.
  • The overhead of cluster management tends toward zero, so the pattern of many small clusters becomes possible in place of larger multi-tenant clusters. Many smaller clusters allow teams to innovate at their own pace and allow clusters with different access controls.
  • Full lifecycle management of clusters becomes possible and widely accessible.


About the Author

Chris is based in the UK and is a Staff Field Engineer for VMware. He spends most of his work wrangling Kubernetes and most of his spare time playing field hockey badly and being a taxi driver for two children who are growing up rapidly.
