Afraid your Kubernetes clusters will go down? Now's the time to examine PKS

April 13, 2018 Ahilan Ponnusamy

Lots of enterprises are kicking the tires on Kubernetes. And for good reason! It’s common to hear questions like: How does it work? How can I use it in my organization? What are the right use cases?

Questions quickly get much more tactical from there. There’s the usual curiosity about security and compliance. Operators wonder about patching and upgrades.

Users of the new Pivotal Container Service (PKS) enjoy many of these features out of the box. It “just works,” as we like to say.

In this post, we want to explore one aspect in detail: high availability (HA). Hardware failures are a fact of life. So how do you keep your Kubernetes clusters online as user traffic ebbs and flows? Let’s take a closer look!

PKS Provides Three Layers of HA for Kubernetes

Long-time Cloud Foundry operators know HA starts with BOSH, the deployment toolchain for distributed systems. In the world of Kubernetes, PKS uses BOSH to monitor and resurrect Kubernetes processes and underlying VMs. PKS provides three layers of HA out of the box:

  1. VM failure management

  2. Kubernetes process failure management

  3. Pod failure management

Let’s take a look at how Kubernetes, on its own, manages HA for deployed pods, the Kubernetes unit of deployment. The project uses an agent called kubelet that runs on each worker node.

Source: X-Team

As shown above, the kubelet deployed in the worker nodes monitors the pods. The kubelet also requests that the master node (the Kubernetes cluster control plane) re-create any pods that are unresponsive or destroyed. The master then proceeds to deploy the new pod in a healthy node, the node that happens to have the least amount of load. However, native Kubernetes HA stops here.

Out of the box, the project does not have the capability to monitor the kubelet agents themselves, or the VMs where the cluster is deployed. PKS - with the help of BOSH - addresses these HA scenarios. After all, most enterprises need to deliver aggressive SLAs at scale!

PKS includes the BOSH agent monit to look after your Kubernetes processes, and the underlying VMs themselves. PKS will restart the process or create a new VM in case the process (or VM) becomes unresponsive.

In a future release, BOSH will also deploy Kubernetes clusters across multiple availability zones (AZs), adding a fourth layer of HA to your deployment. When you enable this upcoming feature, a single AZ failure won’t bring down your entire Kubernetes cluster.

Now, let’s review how PKS delivers HA for Kubernetes. We’ve seen a lot of interest in running Elasticsearch with PKS, so we’ll use that for our walkthrough.

Installing PKS and Creating a Kubernetes Cluster

Before we get to the HA features specifically, let’s review how fast (and easy) it is to get your Kubernetes cluster up and going. 

  1. Download Pivotal Container Service from Pivotal Network.

  2. Log in to Operations Manager, and import PKS.

  3. Open the Harbor Registry tile, and start the PKS configuration.

  • In the PKS API tab, generate a certificate with a wild card domain name (* that you have registered with a DNS provider such as Google Domains.

  • In this example, we will use “plan 1” named “small.” Make sure to enable the use of privileged containers. This option helps make sure your pods have better access to the network stack and devices. (Learn more about this option and Privileged containers here).    

  • Double-check that your vSphere configuration is complete and correct under the “kubernetes cloud provider” tab. The datacenter, datastore and VM folder names should match your vSphere configuration.

  4. In the UAA tab, enter a valid subdomain for the UAA URL field (

  5. After completing the configuration, go back to the Installation Dashboard. Click “Apply changes.” It can take up to 30 minutes for PKS to install. Upon successful installation, you’ll notice the PKS tile has changed from orange to green.

  6. Map the PKS IP address (available under the “status” tab) to the UAA subdomain created in Step 4. This is shown below.

  7. Download the pks and kubectl CLIs from Pivotal Network. If you’re on a Windows machine, rename the executables to pks.exe and kubectl.exe and add them to the “path” environment variable.

  8. Let’s now create a PKS admin user in UAA:

    • SSH into OpsMgr. (You can also install cf-uaac on your local machine.)

$   ssh ubuntu@<<OpsMgr Fully Qualified Domain Name / IP>>

  9. Target uaac to the environment, and create the admin user.

$  uaac target --skip-ssl-validation

$  uaac token client get admin -s <<password>> (Available in Ops Mgr PKS tile -> Credentials tab -> Uaa Admin Secret)

$  uaac user add john --emails -p <<password>>

$  uaac member add pks.clusters.admin john

  10. Now, we’re ready to create a Kubernetes cluster! Typically, the PKS cluster is created by the operations team, before handing the environment over to the development team. In our example, John (as the administrator) will create a PKS cluster called es_cluster (Elasticsearch cluster). The cluster will have 2 nodes (VMs) based on the sizing defined in plan 1, our small plan from earlier. We use these commands:

$   pks login -a -u john -p <<password>> --skip-ssl-verification

$   pks create-cluster es_cluster --external-hostname --plan small --num-nodes 2

You can see the cluster creation process is “In Progress”.

  11. Let’s check the cluster status:

$   pks cluster es_cluster

After a few minutes, we notice that the cluster creation has succeeded, and a Kubernetes master IP is assigned to the cluster.
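If you’d rather not re-run the status command by hand, a small polling loop can do the waiting for you. The sketch below is not part of PKS. The helper names and the 30-second interval are our own choices, and it assumes the pks cluster output contains a “Last Action State:” line whose value eventually reads succeeded or failed. Check the output of your CLI version before relying on it.

```shell
#!/bin/sh
# Read `pks cluster <name>` output on stdin and print the value of the
# "Last Action State" field. Assumption: the field appears on its own line
# in the form "Last Action State:   <value>".
last_action_state() {
  awk -F': *' '$1 == "Last Action State" { print $2; exit }'
}

# Hypothetical helper: poll until the cluster leaves the in-progress state.
# Returns 0 on "succeeded" and 1 on "failed"; any other value keeps polling.
wait_for_cluster() {
  cluster_name=$1
  while :; do
    state=$(pks cluster "$cluster_name" | last_action_state)
    echo "cluster $cluster_name: ${state:-unknown}"
    case $state in
      succeeded) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep 30   # arbitrary interval
  done
}

# Usage (requires the pks CLI and a prior `pks login`):
#   wait_for_cluster es_cluster
```

Keeping the parsing in its own function means you can try it against saved output without touching a live cluster.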

  12. Map the Kubernetes master IP to the es_cluster subdomain in your DNS provider (

  13. The new PKS cluster es_cluster is now ready for use. John, the administrator, will now send the config file with the cluster configuration and authentication information to the engineering teams. He first runs the get-credentials command, then sends the config file it creates under the $user_home/.kube folder.

$   pks get-credentials es_cluster

Deploying the Elasticsearch Pod

  1. Let’s now step through the developer interaction with PKS using the kubectl CLI. We can open a new terminal, and get the details of the two worker nodes the admin created. (In the real world, developers will place the config file under the $user_home/.kube folder.)

$   kubectl get nodes -o wide

  2. Download the Elasticsearch project from GitHub.

  3. Create the Storage Class, Persistent Volume Claim, and NodePort service for the Elasticsearch pod.

$   kubectl create -f storage-class-vsphere.yml

$   kubectl create -f persistent-volume-claim-es.yml

$   kubectl create -f es-svc.yml

NOTE : Instead of using NodePort, you may also use port forwarding to map a local port (e.g. 2000) to ES port 9200. This way, you can restrict pod access within the Kubernetes cluster. You can find more info about port forwarding here.
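To give a feel for what the three files above might contain, here is a minimal sketch. These are not the files from the GitHub project. The object names (es-disk, es-pvc, es-svc), the app label, and the 2Gi size are hypothetical, and the manifests are written to a separate sketch-manifests folder so they don’t overwrite the real ones. The vSphere provisioner name and the NodePort service type are standard Kubernetes.

```shell
#!/bin/sh
# Write three illustrative manifests into ./sketch-manifests.
# Every name and size below is a placeholder, not taken from the project.
mkdir -p sketch-manifests

# Storage class backed by the in-tree vSphere volume provisioner.
cat > sketch-manifests/storage-class-vsphere.yml <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: es-disk
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
EOF

# Claim that asks the storage class above for a disk.
cat > sketch-manifests/persistent-volume-claim-es.yml <<'EOF'
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: es-pvc
spec:
  storageClassName: es-disk
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
EOF

# NodePort service exposing Elasticsearch's HTTP port on every worker node.
cat > sketch-manifests/es-svc.yml <<'EOF'
kind: Service
apiVersion: v1
metadata:
  name: es-svc
spec:
  type: NodePort
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      targetPort: 9200
EOF
```

Because the claim points at a storage class rather than a specific disk, the volume can be reattached wherever the pod lands, which matters for the HA scenarios later in this post.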

  4. Deploy the Elasticsearch pod:

$   kubectl create -f es-deployment.yml
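For readers who want to see the shape of such a deployment, here is a rough sketch. It is not the file from the GitHub project. The names, labels, image tag, and claim name (es-pvc) are assumptions, and a real Elasticsearch deployment will likely need extra settings. The point is the pairing of a single replica with a volume backed by a Persistent Volume Claim.

```shell
#!/bin/sh
# Write an illustrative Deployment manifest into ./sketch-manifests.
# Names, labels, the image tag, and the claim name are placeholders.
mkdir -p sketch-manifests

cat > sketch-manifests/es-deployment.yml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  replicas: 1                      # desired state: exactly one pod
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:6.2.3
          ports:
            - containerPort: 9200
          volumeMounts:
            - name: es-data
              mountPath: /usr/share/elasticsearch/data
      volumes:
        - name: es-data
          persistentVolumeClaim:
            claimName: es-pvc      # data lives on the claimed disk
EOF
```

With replicas set to 1, Kubernetes works to keep exactly one such pod running, and the claim keeps the data outside the pod’s own lifecycle.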

  5. Get all information about the Elasticsearch deployment with the following commands. (You can find the IP address of the Worker Node, the one that hosts the Elasticsearch pod, from the Node column in the get pods call output.)

$   kubectl get pods -o wide

$   kubectl get nodes -o wide

$   kubectl get svc

  6. All requests to create and access data from Elasticsearch are available as a Postman collection in the GitHub project you downloaded earlier. Import the JSON file (ES_Requests.postman_collection.json) into Postman. (Download Postman. Prefer curl? Check out this repo.)

  7. Open the CreateIndex request. Update the IP address and the port number of the request URL to match our environment. We’re interested in two pieces of information:

    • IP address - the Worker Node IP where the pod is deployed.

    • Port - the NodePort, the 5-digit port from the get svc command output.
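If you prefer to pull the port out with a script instead of reading it off the screen, something like the following could work. It is a sketch under assumptions: the service name es-svc and the IP address are placeholders, the helper names are our own, and the parser expects the PORT(S) value in the fifth column in the form 9200:30920/TCP.

```shell
#!/bin/sh
# Print the NodePort of one service, given `kubectl get svc` output on stdin.
# Assumption: PORT(S) is the fifth column and looks like "9200:30920/TCP".
node_port() {
  svc_name=$1
  awk -v name="$svc_name" '$1 == name { split($5, p, "[:/]"); print p[2]; exit }'
}

# Build the base URL used by the Postman (or curl) requests.
es_base_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Usage against a live cluster (service name and IP are placeholders):
#   port=$(kubectl get svc | node_port es-svc)
#   curl "$(es_base_url 10.0.0.12 "$port")"
```

The resulting base URL is what you paste into each Postman request in the steps that follow.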

  8. Execute the REST call. Click the Send button, and the index will be successfully created, as shown below.

  9. Similarly, open the CreateOrderType request and execute it after updating the URL to reflect your environment.

  10. Create two customer records by executing the CreateCustomer and CreateCustomer2 requests.

  11. Get a customer record to validate the data by executing the GetCustomer1 request.

Now we have our cluster, with Elasticsearch installed. Let’s now review the HA features inherent to PKS!

Now, the Good Stuff: High Availability in PKS

When customers ask about HA Kubernetes, they want to know how the system stays online even when underlying resources fail. So let’s focus on the following scenarios in PKS:

Pod Failure Management by Kubernetes

Just for fun, let’s delete an Elasticsearch pod, and see how Kubernetes recreates the pod in the second VM.

  1. Get pod information:

$   kubectl get pods -o wide

  2. Delete the pod:

$   kubectl delete pod <<pod name>>

  3. Let’s watch Kubernetes recreate the pod in the second VM. You’ll notice the pod being terminated on the first VM. A new pod is created on the next VM.

$   kubectl get pods -o wide

  4. Execute get pods after a few minutes to confirm the successful creation of the pod. You can also use the watch flag (-w) to monitor the creation process, without having to re-execute the command after a few minutes.

  5. Execute the GetCustomer1 request in Postman to validate that the pod is deployed successfully and the data is persisted.
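To make the move between VMs easier to spot, you can boil the get pods output down to just the pod name, its status, and its node. The helper below is our own sketch, not a kubectl feature. It looks up the NODE column by its header, and it assumes every column value is a single word, which may not hold for all kubectl versions.

```shell
#!/bin/sh
# Print "pod-name status node" for each pod, reading
# `kubectl get pods -o wide` output on stdin.
# The NODE column is found by header name because its position can differ
# between kubectl versions. Assumption: no column value contains spaces.
pod_nodes() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "NODE") { col = i; break }
      next
    }
    col { print $1, $3, $col }
  '
}

# Usage against a live cluster:
#   kubectl get pods -o wide | pod_nodes
```

Run it once before deleting the pod and again afterwards, and compare the node names in the last column.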

OK, on to our second scenario!

VM Failure Management by BOSH

Now, let’s shut down a VM from the PKS cluster, and watch BOSH automatically create a new one to replace it.

  1. Find the non-pod resident VM (i.e. the VM that does not have the pod deployed). We type these commands:

$   kubectl get pods -o wide  

$   kubectl get nodes -o wide
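With only two worker nodes you can compare the two listings by eye. For larger clusters, a small helper can do the comparison. The sketch below is hypothetical: idle_nodes is our own name, it reads saved copies of the two command outputs, and it only considers the pods in the listing you give it (pods in other namespaces are ignored).

```shell
#!/bin/sh
# List worker nodes that host none of the listed pods.
#   $1: file holding `kubectl get pods -o wide` output
#   $2: file holding `kubectl get nodes -o wide` output
# Assumption: node names in the NODE column match the NAME column of
# the nodes listing, and no column value contains spaces.
idle_nodes() {
  # Collect the nodes that currently host a pod.
  busy=$(awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "NODE") { col = i; break }
      next
    }
    col { print $col }
  ' "$1" | sort -u)

  # Print every node name that is not in the busy list.
  awk 'NR > 1 { print $1 }' "$2" | while read -r node; do
    printf '%s\n' "$busy" | grep -qxF -- "$node" || printf '%s\n' "$node"
  done
}

# Usage:
#   kubectl get pods -o wide  > pods.txt
#   kubectl get nodes -o wide > nodes.txt
#   idle_nodes pods.txt nodes.txt
```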

  2. Log in to the vSphere web client, and find the VM using the IP address in the search box.

  3. Shut down the VM. Click on the Red Square in the toolbar, or select the Power Off option from the right-click menu.

  4. Now, watch BOSH recreate the VM. BOSH will make sure the desired state of the Kubernetes cluster, 2 worker nodes, is met. We simply execute the command:

$   kubectl get nodes -o wide -w  
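The watch flag is fine for a demo. In a script, you may prefer to block until the cluster is back at its desired size. Here is one possible sketch, with helper names and a 15-second interval of our own choosing. It counts nodes whose STATUS column reads exactly Ready, so a node reported as NotReady does not count.

```shell
#!/bin/sh
# Count nodes whose STATUS is exactly "Ready", reading
# `kubectl get nodes` output on stdin (STATUS is the second column).
ready_node_count() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Hypothetical helper: poll until at least the desired number of
# worker nodes is Ready (2 in this walkthrough).
wait_for_nodes() {
  desired=$1
  until [ "$(kubectl get nodes | ready_node_count)" -ge "$desired" ]; do
    sleep 15   # arbitrary interval
  done
}

# Usage against a live cluster:
#   wait_for_nodes 2
```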

On to scenario 3!

Pod and VM Failure Management by Kubernetes and BOSH

Let’s shut down the VM where the pod is deployed. Here, both Kubernetes and BOSH fix the failure. In this scenario:

  • Kubernetes will create the pod in the second worker node

  • BOSH will create a new VM to replace the shut-down VM

  1. Find the VM where the pod is deployed:

$   kubectl get pods -o wide  

$   kubectl get nodes -o wide

  2. Log in to the vSphere web client, and find the VM using the IP address in the search box.

  3. Shut down the VM. Click on the Red Square in the toolbar.

  4. Execute the following commands in two different terminals. Watch BOSH recreate the VM, and Kubernetes deploy the pod in the second VM:

$   kubectl get nodes -o wide -w  

$   kubectl get pods -o wide -w  

Apart from these three HA scenarios, PKS will also monitor Kubernetes processes like kubelet, and bring up the failed process as needed. PKS also communicates the failure to the admin through the configured notification channel (email, pager, and so on). You can use the BOSH -e command to check the processes that are monitored in master and worker nodes.

High Availability: Just One of Many Day 2 Features PKS Delivers Out of the Box

Customers turn to Pivotal to deliver availability for their most important systems. We’ve just demonstrated how PKS keeps your Kubernetes clusters online even when the underlying resources fail. This same innovation keeps you online during patching, upgrades, and when performing blue-green deployments. Want to try PKS? Download it, and then check out the documentation!

About the Author

Ahilan Ponnusamy

Ahilan Ponnusamy is a Senior Platform Architect at Pivotal. He works on the Partner Solution Architecture team supporting the DellTech and VMware sales teams. Prior to joining Pivotal, Ahilan was with Oracle Technologies, leading the SMB Cloud Platform Specialists team supporting the North America sales team.
