One of the most powerful features of VMware Enterprise PKS is its ability to manage desired state for Kubernetes clusters. This capability is provided in part by BOSH. For example, consider the following cluster with three worker nodes deployed in VMware Enterprise PKS:
kubectl get nodes
NAME                                   STATUS   ROLES    AGE    VERSION
0c04b65b-1215-49a6-b4f4-383d75ff958a   Ready    <none>   86d    v1.12.4
c14736b9-2b54-484c-b783-a79453e28804   Ready    <none>   134m   v1.12.4
fb1cd9ad-568e-4e83-a7a8-738dfe050712   Ready    <none>   84d    v1.12.4
On this Kubernetes cluster, I have deployed a simple web application with four pods and an external load balancer:
kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
bootcamp-95bd888fc-8j77b   1/1     Running   0          40s
bootcamp-95bd888fc-dbfb7   1/1     Running   0          63s
bootcamp-95bd888fc-dzkmc   1/1     Running   0          40s
bootcamp-95bd888fc-tmwfz   1/1     Running   0          40s

kubectl get deployments
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
bootcamp   4         4         4            4           87s

kubectl get services
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP               PORT(S)          AGE
bootcamp   LoadBalancer   10.100.200.223   10.40.14.41,100.64.96.7   8080:32013/TCP   2m7s
What is the impact of a worker node failure?
A worker node is marked as Condition Unknown once the master can no longer reach the node's kubelet agent. If the node remains in Condition Unknown, the node controller evicts the pods running on it, and the controllers that own those pods (here, the bootcamp deployment) recreate them on other nodes if capacity allows. At this point, Kubernetes has restored the containerized application to service, but we are left with a worker node in Condition Unknown.
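The timing of those two steps can be sketched as a small function. This is an illustration only: the 40-second grace period and 5-minute eviction timeout are assumed from the upstream kube-controller-manager defaults (`--node-monitor-grace-period` and `--pod-eviction-timeout`) for Kubernetes of this era, and a given VMware Enterprise PKS deployment may be configured differently. The function name and return strings are hypothetical.

```python
def node_phase(seconds_since_heartbeat, grace_period=40, eviction_timeout=300):
    """Classify a node by how long ago its kubelet last reported in.

    grace_period and eviction_timeout are ASSUMED upstream defaults,
    not values read from a PKS cluster.
    """
    if seconds_since_heartbeat < grace_period:
        # The master still considers the node healthy.
        return "Ready"
    if seconds_since_heartbeat < grace_period + eviction_timeout:
        # Condition Unknown, but pods have not been evicted yet.
        return "Unknown"
    # Condition Unknown for longer than the eviction timeout.
    return "Unknown, pods evicted"


for elapsed in (10, 60, 400):
    print(elapsed, node_phase(elapsed))
```

Under these assumed values, a node that stops reporting keeps its pods for several minutes before replacements are started elsewhere.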
The desired state of the Kubernetes cluster is to have three worker nodes, and VMware Enterprise PKS restores that desired state when the cluster drifts from it. To see this function in action, we will simulate a failure by powering off a worker node.
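The idea of restoring desired state can be illustrated with a toy reconciliation loop. This is a minimal sketch of the general pattern, not how BOSH or VMware Enterprise PKS is implemented: the `Cluster` class, the `reconcile` function, the worker names, and the delete-then-create ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Hypothetical in-memory model of a cluster's worker nodes."""
    desired_workers: int
    workers: dict = field(default_factory=dict)  # name -> "Ready" / "NotReady"
    _ids: count = field(default_factory=lambda: count(1))

    def add_worker(self):
        name = f"worker-{next(self._ids)}"
        self.workers[name] = "Ready"
        return name


def reconcile(cluster):
    """Remove unhealthy workers, then create workers until the count matches."""
    actions = []
    for name, status in list(cluster.workers.items()):
        if status != "Ready":
            del cluster.workers[name]
            actions.append(("delete", name))
    while len(cluster.workers) < cluster.desired_workers:
        actions.append(("create", cluster.add_worker()))
    return actions


cluster = Cluster(desired_workers=3)
for _ in range(3):
    cluster.add_worker()

cluster.workers["worker-3"] = "NotReady"  # simulate the powered-off VM
actions = reconcile(cluster)
print(actions)  # [('delete', 'worker-3'), ('create', 'worker-4')]
```

Running `reconcile` a second time returns an empty list, because the actual state already matches the desired state.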
Simulating a node failure
We can identify worker nodes in VMware vSphere by reviewing the custom attributes and looking for worker in the job field:
Powering off this worker node will produce a warning from vSphere that it is managed by BOSH:
Ignoring the well-placed warning, I power off the machine. Soon afterward, Kubernetes marks the worker node as Condition Unknown, which kubectl reports as NotReady:
kubectl get nodes
NAME                                   STATUS     ROLES    AGE    VERSION
0c04b65b-1215-49a6-b4f4-383d75ff958a   Ready      <none>   86d    v1.12.4
c14736b9-2b54-484c-b783-a79453e28804   Ready      <none>   141m   v1.12.4
fb1cd9ad-568e-4e83-a7a8-738dfe050712   NotReady   <none>   84d    v1.12.4
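When watching for this transition in a script rather than by eye, the node listing can be scanned for anything that is not Ready. The following is a minimal sketch: `not_ready_nodes` is a hypothetical helper, and the sample text mirrors the listing above with `<none>` assumed in the ROLES column.

```python
SAMPLE = """\
NAME                                   STATUS     ROLES    AGE    VERSION
0c04b65b-1215-49a6-b4f4-383d75ff958a   Ready      <none>   86d    v1.12.4
c14736b9-2b54-484c-b783-a79453e28804   Ready      <none>   141m   v1.12.4
fb1cd9ad-568e-4e83-a7a8-738dfe050712   NotReady   <none>   84d    v1.12.4
"""


def not_ready_nodes(kubectl_output):
    """Return names of nodes whose STATUS column is not exactly 'Ready'."""
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    header, rows = lines[0].split(), lines[1:]
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    result = []
    for row in rows:
        fields = row.split()
        if fields[status_idx] != "Ready":
            result.append(fields[name_idx])
    return result


print(not_ready_nodes(SAMPLE))
```

Against a live cluster, the same text could be captured with `subprocess.run(["kubectl", "get", "nodes"], capture_output=True, text=True).stdout`, assuming kubectl is installed and pointed at the cluster.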
After a reasonable time, Kubernetes rebuilds the failed pods on another node (as shown by the difference in age):
kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
bootcamp-95bd888fc-dbfb7   1/1     Running   0          13m
bootcamp-95bd888fc-jfg82   1/1     Running   0          4m51s
bootcamp-95bd888fc-s797b   1/1     Running   0          4m51s
bootcamp-95bd888fc-tmwfz   1/1     Running   0          13m
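The age comparison can also be done programmatically. The sketch below converts kubectl's compact age strings to seconds and flags the pods that are younger than the oldest one. `age_to_seconds` is a hypothetical helper that handles only the d, h, m, and s units seen in this post, and "younger than the oldest pod" is a crude heuristic chosen for this example, not a general test for rescheduling.

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def age_to_seconds(age):
    """Convert an age such as '4m51s' or '13m' to a number of seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject anything the pattern did not consume in full.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * UNIT_SECONDS[u] for n, u in parts)


# Pod names and ages copied from the listing above.
pods = {
    "bootcamp-95bd888fc-dbfb7": "13m",
    "bootcamp-95bd888fc-jfg82": "4m51s",
    "bootcamp-95bd888fc-s797b": "4m51s",
    "bootcamp-95bd888fc-tmwfz": "13m",
}

oldest = max(age_to_seconds(a) for a in pods.values())
replaced = sorted(n for n, a in pods.items() if age_to_seconds(a) < oldest)
print(replaced)  # ['bootcamp-95bd888fc-jfg82', 'bootcamp-95bd888fc-s797b']
```

The two flagged pods are the replacements for the pods that were running on the powered-off node.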
VMware Enterprise PKS enforces the desired state of the cluster by deploying a new node to replace the powered-off node and then removing the failed node. In a traditional Kubernetes environment, replacing a failed node is a manual process. VMware Enterprise PKS excels at providing high scalability, in part through its high degree of automation for day-two operations.
About the author: Joseph Griffiths