What is Kubernetes monitoring?
Monitoring Kubernetes presents some unique challenges compared with VM-based computing environments. These challenges are the result of differences in the Kubernetes environment:
- The ephemeral nature of containers
- An increased density of objects, services, and metrics within a given Kubernetes node
- A focus on services, rather than machines
- More diverse consumers of monitoring data
- Changes in the software development lifecycle
These differences make Kubernetes container monitoring both more critical and more challenging. Anytime there’s a problem, you need to be able to determine when and where failures happened. Achieving this goal requires Kubernetes monitoring at the cluster level and at the pod and application level.
Kubernetes cluster monitoring
A Kubernetes cluster consists of a set of nodes (physical servers or VMs) that run containerized applications with local resiliency. Kubernetes cluster monitoring tracks the health of all nodes in the cluster and the capacity available. This includes checking how many containers are running on each node, tracking resource utilization for each node and for the cluster as a whole, and alerting on errors and failures. Cluster monitoring ensures that the cluster is healthy so that failures or resource constraints don’t affect the performance of running applications.
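As a rough illustration of per-node and cluster-wide utilization checks, the sketch below flags nodes whose CPU or memory use crosses a threshold. The `NodeSample` shape, the 90% default threshold, and the function names are assumptions made for this example; real readings would come from whatever metrics pipeline you run.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One utilization reading for a node (hypothetical shape for this sketch)."""
    name: str
    cpu_used_millicores: int
    cpu_allocatable_millicores: int
    mem_used_bytes: int
    mem_allocatable_bytes: int
    running_containers: int


def utilization(used: int, allocatable: int) -> float:
    """Return used/allocatable as a fraction, or 0.0 when nothing is allocatable."""
    return used / allocatable if allocatable > 0 else 0.0


def node_alerts(nodes: list[NodeSample], threshold: float = 0.9) -> list[str]:
    """Return one message per node resource at or above the threshold."""
    alerts = []
    for node in nodes:
        cpu = utilization(node.cpu_used_millicores, node.cpu_allocatable_millicores)
        mem = utilization(node.mem_used_bytes, node.mem_allocatable_bytes)
        if cpu >= threshold:
            alerts.append(f"{node.name}: CPU at {cpu:.0%}")
        if mem >= threshold:
            alerts.append(f"{node.name}: memory at {mem:.0%}")
    return alerts


def cluster_cpu_utilization(nodes: list[NodeSample]) -> float:
    """CPU utilization across the whole cluster, not an average of node ratios."""
    used = sum(n.cpu_used_millicores for n in nodes)
    allocatable = sum(n.cpu_allocatable_millicores for n in nodes)
    return utilization(used, allocatable)
```

Summing used and allocatable capacity before dividing keeps small nodes from skewing the cluster-level figure.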
Kubernetes pod monitoring
A typical application running on Kubernetes consists of multiple pods. A pod is the smallest execution unit in Kubernetes, with each pod consisting of one or more containers. Each application running on a cluster specifies the pods it requires and the number of instances of each pod. Kubernetes then works to ensure that the declared state is maintained. Kubernetes pod monitoring consists of tracking relevant metrics for containers and applications, such as availability, request latency, and resource usage, and using them to identify problems. For example, if the number of instances of a particular pod is consistently below the number requested, it could indicate a resource constraint, or that a bug is causing instances to fail more quickly than they can be restarted.
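One way to turn "consistently below the number requested" into a check is to look at the most recent readings rather than a single sample, so that a brief dip during a rollout does not raise an alert. This is a minimal sketch; the function name and the three-sample default are assumptions.

```python
def consistently_below(desired: int, observed_ready: list[int], min_samples: int = 3) -> bool:
    """Return True when the latest `min_samples` readings are all below `desired`.

    `observed_ready` is an oldest-to-newest series of ready-instance counts.
    With fewer than `min_samples` readings there is not enough evidence yet.
    """
    if len(observed_ready) < min_samples:
        return False
    return all(ready < desired for ready in observed_ready[-min_samples:])
```

For example, `consistently_below(3, [3, 2, 2, 1])` reports a shortfall, while `consistently_below(3, [2, 2, 3])` does not because the latest reading has recovered.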
Importance of Kubernetes application monitoring
As monolithic applications are refactored into microservices, the requirements for application monitoring change. To capture the necessary application data, Kubernetes monitoring and logging needs to function at a container level across thousands of endpoints. Because Kubernetes workloads are ephemeral by default and can start or stop at any time, application monitoring must be dynamic and aware of Kubernetes labels and namespaces. A consistent set of rules or alerts must be applied to all pods, new and old.
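The sketch below shows why label- and namespace-aware rules keep working as pods come and go: the rule names a namespace and a set of labels, not individual pods, so a newly started pod with the same labels is covered automatically. The `Pod` and `AlertRule` types are hypothetical, and only equality-based label matching is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class AlertRule:
    """Hypothetical rule: applies to pods in `namespace` whose labels match `selector`."""
    name: str
    namespace: str
    selector: dict[str, str]


def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based matching: every selector key must be present with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def targets(rule: AlertRule, pods: list[Pod]) -> list[str]:
    """Names of the pods the rule currently applies to."""
    return [
        pod.name
        for pod in pods
        if pod.namespace == rule.namespace and selector_matches(rule.selector, pod.labels)
    ]
```

Re-evaluating `targets` against the current pod list on each cycle is what lets one rule cover both old and new pods.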
Monitoring and observability should always be a consideration when you’re developing new apps or refactoring existing ones. Maintaining a common layer of baseline metrics that applies to all apps and infrastructure while incorporating custom metrics for each application is extremely desirable. Adding a new metric should not trigger a major replumb of your monitoring stack.
Kubernetes application monitoring should include the following:
- Resource consumption. A single Kubernetes cluster usually runs multiple applications and limits the resources available to each application. It’s important to monitor each application’s resource consumption to ensure the application is adequately resourced and is utilizing and releasing resources normally.
- Application health. The health of an application can be measured by probing its respective pods.
  - The liveness probe determines if a container within a pod needs to restart.
  - The readiness probe indicates to the cluster when a pod is ready to begin processing traffic.
  - The startup probe indicates when the application in the pod has started successfully.
- Application performance. Application performance depends on having enough resources and successful functioning of the Kubernetes components and services on which the application depends. For each application, it’s essential to define what “good performance” means, and what specific elements or services have the biggest impact on that application’s performance.
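To make the liveness, readiness, and startup probes described under application health more concrete, here is a minimal sketch of the application side: HTTP endpoints that a probe could call. The paths `/livez`, `/readyz`, and `/startupz` and the `AppState` class are assumptions for this example; the actual path and port are whatever the pod specification configures. Kubernetes treats an HTTP probe response with a status from 200 to 399 as success.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppState:
    """Flags the application would set as it starts and warms its dependencies."""

    def __init__(self) -> None:
        self.started = threading.Event()
        self.ready = threading.Event()


def probe_status(state: AppState, path: str) -> int:
    """Map a probe path to an HTTP status code (paths are hypothetical)."""
    checks = {
        "/livez": True,  # the process can answer at all, so it is alive
        "/startupz": state.started.is_set(),
        "/readyz": state.started.is_set() and state.ready.is_set(),
    }
    ok = checks.get(path)
    if ok is None:
        return 404
    return 200 if ok else 503


def make_handler(state: AppState) -> type[BaseHTTPRequestHandler]:
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(probe_status(state, self.path))
            self.end_headers()

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep probe traffic out of stderr

    return ProbeHandler


def serve(state: AppState, host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Start the probe server on a background thread; port 0 picks a free port.

    Inside a pod you would bind to an address the kubelet can reach.
    """
    server = ThreadingHTTPServer((host, port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Keeping the decision in `probe_status` separates the health logic from the HTTP plumbing, which makes it easy to exercise without a running server.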
What to keep in mind when you’re considering Kubernetes monitoring
Monitoring Kubernetes shows you whether a Kubernetes environment and all its layers (clusters, nodes, pods, containers, and application workloads) are operating as expected. Although an effective Kubernetes monitoring strategy is essential, to get the full picture of the performance of Kubernetes applications, you also need Kubernetes observability.
To implement an effective monitoring strategy for your cluster(s), you first must determine what to monitor. Kubernetes is not self-monitoring, and all the components in the Kubernetes environment need to be included for effective monitoring of Kubernetes deployments. A layered approach to monitoring makes it easier to isolate a problem by drilling down through the layers until its source is identified.
As your Kubernetes environment grows, you’ll end up with many clusters that all have to be monitored and managed. Effective multicluster monitoring requires log aggregation in a centralized location with dashboards that allow you to see the health of all clusters in one location. There are a variety of tools that enable these capabilities, such as the combination of Thanos, Grafana, and Prometheus.
A range of tools is available to assist with Kubernetes monitoring, visualization, and logging.
- Prometheus works by scraping data from configured endpoints, parsing it, and storing it in its internal time-series database. This data can then easily be displayed using a visualization tool such as Grafana.
- Thanos aggregates data from multiple Prometheus deployments into a single queryable view, which can then be visualized using Grafana.
- Grafana visualizes the performance data of a running Kubernetes cluster. It has native integration with Prometheus and Elasticsearch.
- Kibana is an open source data discovery and visualization dashboard for accessing information stored in Elasticsearch.
- Elasticsearch is an open source search and analytics engine that is distributed, fast, and scalable. In Kubernetes environments it is commonly used for container log aggregation, search, and analytics.
- Logstash is a data-processing pipeline, commonly combined with Elasticsearch, that collects and transforms logs and ships them to the storage engine used for a Kubernetes cluster.
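To illustrate the "scrape, parse, store" flow described for Prometheus above, the sketch below parses a small subset of the Prometheus text exposition format into samples. It is only an illustration and not how Prometheus itself is implemented: it skips comment lines, ignores optional timestamps, and does not handle escaped quotes inside label values. The function name is an assumption.

```python
import re

# metric_name{label="value",...} value [timestamp]
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for each sample line that parses."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = _LINE.match(line)
        if match is None:
            continue  # outside the subset this sketch understands
        name, label_text, value_text = match.groups()
        try:
            value = float(value_text)
        except ValueError:
            continue  # not a numeric sample value
        labels = dict(_LABEL.findall(label_text or ""))
        samples.append((name, labels, value))
    return samples
```

A scraper built on this would fetch the text from each configured endpoint on an interval and append the parsed samples, with a timestamp, to its time-series store.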
Successful Kubernetes monitoring requires careful attention to best practices:
- Plan your monitoring strategy. There are several instrumentation strategies. It’s common to include instrumentation libraries in your containerized apps or to run an agent in a separate sidecar container. However, both approaches can have significant resource impacts at scale. For this reason, many monitoring solutions use eBPF (extended Berkeley Packet Filter) probes to collect kernel-level data across nodes and clusters without added instrumentation.
- Capture historical system data. While metrics are important, in dynamic Kubernetes environments the state of the system is always changing. Historical system data is essential for understanding the state of the system at the time an event occurred.
- Use API metrics to detect app issues. Microservices utilize APIs for communication. Capturing API metrics such as request rate, error rate, and latency can help you pinpoint issues more quickly.
- Measure usability and the end user experience. With any digital service, it’s the end user experience that matters most. Therefore, it’s critical to identify metrics that reflect the user experience and the usability of the application. This includes responsiveness or the latency of user requests and may also include other application-specific metrics, like how often users abandon transactions or make mistakes.
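The API metrics named above (request rate, error rate, and latency) can be derived from a window of request records. The sketch below is one simple way to do that; the `Request` shape, the choice to count only 5xx responses as errors, and the nearest-rank 95th percentile are all assumptions for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code of the response
    latency_ms: float  # time taken to serve the request


def summarize(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Summarize the requests observed during a window of `window_seconds`."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_latency_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank percentile: 1-based rank = ceil(0.95 * total), in integer math.
    rank = (95 * total + 99) // 100
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total,
        "p95_latency_ms": latencies[rank - 1],
    }
```

A percentile is used for latency rather than a mean because a small number of very slow requests, which users notice most, can hide inside a healthy-looking average.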
See our Kubernetes Monitoring Checklist to ensure you don’t overlook anything.
Kubernetes monitoring at VMware
VMware and the VMware Tanzu portfolio of products and services deliver the monitoring and observability tools you need for your cloud native journey, allowing you to achieve higher levels of application health and availability for an overall improved end user experience.
VMware Tanzu Observability by Wavefront
Get enterprise observability for multi-cloud environments. Monitor everything from full-stack applications to cloud infrastructures with metrics, traces, event logs, and analytics. Use Tanzu Observability to roll out monitoring as a service to all your DevOps teams, including developers and SREs across the enterprise.
- Supports popular Kubernetes distributions. Tanzu Observability is designed to integrate with all popular Kubernetes platforms, including the ability to provide full-stack visibility for Red Hat OpenShift environments no matter where they’re running: on-premises, in the public cloud, or delivered as a managed service. The Tanzu Observability by Wavefront Certified OpenShift 4.x Operator provides developers, OpenShift operators, and SREs automated visibility into OpenShift environments.
- Multi-cloud monitoring and analytics. Get out-of-the-box, real-time visibility across all major public cloud platforms—AWS, Azure, and GCP—with 50+ public cloud integrations and prepackaged dashboards showing key metrics for all major cloud services.
Try Tanzu Observability now. With our free 30-day trial, you can get started in as little as 10 minutes, taking advantage of more than 220 built-in integrations and live support. Tanzu Observability Quickstart documentation helps you get up to speed quickly. See the full portfolio of VMware Tanzu Observability Solutions. KubeAcademy can help your team come up to speed on all topics related to Kubernetes, including an eight-part Introduction to Observability.
What is the difference between Kubernetes monitoring and observability?
Kubernetes monitoring is focused on gathering metrics, log data, and alerts from clusters and applications running in containers. It's considered a subset of observability. Observability delivers actionable information where intelligence is applied to exposed data, enabling automated changes and optimizations to infrastructure.
Why is Kubernetes monitoring important?
Kubernetes monitoring enables proactive management of clusters. It eases management of containerized infrastructure by tracking utilization of cluster resources including memory, CPU, and storage. Cluster operators can monitor and receive alerts if the desired number of pods is not running, if resource utilization is approaching critical limits, or when failures or misconfiguration affect pods.
Which monitoring tool is best for Kubernetes?
There are a variety of popular open source and commercial tools that assist with Kubernetes monitoring, logging, and visualization. For instance, many organizations use open source tools such as Prometheus and Grafana to gather metrics from a wide array of sources and visualize the data gathered in a dashboard. Tanzu Observability by Wavefront collects metrics on clusters, nodes, pods, containers, and control plane objects and provides deep visibility into the operations of your Kubernetes clusters and applications.