Kubernetes monitoring: Modern application observability for dynamic environments

What is Kubernetes monitoring?

Monitoring Kubernetes presents some unique challenges compared with VM-based computing environments. These challenges are the result of differences in the Kubernetes environment:

  • The ephemeral nature of containers
  • An increased density of objects, services, and metrics within a given Kubernetes node
  • A focus on services, rather than machines
  • More diverse consumers of monitoring data
  • Changes in the software development lifecycle

These differences make Kubernetes container monitoring both more critical and more challenging. Any time there’s a problem, you need to be able to determine when and where the failure happened. Achieving this goal requires monitoring at the cluster level as well as at the pod and application levels.

Kubernetes cluster monitoring

A Kubernetes cluster consists of a set of nodes (physical servers or VMs) that run containerized applications with local resiliency. Kubernetes cluster monitoring tracks the health of every node in the cluster and the cluster’s available capacity. This includes checking how many containers are running on each node, measuring resource utilization per node and across the whole cluster, and alerting on errors and failures. Cluster monitoring ensures that the cluster stays healthy so that node failures or resource constraints don’t degrade the performance of running applications.
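As a rough illustration of these checks, the Python sketch below takes a snapshot of per-node statistics and reports container counts per node, cluster-wide CPU and memory utilization, and alerts. The `NodeStats` fields, the `cluster_report` function, and the 85% alert threshold are hypothetical choices for this example; in a real cluster the numbers would come from a metrics pipeline rather than hand-built objects.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    # Hypothetical per-node snapshot; real values would come from a metrics
    # source such as the kubelet or a node exporter.
    name: str
    ready: bool
    running_containers: int
    cpu_used_m: int         # CPU in use, millicores
    cpu_allocatable_m: int  # CPU available to pods, millicores
    mem_used: int           # memory in use, bytes
    mem_allocatable: int    # memory available to pods, bytes


def cluster_report(nodes, threshold=0.85):
    """Return container counts per node, cluster-wide utilization, and alerts."""
    alerts = []
    containers = {}
    cpu_used = cpu_alloc = mem_used = mem_alloc = 0
    for n in nodes:
        containers[n.name] = n.running_containers
        if not n.ready:
            alerts.append(f"{n.name}: node not ready")
            continue  # a NotReady node contributes no usable capacity
        for resource, used, alloc in (("cpu", n.cpu_used_m, n.cpu_allocatable_m),
                                      ("memory", n.mem_used, n.mem_allocatable)):
            if used / alloc >= threshold:
                alerts.append(f"{n.name}: {resource} at {used / alloc:.0%}")
        cpu_used += n.cpu_used_m
        cpu_alloc += n.cpu_allocatable_m
        mem_used += n.mem_used
        mem_alloc += n.mem_allocatable
    return {
        "containers_per_node": containers,
        "cluster_cpu": cpu_used / cpu_alloc if cpu_alloc else None,
        "cluster_memory": mem_used / mem_alloc if mem_alloc else None,
        "alerts": alerts,
    }


GiB = 2**30
report = cluster_report([
    NodeStats("node-a", True, 12, 3600, 4000, 6 * GiB, 16 * GiB),
    NodeStats("node-b", True, 8, 1000, 4000, 4 * GiB, 16 * GiB),
    NodeStats("node-c", False, 0, 0, 4000, 0, 16 * GiB),
])
print(report["alerts"])  # ['node-a: cpu at 90%', 'node-c: node not ready']
```

Excluding NotReady nodes from the capacity totals is a deliberate choice here: capacity on a node that cannot run pods would otherwise make the cluster look healthier than it is.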

Kubernetes pod monitoring

A typical application running on Kubernetes consists of multiple pods. A pod is the smallest deployable unit in Kubernetes and consists of one or more containers. Each application running on a cluster declares the pods it requires and how many instances (replicas) of each pod should run, and Kubernetes continuously works to keep the actual state matching that declared state. Kubernetes pod monitoring tracks metrics for containers and applications, such as availability, request latency, and resource usage, and uses them to identify problems. For example, if the number of running instances of a particular pod stays consistently below the number requested, it could indicate a resource constraint, or a bug that causes instances to fail faster than they can be restarted.
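The replica check in the example above can be sketched as follows. The desired and ready counts are assumed to come from a workload’s status (for a Deployment, the `spec.replicas` and `status.readyReplicas` fields); the `ReplicaShortfallDetector` class and its consecutive-sample window are hypothetical, chosen so that brief dips during rollouts or restarts don’t trigger an alert.

```python
from collections import deque


class ReplicaShortfallDetector:
    """Flag a workload whose ready replicas stay below the desired count.

    A single low sample is normal during rollouts or restarts, so this
    hypothetical detector only fires after `window` consecutive shortfalls.
    """

    def __init__(self, window=5):
        self.window = window
        self.recent = deque(maxlen=window)  # True means ready < desired

    def observe(self, desired, ready):
        """Record one (desired, ready) sample; return True on a persistent shortfall."""
        self.recent.append(ready < desired)
        return len(self.recent) == self.window and all(self.recent)


detector = ReplicaShortfallDetector(window=3)
samples = [(3, 3), (3, 2), (3, 2), (3, 2), (3, 3)]
print([detector.observe(d, r) for d, r in samples])
# [False, False, False, True, False]
```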




Importance of Kubernetes application monitoring

As monolithic applications are refactored into microservices, the requirements for application monitoring change. To capture the necessary application data, Kubernetes monitoring and logging need to function at the container level across thousands of endpoints. Because Kubernetes workloads are ephemeral by default and can start or stop at any time, application monitoring must be dynamic and aware of Kubernetes labels and namespaces, so that a consistent set of rules and alerts applies to all pods, new and old.
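One way to keep rules consistent for pods that come and go is to target them by namespace and labels rather than by pod name, much like a Kubernetes label selector. The minimal sketch below assumes a hypothetical `AlertRule` with an equality-only `match_labels` selector (real Kubernetes selectors also support set-based expressions); any new pod that carries the right labels picks up the matching rules automatically.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict


@dataclass
class AlertRule:
    # Hypothetical rule shape: pods are selected by namespace and labels,
    # never by name, so replacement pods are covered automatically.
    name: str
    namespace: Optional[str] = None  # None matches every namespace
    match_labels: dict = field(default_factory=dict)

    def matches(self, pod):
        if self.namespace is not None and pod.namespace != self.namespace:
            return False
        # Equality-based selector: every listed label must be present and equal.
        return all(pod.labels.get(k) == v for k, v in self.match_labels.items())


def rules_for(pod, rules):
    """Return the names of the rules that apply to a pod."""
    return [r.name for r in rules if r.matches(pod)]


rules = [
    AlertRule("high-restart-rate"),
    AlertRule("checkout-latency", namespace="shop", match_labels={"app": "checkout"}),
]
new_pod = Pod("checkout-7f9c-x2k4q", "shop", {"app": "checkout", "tier": "web"})
print(rules_for(new_pod, rules))  # ['high-restart-rate', 'checkout-latency']
```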

Monitoring and observability should always be a consideration when you’re developing new apps or refactoring existing ones. Ideally, you maintain a common layer of baseline metrics that applies to all apps and infrastructure, and add custom metrics for each application on top of it. Adding a new metric should not require a major re-plumbing of your monitoring stack.

Kubernetes application monitoring should include the following:

  • Resource consumption. A single Kubernetes cluster usually runs multiple applications and limits the resources available to each application. It’s important to monitor each application’s resource consumption to ensure the application is adequately resourced and is utilizing and releasing resources normally.
  • Application health. The health of an application can be measured by probing its respective pods.
    • The liveness probe determines whether a container within a pod needs to be restarted.
    • The readiness probe indicates to the cluster when a pod is ready to begin processing traffic.
    • The startup probe indicates when the application in the pod has started successfully; until it succeeds, the liveness and readiness probes are not run.
  • Application performance. Application performance depends on having sufficient resources and on the correct functioning of the Kubernetes components and services the application relies on. For each application, it’s essential to define what “good performance” means and which elements or services have the biggest impact on that application’s performance.
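To make the probe descriptions concrete, here is a minimal sketch of an application exposing HTTP health endpoints that `httpGet` probes could call. The paths (`/livez`, `/readyz`, `/startupz`), the `AppState` flags, and the `probe_status` helper are assumptions for illustration; the only Kubernetes behavior relied on is that an HTTP probe treats status codes from 200 through 399 as success. The pod spec would point each probe at the matching path and port.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class AppState:
    # Hypothetical flags the application updates as it runs.
    started = False  # set once initialization (e.g. loading config) finishes
    ready = False    # set when the app can accept traffic
    healthy = True   # cleared if the app detects it is stuck


class HealthHandler(BaseHTTPRequestHandler):
    # Hypothetical paths; each probe in the pod spec would target one of them.
    checks = {
        "/livez": lambda: AppState.healthy,
        "/readyz": lambda: AppState.started and AppState.ready,
        "/startupz": lambda: AppState.started,
    }

    def do_GET(self):
        check = self.checks.get(self.path)
        if check is None:
            self.send_response(404)
        else:
            # HTTP probes count any status from 200 to 399 as success.
            self.send_response(200 if check() else 503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the logs


# Port 0 asks the OS for a free port; a real app would use a fixed port.
server = ThreadingHTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Opener with no proxies so local requests always reach the server directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_status(path):
    """Call an endpoint the way an HTTP probe would and return the status code."""
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    try:
        with _opener.open(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise
        return err.code


print(probe_status("/readyz"))  # 503 until the app marks itself ready
```

Keeping liveness separate from readiness is deliberate: a pod that is temporarily overloaded can report not-ready and stop receiving traffic without being restarted.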




What to keep in mind when you’re considering Kubernetes monitoring

Monitoring Kubernetes shows you whether a Kubernetes environment and all its layers (clusters, nodes, pods, containers, and application workloads) are operating as expected. Although an effective Kubernetes monitoring strategy is essential, getting the full picture of Kubernetes application performance also requires Kubernetes observability.


1. How to monitor clusters in Kubernetes

To implement an effective monitoring strategy for your cluster(s), you must first determine what to monitor. Kubernetes does not monitor itself, so every component in the Kubernetes environment needs to be included for effective monitoring of Kubernetes deployments. A layered approach makes problems easier to locate: you drill down from the cluster through nodes and pods to individual containers until you reach the layer where the issue lies.
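The layered drill-down can be pictured as a walk over a health tree. In the sketch below, the `Layer` tree and the `unhealthy_paths` function are hypothetical: each layer carries the result of its own check, and the function reports, for each branch, the path to the deepest layer whose check fails. A failing pod is therefore attributed to a failing container inside it when there is one.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    # Hypothetical node in a health tree: cluster -> node -> pod -> container.
    name: str
    ok: bool = True  # result of this layer's own check
    children: list = field(default_factory=list)


def unhealthy_paths(layer, prefix=()):
    """Drill down from the top and return the path to each failing component.

    A path stops at the deepest layer whose own check fails; a failing layer
    is reported itself only when none of its descendants fail.
    """
    path = prefix + (layer.name,)
    found = []
    for child in layer.children:
        found.extend(unhealthy_paths(child, path))
    if not layer.ok and not found:
        found.append(path)
    return found


cluster = Layer("prod", children=[
    Layer("node-1", children=[
        Layer("checkout-pod", ok=False, children=[
            Layer("app", ok=False),
            Layer("log-shipper"),
        ]),
    ]),
    Layer("node-2"),
])
print(unhealthy_paths(cluster))  # [('prod', 'node-1', 'checkout-pod', 'app')]
```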

As your Kubernetes environment grows, you’ll end up with many clusters that all have to be monitored and managed. Effective multicluster monitoring requires aggregating monitoring data in a central location, with dashboards that show the health of all clusters in one place. A variety of tools enable these capabilities, such as Prometheus combined with Thanos and Grafana.
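Whatever tools are used, aggregated data must stay attributable to its source cluster. The sketch below, with hypothetical function and metric names, tags every sample with a `cluster` label before merging and then totals a metric per cluster for a single cross-cluster view. This is similar in spirit to the unique external labels that Thanos expects each Prometheus instance to carry, but it is not Thanos’s API.

```python
from collections import defaultdict


def merge_clusters(per_cluster):
    """Merge per-cluster samples into one list, tagging each with its cluster.

    `per_cluster` maps a cluster name to (metric_name, labels, value) samples.
    Without the extra label, identical series from two clusters would collide.
    """
    merged = []
    for cluster, samples in per_cluster.items():
        for name, labels, value in samples:
            merged.append((name, {**labels, "cluster": cluster}, value))
    return merged


def sum_by_cluster(merged, metric):
    """Total one metric per cluster, e.g. for an all-clusters dashboard panel."""
    totals = defaultdict(float)
    for name, labels, value in merged:
        if name == metric:
            totals[labels["cluster"]] += value
    return dict(totals)


# Illustrative samples using a hypothetical metric name.
merged = merge_clusters({
    "us-east": [("pod_restarts_total", {"namespace": "shop"}, 4.0),
                ("pod_restarts_total", {"namespace": "auth"}, 1.0)],
    "eu-west": [("pod_restarts_total", {"namespace": "shop"}, 2.0)],
})
print(sum_by_cluster(merged, "pod_restarts_total"))
# {'us-east': 5.0, 'eu-west': 2.0}
```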

2. What are the popular Kubernetes monitoring tools?

A range of tools is available to assist with Kubernetes monitoring, visualization, and logging.

  • Monitoring
    • Prometheus works by scraping data from configured endpoints, parsing it, and storing it in its internal time-series database. This data can then easily be displayed using a visualization tool such as Grafana.
    • Thanos aggregates data from multiple Prometheus deployments, providing a global query view and long-term storage; the combined data can then be analyzed and visualized in Grafana.
  • Visualization
    • Grafana visualizes the performance data of a running Kubernetes cluster. It has native integration with Prometheus and Elasticsearch.
    • Kibana is an open source data discovery and visualization dashboard for accessing information stored in Elasticsearch.
  • Logging
    • Elasticsearch is an open source, distributed search and analytics engine that is fast and scalable, commonly used for container log aggregation, search, and analytics in Kubernetes environments.
    • Logstash is a data-processing pipeline, commonly paired with Elasticsearch, that collects, transforms, and ships logs to the storage backend used for a Kubernetes cluster.
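Prometheus scrapes endpoints that serve metrics in its text-based exposition format: `# HELP` and `# TYPE` lines followed by one sample per line. The hand-rolled renderer below only illustrates that format; in practice an official client library, such as `prometheus_client` for Python, generates it for you. The metric name and values in the usage example are made up.

```python
def _escape_label_value(value):
    # Label values must escape backslash, double quote, and line feed.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    # HELP text must escape backslash and line feed.
    help_text = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{_escape_label_value(str(v))}"'
                             for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "http_requests_total", "counter", "Total HTTP requests handled.",
    [({"method": "get", "code": "200"}, 1027), ({"method": "post", "code": "500"}, 3)],
), end="")
# Output:
#   # HELP http_requests_total Total HTTP requests handled.
#   # TYPE http_requests_total counter
#   http_requests_total{code="200",method="get"} 1027
#   http_requests_total{code="500",method="post"} 3
```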

3. Pay attention to Kubernetes monitoring best practices

Successful Kubernetes monitoring requires careful attention to best practices:

  • Plan your monitoring strategy. There are several instrumentation strategies. A common approach is to embed instrumentation libraries in your containerized apps or to run an agent in a separate sidecar container. However, these approaches can have a significant resource impact at scale. For this reason, many monitoring solutions use eBPF (extended Berkeley Packet Filter) probes to collect kernel-level data across nodes and clusters without instrumenting each application.
  • Capture historical system data. Metrics are important, but in dynamic Kubernetes environments the state of the system is constantly changing as pods are rescheduled, scaled, and replaced. Historical system data is essential for understanding the state of the system at the time an event occurred, even if the pods involved no longer exist.
  • Use API metrics to detect app issues. Microservices utilize APIs for communication. Capturing API metrics such as request rate, error rate, and latency can help you pinpoint issues more quickly.
  • Measure usability and the end user experience. With any digital service, the end user experience matters most. It’s therefore critical to identify metrics that reflect the user experience and the usability of the application. These include the responsiveness, or latency, of user requests and may also include application-specific metrics, such as how often users abandon transactions or make mistakes.
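The API metrics named above (request rate, error rate, and latency) are often grouped as the RED method: rate, errors, and duration. The sketch below computes them from a list of request records over a fixed window. The `Request` record, treating only 5xx responses as errors, and the nearest-rank 95th percentile are assumptions for this example; a production system would typically compute these from histograms in the monitoring backend.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one API call, e.g. parsed from access logs.
    status: int        # HTTP status code
    latency_ms: float  # time to serve the request


def red_metrics(requests, window_s):
    """Compute request rate, error rate, and p95 latency over one time window."""
    if not requests:
        return {"rate_per_s": 0.0, "error_rate": 0.0, "p95_ms": None}
    errors = sum(1 for r in requests if r.status >= 500)  # assume 5xx = error
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed in integer math.
    rank = -(-95 * len(latencies) // 100)
    return {
        "rate_per_s": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "p95_ms": latencies[rank - 1],
    }


# 20 requests in a 10-second window; one of them failed with a 503.
reqs = [Request(503 if i == 7 else 200, 10.0 * (i + 1)) for i in range(20)]
print(red_metrics(reqs, window_s=10))
# {'rate_per_s': 2.0, 'error_rate': 0.05, 'p95_ms': 190.0}
```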


Kubernetes monitoring and VMware Tanzu

VMware Tanzu products and services deliver the monitoring and observability tools you need for your cloud native journey, helping you achieve higher levels of application health and availability and a better overall end user experience.

VMware Tanzu Platform goes beyond monitoring and provides full-stack observability into the health and performance of workloads and clusters.

VMware Tanzu Labs can help you understand your monitoring and observability needs and implement the right processes and tools to ensure success.




FAQ

What is the difference between Kubernetes monitoring and observability?

Kubernetes monitoring focuses on gathering metrics, log data, and alerts from clusters and from applications running in containers. It’s considered a subset of observability. Observability goes further: it applies analysis to the exposed data to deliver actionable information, which can drive automated changes and optimizations to infrastructure.

Why is Kubernetes monitoring important?

Kubernetes monitoring enables proactive management of clusters. It eases management of containerized infrastructure by tracking the utilization of cluster resources, including memory, CPU, and storage. Cluster operators can monitor and receive alerts if the desired number of pods is not running, if resource utilization is approaching critical limits, or when failures or misconfigurations affect pods.