With the rise of DevOps, microservices and containers running across multiple platforms, complexity has increased. When a request comes into a monolithic application, it's fairly easy to observe because the application is self-contained. However, when a request comes into a microservice, it could result in a cascading chain of calls to various other services. Pinpointing problems that occur in this chain is much more difficult. Observability, a concept borrowed from control theory (where it is the dual of controllability, not a synonym for it), is a practice that enables DevOps teams, including software developers and operators, to identify problems where and when they happen in distributed systems.
The goal of observability is to understand a complex system’s internal state by observing its external outputs. Proper instrumentation enables you to aggregate metrics, traces, logs and events from a distributed system and correlate them across various application components and services, identifying complex interactions between elements and allowing you to troubleshoot performance issues, improve management, and optimize cloud native infrastructure and applications.
What is monitoring?
The term observability is frequently used interchangeably with monitoring, but they’re not the same thing. Monitoring is the process of collecting metrics and raising alerts to track the health and performance of discrete IT infrastructure components, such as servers or VMs. IT teams often combine infrastructure-monitoring tools and instrumentation to collect and analyze data across the entire stack. Additional cloud monitoring tools may be needed to monitor services deployed in a public cloud.
Observability vs. monitoring
Monitoring is a subset of observability. Infrastructure monitoring alone cannot keep up with the demands of modern distributed applications. Observability enables you to drill down into the disparate systems that make up an application—including microservices, containers, and other hardware and software components—and correlate events. Data analytics and automation enable you to answer unanticipated questions and troubleshoot problems as they arise.
Monitoring vs. Observability
Monitoring | Observability |
---|---|
Tracks metrics and logs. The focus is on gathering metrics and log data, with alerts when set thresholds are exceeded. | Delivers actionable information. Intelligence is applied to telemetry data, producing actionable feedback loops and enabling automated changes and optimizations to infrastructure and application runtime deployment. |
Collects data. Infrastructure monitoring collects valuable metrics such as CPU, memory, response time, error rates, and latency. | Correlates metrics. Observability brings together metrics from disparate systems, identifying specific problems, so you can quickly understand how they relate to one another. |
Watches defined systems. Monitoring keeps track of the health of important systems. | Interprets data from complex systems. Observability allows for granular insights and debugging, enabling teams to correct problems as they happen. |
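To make the monitoring column concrete, here is a minimal sketch of a threshold-based check, assuming metrics arrive as a simple name-to-value mapping. The `Threshold` class, the `check_thresholds` function, and the metric names are hypothetical and chosen only for illustration; real monitoring tools have their own rule formats.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """A hypothetical alert rule: fire when `metric` goes above `limit`."""
    metric: str
    limit: float


def check_thresholds(sample, thresholds):
    """Return one alert message per metric whose value exceeds its limit."""
    alerts = []
    for rule in thresholds:
        value = sample.get(rule.metric)
        # Metrics missing from the sample are skipped rather than alerted on.
        if value is not None and value > rule.limit:
            alerts.append(f"{rule.metric}={value} exceeds {rule.limit}")
    return alerts


# Illustrative rules and one metrics sample (values are made up).
thresholds = [Threshold("cpu_percent", 90.0), Threshold("error_rate", 0.05)]
sample = {"cpu_percent": 97.5, "error_rate": 0.01, "latency_ms": 120.0}

alerts = check_thresholds(sample, thresholds)
print(alerts)
```

A check like this reports that a known limit was crossed on a defined system. It does not explain why, which is the gap the observability column describes.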
What are the benefits of an observability architecture?
Observability delivers both operational and business benefits:
◼ Streamlines problem resolution. Data and analytics provide actionable insights that help resolve production issues quickly.
◼ Enables self-service monitoring and troubleshooting. Collected data can be made available to a wide group of stakeholders and end users.
◼ Improves availability for infrastructure and services. Helps reduce mean time to detection (MTTD) and mean time to resolution (MTTR) for both infrastructure and application failures.
◼ Reduces bottlenecks and overprovisioning. Supports proactive planning to avoid infrastructure congestion, minimize infrastructure overprovisioning, and optimize capacity usage.
◼ Accelerates app development. Data from application logs and metrics helps developers fine-tune runtime and deployment.
◼ Improves app monitoring. Distributed tracing helps teams monitor transaction health and the response times of each application component.
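The MTTD and MTTR figures mentioned in the list above are simple averages over incident timestamps. The sketch below shows one way to compute them, assuming each incident records when it started, when it was detected, and when it was resolved. The incident data is invented, and measuring MTTR from the start of the incident is an assumption; some teams measure it from detection instead.

```python
from datetime import datetime, timedelta


def mean_delta(pairs):
    """Average the elapsed time over a list of (start, end) datetime pairs."""
    deltas = [end - start for start, end in pairs]
    return sum(deltas, timedelta()) / len(deltas)


# Hypothetical incident records, for illustration only.
incidents = [
    {
        "started": datetime(2024, 1, 1, 10, 0),
        "detected": datetime(2024, 1, 1, 10, 10),
        "resolved": datetime(2024, 1, 1, 11, 0),
    },
    {
        "started": datetime(2024, 1, 2, 9, 0),
        "detected": datetime(2024, 1, 2, 9, 20),
        "resolved": datetime(2024, 1, 2, 9, 50),
    },
]

# MTTD: average time from failure start to detection.
mttd = mean_delta([(i["started"], i["detected"]) for i in incidents])
# MTTR: average time from failure start to resolution (assumed definition).
mttr = mean_delta([(i["started"], i["resolved"]) for i in incidents])

print("MTTD:", mttd)
print("MTTR:", mttr)
```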
What to keep in mind if you’re considering observability
Modern cloud-based application architectures can make it more difficult to troubleshoot performance issues. Being able to quickly understand how complex systems interact and behave is the essence of observability. The observability pattern can reveal the performance of a service in real time.
Observability helps you manage complexity
Cloud technologies introduce a level of technological and organizational complexity. Observability expands the scope of traditional IT monitoring by applying a system-level view to telemetry data and helps manage complexity with actionable feedback loops, enabling better visibility and automation.
Creating an observability architecture
Developers and operators may utilize a variety of tools to collect, track and analyze data as part of the observability process. These tools need to work together—or you need integration higher up the stack—to create an observability architecture that covers all the components supporting an application. Understanding measurements in context is a critical goal of observability and requires a central place to ingest all of the telemetry. You may need to be able to interrogate and quickly correlate a lot of data to identify the root cause of an issue.
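As a rough illustration of correlating telemetry in one central place, the sketch below groups mixed records by a shared identifier. It assumes every record carries a `trace_id` and a `kind` field; both field names, the `correlate` function, and the sample records are hypothetical. In practice, metrics often do not carry a trace identifier directly, so this is a simplification.

```python
from collections import defaultdict


def correlate(records):
    """Group mixed telemetry records by their shared trace_id."""
    by_trace = defaultdict(lambda: {"metric": [], "span": [], "log": []})
    for record in records:
        by_trace[record["trace_id"]][record["kind"]].append(record)
    return dict(by_trace)


# Invented telemetry from several services, already ingested centrally.
records = [
    {"kind": "span", "trace_id": "t1", "service": "checkout", "duration_ms": 480},
    {"kind": "log", "trace_id": "t1", "service": "payments", "message": "upstream timeout"},
    {"kind": "metric", "trace_id": "t1", "service": "payments", "name": "db_pool_in_use", "value": 50},
    {"kind": "span", "trace_id": "t2", "service": "checkout", "duration_ms": 35},
]

view = correlate(records)

# Everything known about the slow request t1, in one place.
for kind, items in view["t1"].items():
    print(kind, [item["service"] for item in items])
```

With the records grouped this way, a slow span, the log line that explains it, and a related metric can be read together instead of being looked up in three separate tools.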
Observability includes three important data sources:
◼ Metrics. Metrics help you quantify performance. If hardware or software is slow or broken, metrics help pinpoint where the problem originates. Metrics measure, monitor and alert based on latency, traffic, memory usage and other elements of infrastructure and applications.
◼ Traces. From the front end to back-end services to databases, there are many possible reasons a request can take longer than usual. Distributed tracing follows requests through the infrastructure, measuring the performance and response time along the way. If a request takes 500 ms, a trace can tell you which service or services are contributing to that degraded performance. Popular open source projects include OpenTelemetry, Zipkin, Jaeger, and Spring Cloud Sleuth.
◼ Event logs. Collecting and analyzing logs is relatively standard, but sifting through and quickly analyzing huge volumes of unstructured data can be overwhelming. Containers and microservices can generate local logs for every discrete incident, and these logs can quickly become unmanageable. To be useful, logs typically have to be aggregated in a central location. There are many available logging solutions, both open source and commercial. For example, the combination of Elasticsearch, Fluentd, and Kibana (the “EFK stack”) is popular.
Of these three, tracing is generally the hardest to implement. Although metrics and logging can rely on standard solutions, tracing requires instrumentation of every service.
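To illustrate how a trace attributes a slow request to individual services, the sketch below computes each service's self time: its span duration minus the time spent in its direct child spans. The span layout, field names, and durations are hypothetical and not the data model of any particular tracing system. The sketch also assumes child spans run one after another, since overlapping children would need interval arithmetic instead.

```python
def self_times(spans):
    """Return time spent in each service, excluding time in its child spans."""
    # Total duration of the direct children of each span.
    child_total = {}
    for span in spans:
        parent = span["parent"]
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0) + span["duration_ms"]

    # Self time = own duration minus the time delegated to children.
    result = {}
    for span in spans:
        own = span["duration_ms"] - child_total.get(span["id"], 0)
        result[span["service"]] = result.get(span["service"], 0) + own
    return result


# An invented trace for a single 500 ms request.
spans = [
    {"id": "a", "parent": None, "service": "frontend", "duration_ms": 500},
    {"id": "b", "parent": "a", "service": "orders", "duration_ms": 450},
    {"id": "c", "parent": "b", "service": "inventory", "duration_ms": 40},
    {"id": "d", "parent": "b", "service": "database", "duration_ms": 380},
]

breakdown = self_times(spans)
slowest = max(breakdown, key=breakdown.get)
print(breakdown)
print("largest contributor:", slowest)
```

Producing spans like these is what requires instrumenting every service: each one has to record its own timing and pass the parent identifier along with the request.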
Observability and VMware Tanzu
VMware Tanzu delivers the observability advice and tools you need to get started on your cloud native journey. VMware Tanzu delivers enterprise-grade observability and analytics at scale with granular controls, allowing you to achieve higher levels of application health and availability for an overall improved end user experience.
Get enterprise observability for multi-cloud environments. Monitor everything from full-stack applications to cloud infrastructures with metrics, traces, event logs, and analytics.
◼ VMware Tanzu Platform. Get analytics plus fully integrated visibility, tracing, and alerting into applications and platform components.
◼ Spring Boot. Provide integrated observability for Java developers using the Spring ecosystem.
Frequently asked questions (FAQ)
What is the difference between observability and monitoring?
Monitoring, a subset of observability, focuses on gathering metrics and log data, with alerts when set thresholds are exceeded. In contrast, observability applies intelligence to telemetry data to deliver actionable information, producing feedback loops and enabling automated changes and optimizations to infrastructure. It also allows you to drill down into the disparate systems that make up an application.
Why do we need observability?
Observability is the practice of understanding a complex system's internal state by observing its external outputs. With proper instrumentation, enterprises can aggregate metrics, traces, logs, and events from a distributed system and correlate them across various application components and services, which makes it possible to troubleshoot performance issues and optimize cloud native infrastructure.
What are the benefits of observability in DevOps?
Observability enables DevOps teams, including software developers and operators, to identify problems where and when they happen in distributed systems.
What are the key components of an observability architecture?
An observability architecture draws on three important data sources: metrics to help you quantify performance, traces to follow requests through the infrastructure, and event logs, which are typically aggregated in a central location with tools such as Elasticsearch, Fluentd, and Kibana.