Auto-Configured Application Topology and In-Context Logs at Your Service

June 23, 2020 Gordana Neskovic

With the adoption of distributed and microservices-based applications, keeping track of service dependencies is increasingly tough. Due to the complex nature of microservices dependencies, it’s hard for SREs and developers to quickly pinpoint abnormally behaving services and isolate an incident’s root cause. To address this complexity, VMware Tanzu Observability by Wavefront (Tanzu Observability) has added application maps to its distributed tracing portfolio of visual and interactive maps.

In addition to application maps, this post covers the enhanced Tanzu Observability log integration. When you spot an anomaly on a chart or receive an alert and want to investigate, you can use in-context log integration to quickly get more information that will help you troubleshoot further.

Auto-configured application topology

While Tanzu Observability service maps show which services are involved in a particular request, application maps display all the services of a distributed application. They provide an overview of how the application and its services are linked, and let you focus on a specific service or interaction by drilling down to its traces.

Tanzu Observability gets things done by enabling you to:

  • Quickly understand the application topology by visualizing all your application clusters and internal services communication

  • Find anomalous services before customers are impacted 

  • Get to the incident root cause in seconds by drilling down from the application map into traces of the services and their intercommunication edges

Understand application topology

Tanzu Observability creates dynamic and real-time application maps from automatically discovered instrumented services and their interconnections. On a map, nodes represent the services and lines represent the interactions between those services (edges).

For example, for the Beach Shirt application below, Tanzu Observability creates an application map. On it, the delivery service calling the notification service is shown as a line running from the delivery node to the notification node. 

Tanzu Observability creates dynamic and real-time application maps from application distributed traces
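To make the node-and-edge idea concrete, here is a minimal sketch of how a map could be derived from trace spans. It is not the product's implementation: the span fields (`span_id`, `parent_id`, `service`) and the `build_application_map` helper are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical, simplified spans. Each span names its service and the
# parent span that triggered it (None for the root of the trace).
spans = [
    {"span_id": "a", "parent_id": None, "service": "shopping"},
    {"span_id": "b", "parent_id": "a", "service": "delivery"},
    {"span_id": "c", "parent_id": "b", "service": "notification"},
    {"span_id": "d", "parent_id": "b", "service": "delivery"},  # call within one service
]


def build_application_map(spans):
    """Return (nodes, edges): the set of services and caller->callee call counts."""
    service_of = {s["span_id"]: s["service"] for s in spans}
    nodes = set(service_of.values())
    edges = defaultdict(int)
    for span in spans:
        parent = span["parent_id"]
        if parent is None or parent not in service_of:
            continue  # root span, or the parent was not reported
        caller, callee = service_of[parent], span["service"]
        if caller != callee:  # only cross-service calls become edges
            edges[(caller, callee)] += 1
    return nodes, dict(edges)


nodes, edges = build_application_map(spans)
```

In this sketch, the delivery service calling the notification service shows up as the edge `("delivery", "notification")`.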

Hover over a node to get information about the service’s languages and frameworks, as well as key health status RED metrics—the request rate, errors percentage, and span duration. You can further narrow your focus on a single service by reducing the application map radius to it and any neighboring services interacting with it.
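As a rough illustration of what the RED metrics summarize, the sketch below computes them for one service over a time window. The span fields (`error`, `duration_ms`) and the use of a mean for span duration are assumptions; the product may aggregate duration differently, for example as a percentile.

```python
def red_metrics(spans, window_seconds):
    """Compute request rate, error percentage, and mean span duration."""
    total = len(spans)
    if total == 0:
        return {"request_rate": 0.0, "error_percent": 0.0, "mean_duration_ms": 0.0}
    errors = sum(1 for span in spans if span["error"])
    return {
        "request_rate": total / window_seconds,       # requests per second
        "error_percent": 100.0 * errors / total,      # share of failed requests
        "mean_duration_ms": sum(span["duration_ms"] for span in spans) / total,
    }


# Hypothetical spans observed for one service during a 2-second window.
sample = [
    {"error": False, "duration_ms": 10},
    {"error": False, "duration_ms": 20},
    {"error": True, "duration_ms": 30},
    {"error": False, "duration_ms": 40},
]
metrics = red_metrics(sample, window_seconds=2)
```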

Hover over an edge to see the highlighted direction of the requests between the two services. You can click on the edge to view its RED metrics or to navigate to its tracing browser.

With the ability to take global and narrow views of the application map, both novice and experienced engineers can quickly understand the application architecture and observe how data flows through the application. This understanding can lead them to make changes to the architecture and, ultimately, deliver improved application performance.

Find an anomalous service

Every second that a critical cloud service slows or goes down completely can directly impact customers and revenues. And in distributed applications, thousands of services run on ephemeral containers. That makes it hard to find services that behave anomalously. 

With Tanzu Observability’s application maps, you can quickly and easily pinpoint problematic service nodes and assess their impact. Simply look closely at anything highlighted in red. Metrics are shown directly on the service node, and following a node's upstream and downstream services will lead you to the root cause of any bottleneck. You can also create alerts that let you know when the health of your services starts to degrade. For example, you can set up a multithreshold alert that sends a warning email to the on-call SRE if the shopping service error percentage rises to between 5% and 10%, and a severe PagerDuty notification to both the on-call SRE and the SRE manager if the error percentage climbs above 10%.
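The multithreshold logic in that example can be sketched in a few lines. This is only an illustration of the decision rule, not the product's alert configuration; the function name, the severity labels, and the routing table are hypothetical, and the boundary handling (5% and 10% inclusive for the warning) is an assumption.

```python
def classify_error_percent(error_percent, warn_at=5.0, severe_above=10.0):
    """Map an error percentage to a severity, or None when healthy."""
    if error_percent > severe_above:
        return "SEVERE"
    if error_percent >= warn_at:
        return "WARN"
    return None


# Hypothetical routing: who is notified at each severity, and how.
ROUTES = {
    "WARN": ["email: on-call SRE"],
    "SEVERE": ["PagerDuty: on-call SRE", "PagerDuty: SRE manager"],
}


def notifications_for(error_percent):
    """Return the list of notification targets for a shopping-service reading."""
    severity = classify_error_percent(error_percent)
    return ROUTES.get(severity, [])
```

For instance, `notifications_for(7.5)` returns only the warning email target, while `notifications_for(12.0)` returns both PagerDuty targets.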

The Tanzu Observability application map supports external nodes as well. If your applications have service nodes that call different web services or datastores, it can tell you:

  • How many calls your service makes to a web service or datastore 

  • The percentage of erroneous calls

  • The latency of all calls

It can also help you understand the impact of third-party dependencies on your application’s performance, which you can use to tighten your service-level objectives and service-level agreements.

Get to the incident root cause in seconds

Knowing which services depend on each other can help you decide where to start an investigation, guide your root-cause analysis, and enable you to reduce the mean time to repair. If you get an alert associated with a service, you can proceed as follows:

  1. Find the service in the application map.

  2. Examine the edges, which show where data comes from and where it’s going. If you click on an edge, you can navigate to the tracing browser, where you’ll get out-of-the-box visibility into traces, histograms, RED metrics, and span logs for this edge.

Examine the edge with one click, from application map to tracing browser

  3. Drill down to the traces associated with the service. If you click on the service, you can transition to the curated dashboards and traces related to it.

Drill down from the application map to curated service dashboards and traces
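To illustrate why knowing the dependencies helps scope an investigation, the sketch below walks a set of map edges to list everything upstream or downstream of an alerting service. The edge list, the `payments` service, and the `reachable` helper are hypothetical and exist only for this example.

```python
from collections import deque


def reachable(edges, start, direction="downstream"):
    """Breadth-first walk over (caller, callee) edges from a starting service."""
    adjacency = {}
    for caller, callee in edges:
        # Flip the edge when walking upstream (from callee back to caller).
        src, dst = (caller, callee) if direction == "downstream" else (callee, caller)
        adjacency.setdefault(src, set()).add(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical edges taken from an application map.
edges = [
    ("shopping", "delivery"),
    ("delivery", "notification"),
    ("shopping", "payments"),
]
callers = reachable(edges, "delivery", direction="upstream")
dependencies = reachable(edges, "delivery", direction="downstream")
```

If the delivery service alerts, `dependencies` lists the services that might be causing the problem and `callers` lists the services likely to feel its effect.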

In-context log integration

Tanzu Observability has had external links that pass metrics to external applications for quite some time. Among them is in-context log integration; its enhanced UI connects time series data to logs from tools such as vRLI, ELK, Splunk, Scalyr, and others. It transfers metric sources, time periods, and point tags to the logging tool via a URL, which enables the logging tool to provide the details necessary to troubleshoot your application.

After setting up the log integration, you can click through from Tanzu Observability time series data directly to the relevant log entry in your logging system. Doing so instantly gives you more information about the application, server, operating system, storage, firewall, and network device events that occurred at the time of the incident you observed or were alerted to. Having this information at your fingertips allows you to quickly troubleshoot distributed applications across both hybrid and cloud environments.

One click from a Tanzu Observability time series takes you directly to relevant vRLI log entries

In five easy steps, you can create a log integration:

  1. Select Browse > Create Log Integration

  2. Provide a link name and description

  3. Specify a filter for limiting the application of log integrations to the time series of interest

  4. Specify the link URL template (source, start and end time of the chart window, and point tags associated with the time series)

  5. Click Save
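The idea behind the link URL template in step 4 can be sketched as simple placeholder substitution. This is not the product's template syntax: the `{name}` placeholder style, the parameter names, and the `logs.example.com` host are all assumptions made for illustration, so check the product documentation for the real variable names.

```python
from urllib.parse import quote


def render_log_link(template, values):
    """Fill {name} placeholders in a URL template with URL-encoded values."""
    url = template
    for name, value in values.items():
        url = url.replace("{" + name + "}", quote(str(value), safe=""))
    return url


# Hypothetical template: source, chart window start and end, and one point tag.
template = (
    "https://logs.example.com/search"
    "?host={source}&from={start_ms}&to={end_ms}&env={env}"
)
link = render_log_link(
    template,
    {
        "source": "app-1",
        "start_ms": 1592870400000,
        "end_ms": 1592874000000,
        "env": "prod east",
    },
)
```

URL-encoding each value matters because sources and point tags can contain spaces or other characters that are not valid in a query string.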

To start exploring your own applications with Tanzu Observability application maps and in-context logs, or to see Tanzu Observability in action, check out our free trial today.
