How to Search for Outlier Traces: A Guide to Wavefront Query Language for Distributed Tracing

September 25, 2019 Sushant Dewan

Distributed tracing is a critical piece of application observability, but the sheer number of traces, each carrying a lot of information, can be overwhelming. In this blog, I’ll show you concrete examples of how the Wavefront Query Language can be applied to distributed tracing to answer your questions quickly and significantly reduce troubleshooting time.

Sample Microservices Application: BeachShirts

To explain the concept, we built a polyglot microservices-based sample application to order beach shirts. The following diagram shows the architecture of the BeachShirts app. In this blog, we’ll reference this BeachShirts app to identify outlier traces and then search for those using the Wavefront traces() query language.

As you can see in the above diagram, when someone tries to order shirts, the end-to-end workflow is depicted by the numbered sequence of operations. Here’s what an end-to-end trace looks like for the “orderShirts” workflow:

Wavefront Query Language for Distributed Tracing

In the above diagram, notice that the end-to-end trace takes 11.74 seconds. For a microservices application, this latency is unacceptable. At first glance, it’s easy to blame the shopping service (“ShoppingWebResource.orderShirts” operation) since it takes 11.56 seconds. But when you look closer at the critical path highlighted with the orange line, it becomes clear that the packaging service (“Packaging.giftWrap” operation) accounts for the bulk of that time and is slowing down the shopping service. This key piece of information is something you would miss without distributed tracing.
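To make the "who is really slow" reasoning concrete, here is a minimal Python sketch that computes each span's self time, meaning its own duration minus the time spent in its direct children. Only the 11.56 second figure for orderShirts comes from the trace above. The child durations, the second child's name, and the flat parent/child layout are invented for illustration, and the calculation assumes child spans do not overlap.

```python
# Durations in milliseconds. Only the 11,560 ms figure for orderShirts comes
# from the trace described above; the child durations, the second child and
# the flat parent/child layout are hypothetical.
SPANS = {
    "ShoppingWebResource.orderShirts": {"duration_ms": 11560, "parent": None},
    "Packaging.giftWrap": {
        "duration_ms": 9000,
        "parent": "ShoppingWebResource.orderShirts",
    },
    "Other.hypotheticalCall": {
        "duration_ms": 1500,
        "parent": "ShoppingWebResource.orderShirts",
    },
}


def self_time_ms(spans):
    """Return each span's duration minus the time spent in its direct children.

    Assumes children run one after another (no overlap), which real traces
    do not guarantee.
    """
    result = {name: info["duration_ms"] for name, info in spans.items()}
    for info in spans.values():
        if info["parent"] is not None:
            result[info["parent"]] -= info["duration_ms"]
    return result


self_times = self_time_ms(SPANS)
slowest = max(self_times, key=self_times.get)
print(self_times)
print("Largest self time:", slowest)
```

With these made-up numbers, orderShirts has the longest total duration but only a small self time, while giftWrap has the largest self time. That is the same conclusion the critical path highlights visually.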

So now that we agree that tracing is invaluable, how can you search for those traces? Wavefront has built an intuitive Traces Browser that uses Query Builder under the hood to show traces based on your search criteria. But what if you are a more advanced user who wants to use the Wavefront query language for tracing directly? You can easily switch the Traces Browser from Query Builder mode to Query Editor mode.

Let’s take a step by step approach to crafting a traces() query. Let’s start with the basic use case and then work up to the more advanced use case.

Use case 1: Get Me All the Traces for orderShirts Operation

To do this, you can use either the Query Builder or the Query Editor. This is how you would search for the orderShirts operation via Query Builder.

This will result in the following query:

limit(100, traces("beachshirts.shopping.ShoppingWebResource.orderShirts"))

You could have also used the Query Editor and typed in the above query manually. This is what the output would look like:

Notice that most of the traces are on the order of seconds as opposed to milliseconds. These traces are taking a long time, around 10 seconds each. What if you want to search for only long traces? How would you go about it?

Use case 2: Get me all traces longer than a certain threshold

Let’s say you only want to search for traces longer than a certain minimum (> 10 seconds) but shorter than some maximum (< 20 seconds). You can do this with the Duration filter.

It will result in the following query:

limit(100, lowpass(20s, highpass(10s, traces("beachshirts.shopping.ShoppingWebResource.orderShirts"))))
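If you generate such queries from a script rather than typing them, the nesting is easy to get wrong. The following is a minimal Python sketch that assembles the query text shown above. The helper functions are hypothetical string builders named after the query functions; they are not part of any Wavefront SDK and they do not execute the query.

```python
def traces(operation):
    """Build the text of a traces() call for a fully qualified operation name."""
    return f'traces("{operation}")'


def highpass(threshold, query):
    """Wrap a query so that only results longer than the threshold remain."""
    return f"highpass({threshold}, {query})"


def lowpass(threshold, query):
    """Wrap a query so that only results shorter than the threshold remain."""
    return f"lowpass({threshold}, {query})"


def limit(count, query):
    """Cap the number of results returned by the wrapped query."""
    return f"limit({count}, {query})"


def duration_window(min_duration, max_duration, query):
    """Combine highpass and lowpass into a single min/max duration window."""
    return lowpass(max_duration, highpass(min_duration, query))


query = limit(
    100,
    duration_window(
        "10s",
        "20s",
        traces("beachshirts.shopping.ShoppingWebResource.orderShirts"),
    ),
)
print(query)
```

The printed string is the same query as the one above, so you can paste it into the Query Editor.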

But wait a second. This isn’t that helpful because, as the trace shown earlier makes clear, the bottleneck lies in the ‘Packaging.giftWrap’ operation. Here Packaging.giftWrap is the critical span highlighted with the orange line. How would you narrow the search down to traces where a certain span is the bottleneck? What if you already knew which span it was and wanted to search for all long traces containing that span? How would you go about it?

Use case 3: Get me all the long traces for Packaging.giftWrap operation

You would issue a similar query, but this time select the ‘Packaging.giftWrap’ operation. The query would be as follows:

limit(100, lowpass(20s, highpass(10s, traces(spans("beachshirts.packaging.Packaging.giftWrap")))))

When you do this you would still get long traces where Packaging.giftWrap might not be the bottleneck. How do we narrow this down even further?

Use case 4: Filter all traces where a certain span (Packaging.giftWrap) is greater than a certain threshold

What you really want to do is search for traces where Packaging.giftWrap itself takes longer than a certain threshold. If you want to narrow the results down to traces where Packaging.giftWrap takes longer than 7 seconds, you can wrap that span in a highpass filter. This is what that query would look like:

limit(100, lowpass(20s, highpass(10s, traces(highpass(7s, spans("beachshirts.packaging.Packaging.giftWrap"))))))
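To see why the span-level filter changes the result, here is a small Python sketch that mimics the filtering of use cases 3 and 4 on in-memory data. It is only a model of how I read the filters, not Wavefront's implementation. The trace IDs and durations are hypothetical, and treating the thresholds as strict inequalities is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Span:
    operation: str
    duration_s: float


@dataclass
class Trace:
    trace_id: str
    duration_s: float
    spans: list


GIFT_WRAP = "beachshirts.packaging.Packaging.giftWrap"

# Hypothetical traces, invented for illustration only.
TRACES = [
    Trace("t1", 11.7, [Span(GIFT_WRAP, 9.2)]),  # long trace, slow giftWrap
    Trace("t2", 12.4, [Span(GIFT_WRAP, 0.3)]),  # long trace, fast giftWrap
    Trace("t3", 2.1, [Span(GIFT_WRAP, 0.2)]),   # short trace
]


def in_window(trace, min_s=10, max_s=20):
    """Trace-level filter: models lowpass(20s, highpass(10s, ...))."""
    return min_s < trace.duration_s < max_s


def has_span(trace, operation, min_span_s=0.0):
    """Span-level filter: the trace has a matching span longer than min_span_s."""
    return any(
        span.operation == operation and span.duration_s > min_span_s
        for span in trace.spans
    )


# Use case 3: long traces that merely contain a giftWrap span.
use_case_3 = [t.trace_id for t in TRACES if in_window(t) and has_span(t, GIFT_WRAP)]

# Use case 4: long traces whose giftWrap span itself exceeds 7 seconds.
use_case_4 = [
    t.trace_id
    for t in TRACES
    if in_window(t) and has_span(t, GIFT_WRAP, min_span_s=7)
]

print("use case 3:", use_case_3)
print("use case 4:", use_case_4)
```

In this toy data set, use case 3 still returns t2, a long trace whose giftWrap span is fast, while use case 4 keeps only t1.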

Use case 5: Only show me traces where Packaging.giftWrap had an error

Wait, we’re not done yet. What if you want to search for traces where Packaging.giftWrap had an error? Easy! Just apply the error=true filter to that span, i.e. spans("beachshirts.packaging.Packaging.giftWrap", "error"="true")

You can narrow it down even further by applying more tag filters to the Packaging.giftWrap operation. Say you want to retrieve only traces where Packaging.giftWrap ran in the Production environment with location=Palo Alto. Your final query would then look like this:

limit(100, lowpass(20s, highpass(10s, traces(highpass(7s, spans("beachshirts.packaging.Packaging.giftWrap", "error"="true" and "env"="Production" and "location"="Palo Alto"))))))
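When the set of tags varies, it can help to build the spans() filter from a dictionary. Below is a minimal Python sketch that follows the quoting and "and" syntax used in the query above. The span_filter helper is hypothetical, only produces query text, and does not escape quotes inside tag names or values.

```python
def span_filter(operation, tags):
    """Build the text of a spans() call from an operation name and a tag dict.

    Tag pairs are joined with "and" in dictionary order. Quotes inside names
    or values are not escaped, so only use this with trusted input.
    """
    conditions = " and ".join(f'"{key}"="{value}"' for key, value in tags.items())
    if not conditions:
        return f'spans("{operation}")'
    return f'spans("{operation}", {conditions})'


span_query = span_filter(
    "beachshirts.packaging.Packaging.giftWrap",
    {"error": "true", "env": "Production", "location": "Palo Alto"},
)

# Same nesting as the final query above: span threshold inside,
# trace duration window outside.
query = f"limit(100, lowpass(20s, highpass(10s, traces(highpass(7s, {span_query})))))"
print(query)
```

Dropping a key from the dictionary removes that condition, so passing only {"error": "true"} gives the error-only span filter from the start of this use case.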

Conclusion

Tracing data is the most context-rich instrumented data that you can extract from your microservices application. At the same time, this data can be really verbose and noisy.

The Wavefront Query Language for traces provides a flexible, powerful, and extensible way to narrow down your search radius and get to the outlier traces faster. For more information, refer to our documentation and check out our free trial today!

The post How to Search for Outlier Traces: A Guide to Wavefront Query Language for Distributed Tracing appeared first on Wavefront by VMware.

About the Author

Sushant Dewan is a Staff Engineer at VMware and works on Wavefront by VMware. Wavefront by VMware is a monitoring and analytics platform that handles the high-scale requirements of modern cloud-native applications. Currently, Sushant is working on building different pillars of observability to monitor next-generation microservices applications deployed on containers and serverless platforms.