Replacing Nagios Checks with Wavefront Intelligent Alerting

November 9, 2016 Parag Sanghavi

Nagios monitors IT infrastructure components including servers, network devices, operating systems, and application sub-systems.

First launched in 1999, Nagios has community-provided plugins for monitoring a large set of components. It has grown to become the near de facto way to do basic monitoring and “checks” (simple alerts) of servers and networks.

However, most users of Nagios will agree that:

It’s very hard to scale for a large, modern cloud application infrastructure, particularly one built on microservices.
Checks are slow and expensive to run, which is why many users supporting larger estates run them no more often than every 5 minutes.
Tech Ops is often drowned in alert storms, which quickly lead to alert fatigue and people ignoring Nagios.
It is easily broken if you don’t know what you’re doing.
Configurations can quickly get messy if not carefully managed.
Its web user interface is complicated and hard to understand for new users.

If you’ve encountered any of the issues listed above, then I suggest now is the time for a different mindset about how you monitor and alert on your cloud application environment.

Specifically, rather than setting an alert condition based on a discrete metric value at a single point-in-time, set it based on analytics over a time-series of metric values with Wavefront.

In this latter approach, Wavefront evaluates a stream of continuous values and then applies a meaningful combination of analytics functions to determine an alert condition.

This time-series and analytics-based approach enables far more intelligent alerts – e.g. based on dynamic baselines – as well as far more efficient management of all the alerts watching over your estate.
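To make the contrast concrete, here is a minimal Python sketch of the two mindsets. It is not Wavefront code and does not use Wavefront's query language; the function names, the 80% window fraction, and the 3-sigma baseline rule are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def point_in_time_check(value: float, threshold: float) -> bool:
    """Classic check: look at one sample and compare it to a fixed limit."""
    return value < threshold


def sustained_breach(values: Sequence[float], threshold: float,
                     min_fraction: float = 0.8) -> bool:
    """Time-series condition: fire only if most samples in the window
    are below the threshold (min_fraction is an illustrative choice)."""
    if not values:
        return False
    below = sum(1 for v in values if v < threshold)
    return below / len(values) >= min_fraction


def deviates_from_baseline(history: Sequence[float], latest: float,
                           k: float = 3.0) -> bool:
    """Dynamic baseline: fire if the latest value is more than k standard
    deviations away from the mean of the recent history."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > k * sigma


# Free disk space in GB over five samples, with one momentary dip.
free_gb = [5.0, 5.1, 0.9, 5.0, 5.2]
print(point_in_time_check(0.9, 1.0))   # the single dip trips the check
print(sustained_breach(free_gb, 1.0))  # the window as a whole does not
```

The single-sample check fires on the momentary dip, while the windowed condition stays quiet because only one of the five samples is below the limit. That difference is the kind of noise reduction the time-series approach is aiming for.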

Figure 1. Wavefront makes it easy to manage and maintain hundreds of alerts across your entire stack, whether administered team by team or across all alerts in aggregate.

A key advantage of Wavefront over Nagios is that, with Wavefront, the analytics for defining an alert condition reside in its backend and you fully control the analytics logic.

With Nagios, the alert condition logic lives in the plugin, which can be error-prone and hard to get right. This also means that when you make changes to your production environment, you will likely need to rewrite the relevant Nagios plugin, whereas with Wavefront you can easily update the alert from its simple alert management web portal (and back-test the updated alert on historical data!).

Figure 2. Use Wavefront’s Backtesting feature to see how often your new alert would have fired on historical data, increasing alert quality before it’s rolled into production.

How do we go about doing it with Wavefront?

To illustrate with a very simple example, we’re going to monitor disk space across multiple server instances.

We’ll collect disk space metrics using collectd; however, the logic that alerts when free disk space crosses the threshold runs within the Wavefront cloud service for all the server instances.

The advantage with this collectd approach is that the collectd.conf file never changes and you’re less prone to a configuration error as your software changes.

Creating alerts in Wavefront is extremely easy. We use a simple but powerful query language to notify us when the time-series metric stream for free disk space, ‘df.available’, falls below 1 GB.

Additionally, we can further filter and refine the alert using tags (which provide additional metadata about the time-series metric). We then check only for hosts which have the “prod” tag and the file system type, “ext4”.
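The filtering logic described above can be sketched in plain Python. This only emulates the idea and is not the Wavefront query itself; the `Series` class, the `fstype` tag key, and the assumption that ‘df.available’ is reported in bytes (with 1 GB taken as 10^9 bytes) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumption: df.available is reported in bytes and 1 GB means 10^9 bytes.
ONE_GB = 1_000_000_000


@dataclass
class Series:
    """One hypothetical time series for df.available on a single host."""
    host: str
    source_tags: Set[str]       # e.g. {"prod"}
    point_tags: Dict[str, str]  # e.g. {"fstype": "ext4"} (key name assumed)
    values: List[float] = field(default_factory=list)


def hosts_low_on_disk(series_list: List[Series],
                      threshold: float = ONE_GB,
                      required_tag: str = "prod",
                      fs_type: str = "ext4") -> List[str]:
    """Return hosts that carry the required tag, use the given file system
    type, and whose most recent free-space value is below the threshold."""
    alerts = []
    for s in series_list:
        if required_tag not in s.source_tags:
            continue  # skip hosts without the "prod" tag
        if s.point_tags.get("fstype") != fs_type:
            continue  # skip other file system types
        if s.values and s.values[-1] < threshold:
            alerts.append(s.host)
    return alerts


fleet = [
    Series("web-1", {"prod"}, {"fstype": "ext4"}, [2e9, 0.5e9]),
    Series("web-2", {"prod"}, {"fstype": "ext4"}, [3e9, 2.5e9]),
    Series("dev-1", {"dev"}, {"fstype": "ext4"}, [0.1e9]),
    Series("db-1", {"prod"}, {"fstype": "xfs"}, [0.2e9]),
]
print(hosts_low_on_disk(fleet))
```

Only `web-1` matches all three conditions here. The point of the sketch is that one rule covers every host that carries the right tags, so adding a server does not mean writing another check.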

Figure 3. Wavefront alerts are analytics-driven, and any query can be converted into an alert from a chart. Analytics-driven alerts reduce false positives and false negatives.

As you see on the screen above, from a single web portal, we’re able to specify alert logic that applies across multiple instances, making it easy to scale alerts as the environment grows and as the software disk profile changes.

As you’d also expect, Wavefront alerts can be forwarded to a variety of other systems, from email to tools like PagerDuty for on-call notification, or, via webhooks, to practically any other system that processes alerts.
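For the webhook route, the receiving side can be as small as an HTTP endpoint that accepts a JSON POST. The sketch below uses only the Python standard library. The payload field names (`alert_name`, `severity`, `hosts`) are hypothetical and are not Wavefront's actual webhook schema; you would adapt them to whatever payload template you configure.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alert(body: bytes) -> str:
    """Turn a JSON alert payload into a one-line summary.
    Field names are assumptions, not a documented schema."""
    payload = json.loads(body.decode("utf-8"))
    name = payload.get("alert_name", "unknown alert")
    severity = payload.get("severity", "unknown")
    hosts = ", ".join(payload.get("hosts", [])) or "no hosts listed"
    return f"[{severity}] {name}: {hosts}"


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alert notifications and prints a summary."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        print(summarize_alert(self.rfile.read(length)))
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener. In practice the handler would hand the summary to a ticketing or chat system rather than print it.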

Final thoughts: think augment-&-extend when rip-&-replace isn’t practical

For most of our customers, it isn’t practical to simply rip out Nagios and replace it with Wavefront. Instead, Wavefront is deployed on top of Nagios, and the tools co-exist for some time.

Wavefront augments the legacy Nagios installation by ingesting the Nagios metrics, and then extends monitoring and alerting to the new parts of the environment.

This “augment-&-extend” approach gives Tech Ops teams all the time they need to migrate off Nagios methodically, while getting immediate value from Wavefront.

Wavefront helps you scale with more intelligent, analytics-driven alerts that reduce alert fatigue, alerting on the things that matter and need immediate action.

Its architecture enables alerting to scale across enterprise environments, while its alert management capabilities help you to consolidate more of your alerting onto a single platform, making it easier to maintain as your cloud applications grow.

If you’re currently using Nagios, and you’re interested in a different and better way to do alerting, then talk to us. We can set you up with a Wavefront trial account immediately. You’re only a few minutes away from intelligent alerting that will make you look smarter too.


The post Replacing Nagios Checks with Wavefront Intelligent Alerting appeared first on Wavefront by VMware.
