Wavefront Adds Kudu and Impala Integrations for a Holistic View of the Hadoop Ecosystem

December 18, 2018 Humphrey Huang

Apache Kudu is an open source storage technology designed for the Apache Hadoop ecosystem. It complements HDFS and Apache HBase by supporting both fast data insertion and queries across large volumes of data. Since Kudu is a storage engine, it is often used in tandem with Apache Impala, an open source SQL query engine featuring massively parallel processing capabilities. Both of these tools are designed to be highly performant.

Kudu and Impala are key pieces of the Hadoop ecosystem for streaming (“fast”) data, so it is important to ensure that both services are up and running smoothly. With Wavefront’s Apache Kudu and Apache Impala integrations, you can not only ensure that Kudu and Impala are up and running, but also monitor them alongside other services in your Hadoop ecosystem (such as HDFS, YARN, Spark, Kafka, Solr, and ZooKeeper) while correlating other infrastructure and application metrics for a complete end-to-end view.

These integrations allow Wavefront to be the single-pane-of-glass view into Kudu and Impala, as well as the rest of the Hadoop ecosystem. Administrators can quickly identify and address performance issues or plan for future resource requirements. Developers can observe real-time metrics and past trends to understand whether, say, schema design or service configurations need refinement or whether queries need to be tuned.

Monitoring Kudu for Optimal Performance

Let’s start with monitoring Kudu. The Kudu integration collects metrics for the Master and Tablet Server processes as well as for individual tablets. Tablet metrics are tagged with the table name.

At a high level, when we are monitoring Kudu, we can start by tracking the number of running Masters and Tablet Servers. Running multiple Masters is recommended for high availability, so it is helpful to see, at a glance, that we have enough Masters up and running. Similarly, to ensure that data is always available, we want to make sure that enough Tablet Servers are up to host tablet replicas based on the replication factor configured for each table.
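The availability check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the expected Master count, and the sample table names and replication factors are assumptions, not values from the Kudu integration.

```python
def availability_issues(masters_up, tservers_up, expected_masters, table_replication):
    """Return human-readable issues for a Kudu cluster snapshot.

    table_replication maps a table name to its configured replication factor.
    All inputs are hypothetical values you would read from your own charts.
    """
    issues = []
    if masters_up < expected_masters:
        issues.append(f"masters: {masters_up} of {expected_masters} running")
    # A table with replication factor N needs at least N Tablet Servers
    # so that each replica can live on a different server.
    for table, factor in sorted(table_replication.items()):
        if tservers_up < factor:
            issues.append(
                f"table {table}: replication factor {factor} "
                f"but only {tservers_up} Tablet Servers running"
            )
    return issues


# Hypothetical snapshot: one Master is down and one table wants 5 replicas.
issues = availability_issues(
    masters_up=2,
    tservers_up=3,
    expected_masters=3,
    table_replication={"events": 3, "metrics": 5},
)
for issue in issues:
    print(issue)
```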

It is also helpful to keep track of the number of tablets each Tablet Server is hosting. At a glance, we can see whether tablets are replicated as expected or if we are nearing any limitations. For instance, at the time of this writing, the recommended maximum number of tablets per Tablet Server is 2000. With the Wavefront Kudu integration charts, we can quickly tell if any Tablet Server is approaching this limit.
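As a rough sketch of the same check outside a chart, the snippet below flags Tablet Servers that are approaching the recommended maximum. The 2000 figure comes from the text above; the 80% warning ratio, the server names, and the counts are assumptions made for illustration.

```python
# Recommended maximum from the text above; the warning ratio is an assumption.
MAX_TABLETS_PER_SERVER = 2000
WARN_RATIO = 0.8


def servers_near_tablet_limit(tablet_counts, limit=MAX_TABLETS_PER_SERVER,
                              warn_ratio=WARN_RATIO):
    """Return {server: count} for servers at or above warn_ratio * limit."""
    threshold = limit * warn_ratio
    return {
        server: count
        for server, count in tablet_counts.items()
        if count >= threshold
    }


# Hypothetical sample data: tablet replicas hosted per Tablet Server.
sample_counts = {"tserver-1": 1210, "tserver-2": 1675, "tserver-3": 1980}
near_limit = servers_near_tablet_limit(sample_counts)
print(near_limit)
```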

Performance Metrics

From a performance standpoint, we can observe average read and write times on the Tablet Servers. We can also observe the various percentiles for these times, such as the 95th or 99.99th percentiles. If we observe a slowdown, that can prompt further investigation into the Tablet Servers.
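To see why percentiles matter alongside averages, here is a small sketch using the nearest-rank method on a made-up set of latency samples. Kudu reports its own percentile metrics, so this is not how the integration computes them; the sample values are assumptions chosen to show how a single slow operation shows up in the tail.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical write latencies in milliseconds, with one slow outlier.
latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 7, 120]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
print(mean_ms, p50_ms, p95_ms)
```

In this sample the median stays low while the 95th percentile exposes the slow write, which is the kind of signal that can prompt a closer look at a Tablet Server.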

We can also bring in host metrics and correlate resource usage. We can see, for example, when Tablet Server data directories are approaching their maximum allocated disk space. Kudu emits metrics for the number of full data directories, but OS metrics (easily obtained by setting up Wavefront Operating System integrations) can also help us see when the overall disk space of the host is running low.

The out-of-the-box dashboard for Kudu includes charts for many of the scenarios described above. Kudu emits many more metrics, and creating your own custom Kudu dashboard is as simple as cloning the built-in dashboard and adding your own charts.

Run, Impala, Run!

Now, let’s discuss how we can monitor Impala to keep it running at peak efficiency. The Impala integration collects metrics for the Impala Daemons, Catalog Server, and Statestore processes.

When we’re monitoring Impala, at the top level, it’s helpful to ensure that the expected number of Impala Daemons, as well as Catalog Server and Statestore processes, are up and running. Since Impala Daemons access data on each Data Node directly, we want to ensure that we have one running on each node. Therefore, by tracking the number of Impala Daemons that are up, we can quickly see when this number doesn’t match the number of Data Nodes. At the same time, while an unexpected drop in Impala Daemons is a more critical issue, we still want to ensure that both the Catalog Server and the Statestore are up and running. So, we can also keep track of the number of running Catalog Servers and Statestores.
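The comparison between Data Nodes and running Impala Daemons amounts to a set difference. The sketch below assumes you already have both host lists from your own inventory; the function name and host names are hypothetical.

```python
def nodes_missing_daemon(data_nodes, daemon_hosts):
    """Return the Data Nodes that have no running Impala Daemon, sorted by name."""
    return sorted(set(data_nodes) - set(daemon_hosts))


# Hypothetical inventory: four Data Nodes, three reporting Impala Daemons.
data_nodes = ["dn1", "dn2", "dn3", "dn4"]
daemon_hosts = ["dn1", "dn2", "dn4"]

missing = nodes_missing_daemon(data_nodes, daemon_hosts)
print(missing)
```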

Monitoring Impala Daemons Memory Usage

Impala Daemon memory usage is another critical area to monitor since Impala queries are often memory intensive. Impala Daemons emit metrics such as total memory usage (TCMalloc and buffer pool), Resident Set Size of the process, and JVM usage. We can also track these on our dashboard to easily tell when the daemons have (or will have) a memory issue.

Furthermore, when Impala is close to exceeding memory limits on the host, queries can spill to disk. This additional IO affects performance. Therefore, we can also track several of the metrics that indicate the occurrence of spilling. For example, we can observe metrics that track the total number of queries spilling to disk. We can also observe metrics tracking the number of bytes written to disk by the IO manager. During spill to disk, temporary files are written to scratch directories, so if we observe that the number of bytes written to disk is spiking, we know that queries are spilling to disk and, thus, that we are running close to memory limits.
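One way to spot such a spike is to look at the change between consecutive readings and compare it with a typical interval. The sketch below assumes the bytes-written metric is a cumulative counter sampled at regular intervals; the spike factor, the minimum threshold, and the sample readings are illustrative assumptions.

```python
import statistics

MIB = 1024 ** 2


def spill_spikes(cumulative_bytes, spike_factor=5.0, min_bytes=MIB):
    """Return indices of intervals whose write volume exceeds
    max(spike_factor * median interval volume, min_bytes).

    cumulative_bytes is assumed to be a monotonically increasing counter.
    """
    deltas = [b - a for a, b in zip(cumulative_bytes, cumulative_bytes[1:])]
    if not deltas:
        return []
    threshold = max(spike_factor * statistics.median(deltas), min_bytes)
    return [i for i, delta in enumerate(deltas) if delta > threshold]


# Hypothetical counter readings: steady 10 MiB per interval, then a 500 MiB burst.
readings = [0, 10 * MIB, 20 * MIB, 30 * MIB, 530 * MIB, 540 * MIB]
spikes = spill_spikes(readings)
print(spikes)
```

Using the median as the baseline keeps a single burst from inflating the threshold, and the minimum threshold avoids flagging tiny writes when the baseline is zero.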

Catalog Server and Statestore Metrics

The Catalog Server process also emits memory metrics. When there are a large number of tables, partitions, and data files, the Catalog Server may run out of memory. Tracking memory metrics will help with proactive mitigation of OOM (Out of Memory) errors.

Another Catalog Server metric that is helpful to track is active connections. Since the Catalog Server provides metadata updates (resulting from changes to tables or data) to all the Impala Daemons, we want to ensure that each daemon is connected.

Similarly, we also want to make sure that each Impala role is communicating with the Statestore since the Statestore tracks health statuses and broadcasts these updates. So, in our dashboard, we track the metric for active connections to the Statestore. The earlier image with Single Stat charts illustrates monitoring active connections for both the Catalog Server and Statestore.

On the subject of broadcasts, it is also helpful to track how long it takes the Statestore to send updates to its subscribers. This duration is usually short and is useful to know when setting appropriate receive timeouts on the subscriber side. However, if this duration increases beyond the timeout, subscribers may think that the Statestore has failed. Additionally, we can track completion time for the heartbeats that the Statestore sends to its subscribers to verify that they are up and running. When this heartbeat duration increases substantially, there may be an issue with the network or with the subscribers. Subscribers are typically the Impala Daemons and the Catalog Server.
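The relationship between update duration and the subscriber timeout can be expressed as a simple classification. In this sketch the timeout value, the warning ratio, and the sample durations are all assumptions; substitute the timeout actually configured on your subscribers.

```python
def classify_update_duration(duration_ms, timeout_ms, warn_ratio=0.5):
    """Classify a Statestore update duration against the subscriber timeout.

    "critical" means the update took at least as long as the timeout, so
    subscribers may conclude that the Statestore has failed.
    """
    if duration_ms >= timeout_ms:
        return "critical"
    if duration_ms >= warn_ratio * timeout_ms:
        return "warning"
    return "ok"


# Hypothetical subscriber timeout and observed update durations (milliseconds).
subscriber_timeout_ms = 3000
observed_ms = [40, 1800, 3200]

statuses = [classify_update_duration(d, subscriber_timeout_ms) for d in observed_ms]
print(statuses)
```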

The out-of-the-box dashboard for Impala contains charts for many of the scenarios above. Impala emits many more metrics, and creating your own custom Impala dashboard is as simple as cloning the built-in dashboard and adding your own charts.

A Holistic View into Your Hadoop Ecosystem

Wavefront has integrations for many other services in the Hadoop ecosystem including HDFS, YARN, Spark, Kafka, Solr, and ZooKeeper. With these integrations, as well as other infrastructure and application metrics, you can have a holistic view of your entire environment. Start sending in your Kudu and Impala metrics today with a free trial!


The post Wavefront Adds Kudu and Impala Integrations for a Holistic View of the Hadoop Ecosystem appeared first on Wavefront by VMware.
