Containers are Leaky Abstractions (and other truths I hide from my kids)

April 10, 2017 Dev Nag

I recently sat on a panel at the ContainerWorld 2017 conference, discussing the maturity model of containers — and where we were (as an ecosystem) on the spectrum from stable technology, to monitoring, to basic management, to being part of the “invisible background” of technology infrastructure like TCP/IP stacks.

All of the panelists agreed that we had just passed the milestone of stable(-ish) technology, but that monitoring, management, and full maturity were well into our collective future. I went further, giving a reason that tried to explain why we seemed to be in the perpetual spring of this transformative (and very well-invested) technology:

“Containers are leaky abstractions.”

My co-panelists stopped and nodded.

I gave a brief explanation for the audience, but I wanted to elaborate and motivate my comment a bit more deeply. First, I’m going to take a step back, and explain what leaky abstractions are (and where they come from). Then, I’ll give you a concrete example of how to work with leaky abstractions using Wavefront — by the end, you’ll understand why leaky abstractions require dynamic monitoring platforms.

What is a Leaky Abstraction?

First, some context. What is a leaky abstraction? Software systems are built through hierarchical decomposition — no one can track the activity of every logic gate or transistor in the execution of typical software — so we abstract over complex layers with simpler (not necessarily simple) interfaces. We commoditize different systems to look similar, which is how I can write a python script on my laptop and run it on either a production server or on my iPhone.

And as any experienced engineer can tell you, almost all abstractions are leaky. Their ‘leakiness’ falls on a spectrum, from relatively low to surprisingly high (browser compatibility, I’m looking at you!). Outside of mathematical models, no abstraction is perfect. The underlying reality will always assert itself — through performance, selective implementations, side effects, and many other behavioral clues. Joel Spolsky’s “Law of Leaky Abstractions” sums it up well — any non-trivial abstraction is leaky to some degree. I’d clarify that to say that any abstraction which is not merely the identity function is leaky, to some degree. And such an “abstraction” would hardly deserve the name.

What determines how leaky an abstraction is? As with so many of The Big Questions, it comes down to information theory. Think of the underlying system — and the abstraction/interface above that system — as being two linked but separate entities. The first has a complexity (entropy, degrees of freedom, dimensionality) of behavior. The second has a complexity of specification.

The mismatch between those two complexity measures tells you how ultimately leaky the abstraction will be (or provides, at the very least, a lower bound). When the mismatch is wide — encapsulating inherently complicated systems with simple interfaces— the interface can fail to provide the right measure of understanding and control on the underlying system.

For operational systems that we need to manage, this means that we’re forced to understand multiple levels simultaneously — holding one hand on the steering wheel even as we poke around the engine.

This will remind some readers about the trade-offs in compression. You can only compress a given set of data so far without lossiness, based on its entropy (or more accurately, the Kolmogorov complexity). Interfaces compress behavior in the same way, and if you try and stretch the compression ratio too high, you’ll end up with abstraction “lossiness”: some underlying behavior that isn’t accounted for in your interface, and which could (and probably will) come back to bite you.

The Container Challenge

Containers are explicitly an abstraction. That’s why we use the word “container” — because we’re trying to virtualize the underlying operating system across multiple contexts, each of which is (somewhat) contained. The benefits that containers promise regarding agility, dependency management, and continuous delivery are unmistakable.

Anyone who has used Docker for more than a year is painfully aware of the leakiness of the container abstraction — breaking changes between versions, the revolving door of recommended storage drivers, I/O performance in certain configurations (cough loopback cough), garbage collection of old images, etc. As my co-panelists agreed, they were willing to take some pain to get the benefits — they just wanted a better way to handle the monitoring and management side, now.

Proper monitoring matches the problem space. When the problem area includes an array of leaky abstractions, that means being able to provide fast insight at dynamic abstraction layers as well as intermingling different levels of details/description. I’ll give you a simple example using Wavefront.

Uncover Hidden Battles for Resources with Wavefront Unified Metric Analytics

Suppose you’re running two microservices, each isolated to their container, each responsible for processing some pipeline of external requests (perhaps browser vs. API). And suppose those two containers share some set of resources — say, CPU and storage but not network. Here are the request rates for each of the services in “peacetime”:

blog1

Note that the two request rates don’t look connected to each other. But then, all of a sudden, at about 07:00, you see the behavior change to this:

blog2

Now, we have an almost perfect inverse correlation between the two metrics. What’s going on?

This pattern is often a tell-tale sign of a shared constraint between the two services. Normally, their rates are driven by uncorrelated external demand. But when some shared resource gets pegged, they’re fighting a zero-sum game over internal supply. And what was uncorrelated becomes, well, correlated. When one service gets more, the other gets less, and neither of them is getting all the capacity they want. (Parents with multiple kids will know exactly what I’m talking about.)

Now, which shared resource is the problem? Is it the CPU or the storage? Let’s pierce the abstraction a little bit and look at those two underlying resources directly:

blog3

Interesting. The CPU (blue, left axis) is slightly higher than it was, certainly, but it seems to have the same basic shape as during peacetime. Storage IOPS (orange, right axis), however, has hit some ceiling and is spending most of its time near the max. Interpreted as distributions, the CPU distribution has shifted slightly while the IOPS distribution has undergone a very large shift as well as showing massive negative skew.

Using Wavefront Metrics for Aggregated Container and Microservices Monitoring

What if we wanted to track all four of these metrics in the same conceptual space? We can actually derive a conditional metric — one which shows up only in certain conditions (here, when the API and browser metrics show a very strong negative correlation, and when either of the resource metrics is showing signs of saturation). In this case, the correlation (green) drops to near -1, and the IOPS line (in orange) starts showing up:

blog4

If we put this into a table, we can just read off the symptoms and the most likely cause at a glance. Using Wavefront platform we can see the User/API correlation has dropped to near -1, and the IOPS looks awfully suspicious just below it:

blog5

There are two things going on in our example. First, we were able to see different levels of the system in a single context. Second, we’re able to create a new abstraction layer (our conditional metric) instead of just passively tracking the abstraction(s) provided to us. Especially in cloud applications, those existing abstractions just aren’t enough.

We can further iterate that dynamic abstraction layer based on our ability to manage the system to our own KPIs; two teams that use microservices and containers in different ways may have very different perspectives on what matters for their abstraction.

This ability to create dynamic abstractions wasn’t part of the table stakes in monitoring traditional three-tier architectures a generation ago, but it has become so in the modern era. If you want a more hands-on experience with what I’m talking about on your data, try Wavefront.

Attending Dockercon 2017? Stop by Wavefront booth to see our container monitoring and other demos.

Get Started with Wavefront Follow @devnag Follow @WavefrontHQ

The post Containers are Leaky Abstractions (and other truths I hide from my kids) appeared first on Wavefront by VMware.

CEO’s Letter to Wavefront Customers, Users, Partners, and Friends

To customers, users, partners and friends of Wavefront, This morning, VMware announced its intention to acq...

AWS Lambda Serverless, Automatic Rollbacks, and Wavefront Monitoring

Automatic rollbacks are a way to quickly revert back to a previous version of your application to limit the...

Visionary in Gartner® Magic Quadrant™

Learn More

Return to Home

Containers are Leaky Abstractions (and other truths I hide from my kids)

What is a Leaky Abstraction?

The Container Challenge

Uncover Hidden Battles for Resources with Wavefront Unified Metric Analytics

Using Wavefront Metrics for Aggregated Container and Microservices Monitoring

Previous

Next

Containers are Leaky Abstractions (and other truths I hide from my kids)

What is a Leaky Abstraction?

The Container Challenge

Uncover Hidden Battles for Resources with Wavefront Unified Metric Analytics

Using Wavefront Metrics for Aggregated Container and Microservices Monitoring

Previous

Next

Related content in this Stream

Monitoring collects data, while observability offers contextualization and strategic insights into complex systems. Learn more about the differences and why observability is so powerful.

The unified observability platform in VMware Aria Operations for Applications brings together metrics, traces, and log management to deliver critical business outcomes.

With nearly 100 percent compatibility with Grafana dashboard queries, VMware Tanzu Observability delivers excellent support for PromQL.

VMware Tanzu Observability offers easy integration with AWS CloudTrail, enabling operators to view events related to governance, compliance, and operational and risk auditing for your AWS account.

See how VMware Tanzu Observability gave a British smart meter company unprecedented visibility into its platform and smoothed the path creating more innovative products.

A change to Grafana licensing means limited functionality for users of some platforms that rely on it. Here’s how Tanzu Observability can fill the gaps.

OpenShift users can now take advantage of VMware’s revamped full-stack monitoring solution of Kubernetes clusters with Tanzu Observability by Wavefront.

Updates to VMware Tanzu Observability include new ecosystem integrations and usability features designed to improve incident response.

We are holding two different design studio research sessions at VMworld that will give you the opportunity to influence the direction of VMware Tanzu Observability.

In addition to VMware Tanzu Observability supporting various instrumentation and ingestion methods for distributed tracing, it now natively supports OpenTelemetry.

Highlights from SpringOne Day 2 include more details about Tanzu Application Platform, demos of Application Accelerator and Tanzu Observability, plus summaries of some of our favorite talks.

We’re excited to announce enhancements to the VMware Tanzu Observability by Wavefront platform.

The integration of Jaeger with Tanzu Observability will help you visualize the application traces and identify any errors or performance issues.

We at VMware Tanzu recently published our first-ever summary of the current state of observability, a report entitled The State of Observability 2021.

The VMware Tanzu Observability by Wavefront engineering team recently completed 30 days of improvement focused on query quality.

VMware Tanzu Observability was named as a fast-moving leader in technology research and analysis provider GigaOm's forward-looking assessment of the cloud observability vendor space in 2021.

VMware recently announced that Apdex is now available in Tanzu Observability by Wavefront.

Companies running cloud-native apps and infrastructure will improve the user experience and boost app availability by adopting real-time alerting and predictive analysis.

New functionalities of Tanzu Observability by Wavefront accelerate analytics-driven insights and data onboarding for DevOps teams, including developers, Kubernetes operators, and wider ops teams.

Looking for a way to proactively troubleshoot complex application performance issues? Look no further than Tanzu Observability by Wavefront.