Metric Store: A Cloud-Native Time Series Database for Cloud Foundry

April 10, 2019 Todd Persen

Cloud Foundry strives to simplify operational tasks for the application developer whenever possible.

For example, Cloud Foundry (CF) makes application deployment and application monitoring relatively trivial for developers. Self-service access to telemetry is a big part of this convenience. CF serves up logs, events, and metrics so developers can better understand the health of their app.

This experience within CF has gotten even better recently with the addition of the Log Cache, an in-memory firehose cache.

Today, we are pleased to announce a new feature that improves access to telemetry even further: Metric Store, a time series database for Cloud Foundry. Let’s talk about how this new capability improves on the idea behind Log Cache and, ultimately, the developer experience.

When we initially released Log Cache, one of the most frequent user requests was a longer cache duration and durability across VM restarts. We addressed some of those desires with more VMs and more memory. But we quickly understood that the community was asking for a different product altogether. When we continued hearing requests for persistence, compression, and a robust query interface, it became clear that what they really wanted was a time series database.

Metric Store combines the auth model from Log Cache, the PromQL API from Prometheus, and the Time-Structured Merge storage engine from InfluxDB. The result is a new component that persists all metrics from the Cloud Foundry Loggregator pipeline to disk.

Here are three things you need to know about this new data store:

  1. It’s multi-tenant aware. You only have access to metrics from your apps.

  2. It’s easy to query. Metric Store is 100% compatible with the Prometheus API.

  3. It has a powerful storage engine. The InfluxDB storage engine has built-in compression and a memory-efficient series index.

Install Metric Store, and you’ll get a single VM to ingest all the counter and gauge metrics from the Loggregator Reverse Log Proxy. And you’ll be able to query them immediately using the PromQL HTTP API.

Metric Store is now available for open source users of Cloud Foundry. You can deploy a single-node Metric Store with the BOSH release on the official BOSH release registry. The code is up on GitHub.

We are also exploring a commercial version for use with Pivotal Cloud Foundry. We can imagine that some customers might be interested in multi-node deployments, data replication, hinted handoff, and load balancing in a highly available configuration. (We’re exploring how Metric Store might work for Kubernetes as well.)

Now you know how Metric Store improves app observability in Cloud Foundry. Let’s examine how it works.

An Inside Look at Metric Store

Metric Store is composed of four processes:

  1. The Nozzle connects to the RLP and provides filtered and formatted data to Metric Store.

  2. The Gateway is a gRPC to JSON converter that allows incoming HTTPS queries from GoRouter.

  3. The Auth Proxy takes in valid PromQL queries and filters access based on user-provided UAA tokens.

  4. And finally, Metric Store is responsible for data storage and query processing.

This diagram shows how they all fit together:

With the exception of Metric Store itself, most of the other components are very similar to their counterparts in Log Cache. There were some additions made to the Auth Proxy to allow expanded PromQL support. But otherwise very little has changed.

The Metric Store process is where the magic happens. As mentioned above, we made the decision to use the InfluxDB storage engine for data persistence. This delivers great read and write performance in addition to native compression. We are also using the new Time Series Index (TSI) format, which greatly reduces the memory needed to handle high-cardinality data. Additionally, on-disk shards span one day, so it’s easy to truncate an entire day of data when it reaches the user-configurable retention period.
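To make the retention behavior concrete, here is a minimal Python sketch of how day-long shards could be selected for truncation. It is not the Metric Store implementation. The function name `expired_shards`, the representation of each shard as a calendar date, and the seven-day retention value are all assumptions made for illustration.

```python
from datetime import date, timedelta


def expired_shards(shard_days, today, retention_days):
    """Return the day-shards that lie entirely outside the retention window.

    Each shard is identified by the calendar day it covers (an assumption
    for this sketch). A shard is expired when its day is earlier than
    `today - retention_days`, so it can be dropped as a whole.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(day for day in shard_days if day < cutoff)


# Ten hypothetical shards covering April 1 through April 10, 2019.
shards = [date(2019, 4, 1) + timedelta(days=i) for i in range(10)]

# With a hypothetical 7-day retention period, evaluated on April 10.
to_drop = expired_shards(shards, today=date(2019, 4, 10), retention_days=7)
print([day.isoformat() for day in to_drop])
```

Because a shard never straddles a day boundary, expiring old data is a matter of dropping whole shards rather than deleting individual points.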

The storage engine features a flexible query interface, which in turn simplified the implementation of the PromQL query parser. The end result is a full-featured PromQL API. That means Metric Store can operate with other Prometheus-compatible tools, such as Grafana.

How to Deploy Metric Store with Cloud Foundry

Want to test Metric Store with an open-source Cloud Foundry installation? You can use the operations file available in the Metric Store release. We recommend that you deploy it alongside cf-deployment. This gives you a Metric Store instance that is available at https://ossms.SYSTEM_DOMAIN and automatically ingests all application and platform metrics.

Check out the README of the metric-store-release repository for additional details.

Once you have Metric Store up and running, you can start using it for application or platform monitoring. In the following sections, we explain the basics of the Prometheus Query Language (PromQL). Then we show the power of Metric Store with a sample use case.

The Basics

The Prometheus Query Language lets you query for a metric and then filter the results by tag (a label, in Prometheus terms). See the Prometheus querying documentation for more details. For example, to see the current CPU consumption of the app instance with index 3, you can use the following query:

cpu{source_id="APP_GUID",instance_index="3"}

You can also get past data. If you are interested in the memory consumption of the same app instance over the last 3 hours, you can modify the above query like this:

memory{source_id="APP_GUID",instance_index="3"}[3h]
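The two selectors above can also be assembled and URL-encoded programmatically. The following Python sketch builds the selector strings and the matching request URL for the `/api/v1/query` endpoint. The helper names `selector` and `query_url` are hypothetical, `APP_GUID` and `SYSTEM_DOMAIN` are placeholders to replace with your own values, and label values are written as quoted strings, which is what PromQL label matchers expect.

```python
from urllib.parse import urlencode

# Placeholder host taken from the deployment section; substitute your own domain.
BASE_URL = "https://ossms.SYSTEM_DOMAIN"


def selector(metric, labels, window=""):
    """Build a PromQL selector such as cpu{source_id="..."} or memory{...}[3h]."""
    matchers = ",".join(f'{name}="{value}"' for name, value in labels.items())
    range_part = f"[{window}]" if window else ""
    return f"{metric}{{{matchers}}}{range_part}"


def query_url(promql, base_url=BASE_URL):
    """Return the URL for a query against the Prometheus-style HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


labels = {"source_id": "APP_GUID", "instance_index": "3"}

print(selector("cpu", labels))                  # current CPU of one instance
print(selector("memory", labels, window="3h"))  # memory over the last 3 hours
print(query_url(selector("memory", labels, window="3h")))
```

Encoding the query matters because characters such as `{`, `"`, and `[` are not safe in a URL. It is the same job that `--data-urlencode` does for curl.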

Use Case: Find all apps that on average used less than 25% of their allocated memory over the last six hours

Imagine you want to create a list of apps that don’t use their allocated memory efficiently. You could downscale these apps to save money. You can run the following query against Metric Store to identify candidates for downscaling. As written, it averages over a five-minute window; widen both range selectors to [6h] to cover the last six hours:

curl -s -G -k -H "Authorization: $(cf oauth-token)" http://ossms.system.johannes.loggr.cf-app.com/api/v1/query --data-urlencode "query=avg(avg_over_time(memory[5m])) by (source_id) / avg(avg_over_time(memory_quota[5m])) by (source_id) < 0.25" | jq .

This query returns a result like the following, which identifies overscaled applications by their application GUID, together with the fraction of their memory quota that each one actually used.

{
 "status": "success",
 "data": {
   "resultType": "vector",
   "result": [
     {
       "metric": {
         "source_id": "a44b33e4-82dd-4566-9814-8b23a42a4558"
       },
       "value": [
         1553836160.157,
         "0.07624240294098855"
       ]
     },
     {
       "metric": {
         "source_id": "184487e6-0153-4162-b30c-f9c1b72d9dcd"
       },
       "value": [
         1553836160.157,
         "0.062369791418313975"
       ]
     }
   ]
 }
}

Note: You need admin privileges to run this query, because it reads metrics across all apps.
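If you would rather post-process the response in a script than read it through jq, the result can be turned into a simple list. The Python sketch below parses a response shaped like the sample above and prints each app GUID with its memory-to-quota ratio, lowest first. The function name `underused_apps` is hypothetical, and the embedded JSON is the sample output from this post rather than live data.

```python
import json

# Sample response copied from the query result shown in this post.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"source_id": "a44b33e4-82dd-4566-9814-8b23a42a4558"},
       "value": [1553836160.157, "0.07624240294098855"]},
      {"metric": {"source_id": "184487e6-0153-4162-b30c-f9c1b72d9dcd"},
       "value": [1553836160.157, "0.062369791418313975"]}
    ]
  }
}
"""


def underused_apps(response_text):
    """Return (app_guid, memory/quota ratio) pairs, lowest ratio first."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    rows = [
        # Each sample value is [unix_timestamp, "value as a string"].
        (sample["metric"]["source_id"], float(sample["value"][1]))
        for sample in body["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1])


for guid, ratio in underused_apps(SAMPLE_RESPONSE):
    print(f"{guid}  {ratio:.1%} of memory quota")
```

Note that the API returns sample values as strings, so they need to be converted to floats before sorting or comparing.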

What’s Next for Metric Store

Next up for Metric Store: operability enhancements. As part of that, we are improving how Metric Store itself is monitored and adopting the Monitoring Indicator Protocol. We also plan to keep exploring the commercial version, with enhanced replication and scaling capabilities, and to look at how this feature might be used with Kubernetes. After that, we want to add support for recording rules and ingestion of Prometheus-compatible scraping endpoints.

Tell Us What You Think!

If you have any questions, comments, or thoughts about the new Metric Store, we would love to hear from you. You can find us in the #metric-store channel of the Cloud Foundry Slack. Also, feel free to open an issue in our GitHub repository.

About the Author

Todd Persen

Todd Persen is a Principal Software Engineer with Pivotal. He was a co-founder and CTO at InfluxData, the company behind the InfluxDB time series database.
