Feature request: allow Prometheus to scrape log-cache directly

Hello there,

I’m an operator of a multi-tenant open-source Cloud Foundry (CF) platform for the UK Government.

We provide a mechanism for our tenants to extract their app metrics and store them in Prometheus. To do this we've had to build several moving parts. One significant piece is a `/metrics` endpoint which sits in front of log-cache and presents the metrics visible to the caller (based on the `Authorization` header) in a format that standard Prometheus scraping can ingest.
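For readers less familiar with the scrape contract, the core of such an endpoint is rendering samples in the Prometheus text exposition format. The following is an illustrative sketch only, not code from our adapter. It assumes log-cache gauge envelopes have already been flattened into `(name, labels, value)` tuples, and the `source_id`/`instance_id` label names are chosen for illustration. Authentication and fetching from log-cache are omitted.

```python
import math
from collections import defaultdict


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_value(value: float) -> str:
    """Format a sample value; NaN and infinities have fixed spellings."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "+Inf" if value > 0 else "-Inf"
    return repr(float(value))


def render_exposition(samples) -> str:
    """Render (name, labels, value) gauge samples as Prometheus text format.

    Samples sharing a metric name are grouped under one TYPE line, since the
    format expects all lines for a metric family to appear as one group.
    """
    families = defaultdict(list)
    for name, labels, value in samples:
        families[name].append((labels, value))

    lines = []
    for name in sorted(families):
        lines.append(f"# TYPE {name} gauge")
        for labels, value in families[name]:
            rendered = format_value(value)
            if labels:
                # Sort label keys so the output is stable between scrapes.
                pairs = ",".join(
                    f'{key}="{escape_label_value(str(val))}"'
                    for key, val in sorted(labels.items())
                )
                lines.append(f"{name}{{{pairs}}} {rendered}")
            else:
                lines.append(f"{name} {rendered}")
    return "\n".join(lines) + "\n"
```

For example, `render_exposition([("cpu", {"source_id": "app-1", "instance_id": "0"}, 0.5)])` yields a `# TYPE cpu gauge` line followed by `cpu{instance_id="0",source_id="app-1"} 0.5`.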

### Suggested feature

We understand from this project's README and some [#logcache Slack chat](https://cloudfoundry.slack.com/archives/CBFB7NP9B/p1540392230000100) that a future aim is to provide endpoints which would let existing PromQL API clients talk to log-cache natively. Would there also be interest in providing a `/metrics` endpoint so that Prometheus *itself* could scrape log-cache directly, effectively making log-cache a [Prometheus exporter](https://prometheus.io/docs/instrumenting/writing_exporters/)?

This would bring several advantages for consumers, allowing:

- teams to use Prometheus' Alertmanager and other ecosystem components that don't speak PromQL
- application teams to decouple their stats from the platform
- operators to use existing Prometheus instances, which can be persistent and durable in a way that we don't believe log-cache is currently designed (or aiming) to be.

We feel the `/metrics` contract is a good one for log-cache to expose to the Prometheus universe, as log-cache already uses the concept of "you'll see all the stats for which your API token gives you visibility" via the `Authorization` header. 

Prometheus also has a `/federate` API, which is similar to `/metrics`. `/federate` requires (or at least strongly suggests) that the consumer supply a filter, in the form of one or more `match[]` selectors, for the stats it would like to see.

We think `/federate` is probably not a great fit for log-cache, where the set of metrics a single OAuth token can access is already implicit in the system. Adding a secondary restriction on top of that set seems to run counter to log-cache's existing approach of simply exposing all the stats available to the requester.
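To make the distinction concrete, here is a simplified sketch. It assumes samples are `(name, labels, value)` tuples and that `visible_source_ids` stands in for whatever set log-cache derives from the caller's token (a hypothetical input here). Real `match[]` values are full series selectors; this sketch only matches on metric name.

```python
def metrics_view(samples, visible_source_ids):
    """/metrics-style contract: everything the caller's token can see."""
    return [s for s in samples if s[1].get("source_id") in visible_source_ids]


def federate_view(samples, visible_source_ids, match_names):
    """Simplified /federate-style contract: the same set, narrowed again.

    `match_names` stands in for the caller's match[] selectors; here a
    selector is just a metric name rather than a full series selector.
    """
    return [
        s for s in metrics_view(samples, visible_source_ids) if s[0] in match_names
    ]
```

`federate_view` can only ever return a subset of `metrics_view`, which is the secondary restriction described above: the token already defines what the caller may see, and the selectors add a second filter the caller must keep in sync.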

### Potential difficulties

Prometheus expects some metrics to reset (or be removed) when their last value becomes stale. If this isn't handled properly, [we have observed](https://github.com/alphagov/paas-metric-exporter/pull/33) issues such as:

* metrics aren't removed when apps are deleted
* metrics for a given cell aren't removed when the app migrates to another cell
* metrics for an instance aren't removed when the app scales down
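One common exporter-side mitigation (a hypothetical sketch, not a description of log-cache internals) is to key each series on its metric name plus its full label set, record when it was last seen, and drop series that have been silent for longer than a TTL. The `cell` label and the 60-second TTL used in the example are assumptions for illustration.

```python
class SeriesTracker:
    """Remember when each series was last seen and expire idle ones.

    A series is identified by metric name plus its full label set, so an
    instance that moves cell (a different label value) becomes a new series
    and the old one ages out instead of lingering forever.
    """

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # (name, frozenset of label pairs) -> (latest value, last-seen time)
        self._series = {}

    def observe(self, name, labels, value, now):
        key = (name, frozenset(labels.items()))
        self._series[key] = (value, now)

    def live_series(self, now):
        """Drop series idle for longer than the TTL and return the rest."""
        cutoff = now - self.ttl_seconds
        self._series = {
            key: entry for key, entry in self._series.items() if entry[1] >= cutoff
        }
        live = [
            (name, dict(labels), value)
            for (name, labels), (value, _seen) in self._series.items()
        ]
        # Sort by name, then by label pairs, for a stable scrape order.
        return sorted(live, key=lambda s: (s[0], sorted(s[1].items())))
```

For example, with `SeriesTracker(ttl_seconds=60)`, a series observed on `cell-a` at time 0 and again on `cell-b` at time 100 leaves only the `cell-b` series live at time 120. A TTL only infers deletion, migration or scale-down from the absence of data, so it trades prompt removal against wrongly dropping series for apps that emit infrequently.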

This may only be solvable by reading logs from Doppler, or it may require log-cache to reach out to other parts of the system (which may be undesirable).

### Next steps

If adding a `/metrics` endpoint aligns with your plans for log-cache, we ([GOV.UK PaaS](https://www.cloud.service.gov.uk/)) would be happy to contribute design and code as required. In the short term we are likely to implement something similar in spirit ourselves, as we already have live tenants using Prometheus via the projects listed [below](#user-content-references).

If this is unlikely to be added to log-cache, we may alter the design of our own metrics solutions to make them a longer-term part of our platform.

<a id="references"></a>
### References

- https://github.com/alphagov/paas-metric-exporter - an internal project (useful for context only) which exports metrics from Doppler to StatsD and Prometheus.
- https://github.com/alphagov/paas-log-cache-adapter - an internal project (useful for context only) which exports metrics from log-cache to Prometheus.
- https://cloudfoundry.slack.com/archives/CBFB7NP9B/p1540392230000100 - #logcache Slack chat
- https://prometheus.io/docs/instrumenting/writing_exporters/ - writing Prometheus exporters
- https://github.com/alphagov/paas-metric-exporter/pull/33 - an example of non-obvious difficulties we've found while working in this problem space
