This repository was archived by the owner on Aug 24, 2020. It is now read-only.

Feature request: allow Prometheus to scrape log-cache directly #96

@richardTowers

Hello there,

I’m an operator of a multi-tenant OSS CF for the UK Government.

We provide a mechanism for our tenants to extract their app metrics and store them in Prometheus. To do this, we've had to build a few moving parts. One significant piece is a /metrics endpoint which sits in front of log-cache and exposes the metrics visible to the user (based on the Authorization header) in a format which standard Prometheus scraping can ingest.

Suggested feature

We understand from this project's README and some #logcache Slack chat that a future aim is to provide endpoints which would enable existing PromQL API clients to talk to log-cache natively. We wonder if there would be any interest in also providing a /metrics endpoint so that Prometheus itself could scrape log-cache directly - effectively providing a Prometheus exporter for log-cache?

This would bring several advantages for consumers, allowing:

  • teams to use Prometheus' Alertmanager and other non-PromQL ecosystem components
  • application teams to decouple their stats from the platform
  • operators to use existing Prometheus instances, which can be persistent and durable in a way that we don't believe log-cache is currently designed (or aiming) to be.

We feel the /metrics contract is a good one for log-cache to expose to the Prometheus universe, as log-cache already uses the concept of "you'll see all the stats for which your API token gives you visibility" via the Authorization header.
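As a rough illustration of the /metrics contract, here is a minimal sketch of rendering samples in the Prometheus text exposition format. It is not log-cache code: the `Sample` type and `render_metrics` function are hypothetical names, and the sketch assumes the samples have already been filtered down to what the requestor's Authorization header allows.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical in-memory sample; not a log-cache type."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def escape_label_value(value: str) -> str:
    # The text exposition format escapes backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples) -> str:
    """Render samples as one 'name{labels} value' line each."""
    lines = []
    for s in samples:
        if s.labels:
            # Sort labels so the output is deterministic.
            pairs = ",".join(
                f'{k}="{escape_label_value(str(v))}"'
                for k, v in sorted(s.labels.items())
            )
            lines.append(f"{s.name}{{{pairs}}} {s.value}")
        else:
            lines.append(f"{s.name} {s.value}")
    return "\n".join(lines) + "\n"
```

An HTTP handler would return this text as the body of a GET /metrics response; how the token is validated and how samples are read from log-cache is left out here.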

Prometheus also has a /federate endpoint, which is similar to /metrics. /federate requires (or, at least, strongly suggests) that a consumer supplies a filter for the stats it would like to see.

We think that /federate is probably not a great fit for log-cache's use case, where the set of metrics that a single OAuth token can access is implicit within the system. Providing a secondary restriction over the top of that set seems to run counter to log-cache's existing approach: just exposing all the stats which are available to the requestor.
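To make the difference concrete, here is a small sketch of how the two kinds of scrape URL are built. Prometheus' /federate takes one or more `match[]` series selectors as query parameters, whereas /metrics takes none. The helper names and the base URLs used with them are placeholders, not real endpoints.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors) -> str:
    """Build a /federate URL; each selector becomes a match[] parameter."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url}/federate?{query}"


def metrics_url(base_url: str) -> str:
    """A /metrics URL carries no filter; visibility comes from the token."""
    return f"{base_url}/metrics"
```

With /federate the consumer has to name what it wants, for example `federate_url("https://prometheus.example", ['{job="cf-apps"}'])`. With /metrics the only input is the Authorization header sent with the request.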

Potential difficulties

Prometheus expects some metrics to reset (or be removed) when their last value becomes stale. If this isn't done properly we have observed issues like:

  • metrics aren't removed when apps are deleted
  • metrics for a given cell aren't removed when the app migrates to another cell
  • metrics for an instance aren't removed when the app scales down

This may be solvable only by reading logs from Doppler, or may require log-cache to reach out to other parts of the system (which may be undesirable).
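One possible mitigation, sketched below purely as an assumption and not as something log-cache does today, is to stop exposing a series once it has not been updated for some time. Deleted apps, old cells and scaled-down instances would then drop out of the /metrics output on their own. The `Series` type, `fresh_series` function and the 300-second default are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    """Hypothetical cached series with the time of its last update."""
    name: str
    labels: tuple      # sorted (key, value) pairs
    value: float
    last_seen: float   # unix time in seconds


def fresh_series(series, now: float, max_age_seconds: float = 300.0):
    """Keep only series updated within max_age_seconds of now."""
    return [s for s in series if now - s.last_seen <= max_age_seconds]
```

This only approximates staleness: a series that legitimately reports rarely would also disappear, so the window would need tuning against the emit interval of the slowest source.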

Next steps

If adding a /metrics endpoint aligns with your plans for log-cache we (GOV.UK PaaS) would be happy to contribute design and code as required. In the short term it's likely we'll implement something similar in spirit ourselves, as we already have live tenants using Prometheus via the projects mentioned below.

If this is not something that is likely to be added to log-cache then we may alter the design for our own metrics solutions to make them a more long-term part of our platform.

References
