Consider safeguards for prom sink memory use #387

@lukesteensen

Description

As of #374 the Prometheus sink collects and aggregates all metric events it receives. Aggregation is done by key, so it relies on the total number of distinct keys being finite and reasonably small. With a misconfiguration or bad input data, that assumption can break (e.g. a user could accidentally parse a timestamp into a field intended as a service name and then use it as part of a metric key, producing a new key for every event).

There are a couple of things we could do in this area:

  1. Expire metric aggregations after a given period of idleness. This could be desirable even in a correct configuration that simply changes the naming scheme of certain metrics over time. However, it likely wouldn't keep up with a pathological case like that of a timestamp key.
  2. Set an upper limit on the number of metrics the sink will aggregate. It's hard to know what a reasonable value here would be, so we'd probably want to work backwards from what we'd consider problematic memory use.
  3. Do nothing. We could decide this isn't likely enough to worry about, and let users rely on normal debugging to figure out the issue should it happen (or maybe add some logging that would point them in the right direction).

Metadata

    Labels

    domain: data model (anything related to Vector's internal data model)
    domain: metrics (anything related to Vector's metrics events)
    sink: prometheus_exporter (anything `prometheus_exporter` sink related)
