As of #374, the prometheus sink collects and aggregates all metric events it receives. Aggregation is done by key, so it relies on the total number of distinct keys it will see being finite and reasonably small. In the case of a misconfiguration or bad input data, this might not be true (e.g. a user could accidentally parse a timestamp into a field intended as a service name and then use that as part of a metric key, producing an effectively unbounded key set).
There are a couple of things we could do in this area:
- Expire metric aggregations after a given period of idleness. This could be desirable even in a correct configuration that simply changes the naming scheme of certain metrics over time. However, it likely wouldn't keep up with a pathological case like a timestamp-based key, where new keys arrive faster than old ones go idle.
- Set an upper limit on the number of metrics the sink will aggregate. It's hard to know what a reasonable value here would be, so we'd probably want to work backwards from what we'd consider problematic memory use.
- Do nothing. We could decide this isn't likely enough to worry about and let users rely on normal debugging to figure out the issue should it happen (or maybe add some logging that would lead them in the right direction).
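The first two options could also be combined. As a rough sketch (purely hypothetical, not the sink's actual implementation), here is a counter aggregator that records a last-seen time per key, expires idle keys, and refuses new keys once a hard cap is reached. It uses a logical tick counter in place of wall-clock time; all names and limits here are illustrative assumptions:

```rust
use std::collections::HashMap;

// Hypothetical sketch: a bounded, idle-expiring counter aggregator.
struct BoundedAggregator {
    max_keys: usize,                      // assumed cap on distinct metric keys
    idle_ticks: u64,                      // expire keys not updated within this window
    entries: HashMap<String, (f64, u64)>, // key -> (running sum, last_seen_tick)
}

impl BoundedAggregator {
    fn new(max_keys: usize, idle_ticks: u64) -> Self {
        Self { max_keys, idle_ticks, entries: HashMap::new() }
    }

    // Add `value` to the aggregation for `key`. Returns false when a *new*
    // key is dropped because the cap has been reached (a real sink would
    // also want to log or count these drops).
    fn record(&mut self, key: &str, value: f64, now: u64) -> bool {
        if let Some(entry) = self.entries.get_mut(key) {
            entry.0 += value;
            entry.1 = now;
            return true;
        }
        if self.entries.len() >= self.max_keys {
            return false; // at capacity: refuse the new key
        }
        self.entries.insert(key.to_string(), (value, now));
        true
    }

    // Drop any key that hasn't been updated within `idle_ticks` of `now`.
    fn expire_idle(&mut self, now: u64) {
        let cutoff = now.saturating_sub(self.idle_ticks);
        self.entries.retain(|_, &mut (_, last_seen)| last_seen >= cutoff);
    }
}
```

Even with expiry in place, the cap is what actually bounds memory in the timestamp-key case, since expiry only reclaims space after the idle window has elapsed.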