The CUDA OOM detector (#303, landed in the `chore/v1-rc1-knowledge-gaps` wave) consumes `hw.gpu.memory.{free,total}` log records, but the prometheus-scrape recipe does not yet project these from `DCGM_FI_DEV_FB_USED` / `DCGM_FI_DEV_FB_FREE`. Until that projection exists, the detector cannot fire on a real install.
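The detector's inputs (free, total) do not map one-to-one onto the DCGM gauges (used, free), so the projection has to derive one of them. The Go sketch below is not the OTTL stanza; it only illustrates the arithmetic, under two assumptions worth checking against `cuda_oom.go`: total is approximated as used + free, and the detector expects bytes. dcgm-exporter documents the framebuffer gauges in MiB. `fbSample`, `gpuMemoryRecord`, and `project` are hypothetical names for illustration only.

```go
package main

import "fmt"

// mibToBytes converts DCGM framebuffer values (reported in MiB) to bytes.
// Assumption: the detector expects hw.gpu.memory.* in bytes.
const mibToBytes = 1 << 20

// fbSample is a hypothetical holder for one scrape's DCGM framebuffer gauges
// for a single GPU, in MiB.
type fbSample struct {
	GPU     string
	UsedMiB float64 // DCGM_FI_DEV_FB_USED
	FreeMiB float64 // DCGM_FI_DEV_FB_FREE
}

// gpuMemoryRecord is a hypothetical stand-in for the log record attributes
// the detector reads.
type gpuMemoryRecord struct {
	GPU        string
	FreeBytes  int64 // hw.gpu.memory.free
	TotalBytes int64 // hw.gpu.memory.total
}

// project maps used/free onto free/total. Total is approximated as
// used + free (assumption); if the scrape exposes a total gauge, prefer it.
func project(s fbSample) gpuMemoryRecord {
	return gpuMemoryRecord{
		GPU:        s.GPU,
		FreeBytes:  int64(s.FreeMiB * mibToBytes),
		TotalBytes: int64((s.UsedMiB + s.FreeMiB) * mibToBytes),
	}
}

func main() {
	r := project(fbSample{GPU: "0", UsedMiB: 38000, FreeMiB: 2536})
	fmt.Printf("gpu=%s hw.gpu.memory.free=%d hw.gpu.memory.total=%d\n",
		r.GPU, r.FreeBytes, r.TotalBytes)
}
```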
Cross-ref:
- `docs/patterns/10-cuda-oom-deceptive.md` §Signal sources
- `module/processor/patterndetectorprocessor/cuda_oom.go`

Add an OTTL stanza to `docs/integrations/prometheus-scrape.md` (or a new dcgm-extension recipe) that projects FB used and free into the log records the detector library consumes.
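Because `DCGM_FI_DEV_FB_USED` and `DCGM_FI_DEV_FB_FREE` arrive as separate series, whatever the recipe ends up using has to pair them per GPU before a single record can carry both values. The Go sketch below shows that join, assuming each sample carries a `gpu` label (as dcgm-exporter series typically do) and that both gauges come from the same scrape. `sample`, `fbPair`, and `joinByGPU` are hypothetical names, not part of the detector library.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one scraped gauge value: metric name, its gpu label, and the
// value (MiB for the DCGM framebuffer fields).
type sample struct {
	Metric string
	GPU    string
	Value  float64
}

// fbPair is the joined used/free reading for one GPU in one scrape.
type fbPair struct {
	GPU     string
	UsedMiB float64
	FreeMiB float64
}

// joinByGPU pairs DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE samples that
// share a gpu label. Other metrics are ignored, and GPUs missing either gauge
// are dropped because a record needs both. Output is sorted by gpu label.
func joinByGPU(samples []sample) []fbPair {
	type partial struct {
		used, free       float64
		hasUsed, hasFree bool
	}
	byGPU := map[string]*partial{}
	get := func(gpu string) *partial {
		p, ok := byGPU[gpu]
		if !ok {
			p = &partial{}
			byGPU[gpu] = p
		}
		return p
	}
	for _, s := range samples {
		switch s.Metric {
		case "DCGM_FI_DEV_FB_USED":
			p := get(s.GPU)
			p.used, p.hasUsed = s.Value, true
		case "DCGM_FI_DEV_FB_FREE":
			p := get(s.GPU)
			p.free, p.hasFree = s.Value, true
		}
	}
	out := make([]fbPair, 0, len(byGPU))
	for gpu, p := range byGPU {
		if p.hasUsed && p.hasFree {
			out = append(out, fbPair{GPU: gpu, UsedMiB: p.used, FreeMiB: p.free})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].GPU < out[j].GPU })
	return out
}

func main() {
	scrape := []sample{
		{"DCGM_FI_DEV_FB_USED", "0", 38000},
		{"DCGM_FI_DEV_FB_FREE", "0", 2536},
		{"DCGM_FI_DEV_FB_USED", "1", 1200}, // no FREE sample for gpu 1: dropped
		{"DCGM_FI_DEV_GPU_UTIL", "0", 97},  // unrelated series: ignored
	}
	for _, p := range joinByGPU(scrape) {
		fmt.Printf("gpu=%s used=%.0fMiB free=%.0fMiB\n", p.GPU, p.UsedMiB, p.FreeMiB)
	}
}
```

Dropping incomplete pairs, rather than emitting a record with a zero field, avoids feeding the detector a spurious "no free memory" reading when one series is briefly missing.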