
[rc1-prep] OTTL recipe: project DCGM_FI_DEV_FB_USED/FB_FREE → hw.gpu.memory.{free,total} log shape (pattern #10 wiring) #337

Description

@trilamsr

The CUDA OOM detector (#303, landed in the chore/v1-rc1-knowledge-gaps wave) consumes hw.gpu.memory.{free,total} log records, but the prometheus-scrape recipe does not yet project them from DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_FREE. Until that projection lands, the detector cannot fire on a real install.

Cross-ref:

Add an OTTL stanza to docs/integrations/prometheus-scrape.md (or to a new dcgm-extension recipe) that projects FB used + free as the log records the detector library consumes.
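
For reference while writing the stanza, here is a rough Python sketch of the arithmetic the projection would need. It is not the OTTL itself, and it makes several assumptions that are not confirmed anywhere in this issue: that DCGM reports both framebuffer gauges in MiB, that the detector wants bytes, that total can be approximated as used + free (ignoring any reserved framebuffer reported separately), and that the values travel as log record attributes. `FbSample`, `project`, and the `gpu` attribute are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Assumption: DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_FREE are reported in MiB.
MIB = 1024 * 1024


@dataclass(frozen=True)
class FbSample:
    """One scrape of the two DCGM framebuffer gauges for a single GPU (hypothetical type)."""
    gpu: str            # hypothetical GPU identifier label
    fb_used_mib: float  # value of DCGM_FI_DEV_FB_USED
    fb_free_mib: float  # value of DCGM_FI_DEV_FB_FREE


def project(sample: FbSample) -> dict:
    """Derive hw.gpu.memory.{free,total} in bytes from used + free.

    Assumption: total = used + free. Reserved framebuffer, if the
    exporter reports it separately, is not counted here.
    """
    free_bytes = int(sample.fb_free_mib * MIB)
    total_bytes = int((sample.fb_used_mib + sample.fb_free_mib) * MIB)
    return {
        # Hypothetical log shape; the detector's real schema may differ.
        "attributes": {
            "gpu": sample.gpu,
            "hw.gpu.memory.free": free_bytes,
            "hw.gpu.memory.total": total_bytes,
        },
    }


record = project(FbSample(gpu="0", fb_used_mib=30000, fb_free_mib=10000))
```

Whatever the final recipe looks like, the point is that hw.gpu.memory.total is not scraped directly here and has to be derived from the two gauges, so the stanza needs both series for the same GPU at the same timestamp.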

Labels: enhancement
