Problem
PR #266 (closed) added docs/integrations/dcgm-exporter.md + example with valuable operator-facing content:
- Helm install command for upstream dcgm-exporter (Apache-2.0)
- kubectl port-forward + curl /metrics verify step
- Failure modes table (missing GPU driver, NVML unreachable, RBAC denied)
- Operator-facing metric mapping table (DCGM_FI_* → tracecore hw.* → consumer pattern)
The PR was closed instead of merged because the OTTL transform in the example YAML had blockers (wrong NVLink metric family, missing attrs). That work has already shipped correctly via PR #267 (PR-A: DCGM→hw.* recipe + ADR), which lives at docs/integrations/prometheus-scrape.md.
The remaining operator-onboarding content from #266 (helm install + verify + failure modes) is still missing from prometheus-scrape.md. Operators following pattern docs end-to-end need it.
Proposed fix
Add new sections to docs/integrations/prometheus-scrape.md (not a new file):
- § Install dcgm-exporter — helm install command + minimal values.yaml override (nodeSelector, serviceMonitor: false, service.port)
- § Verify dcgm-exporter is scrapable — kubectl port-forward + curl + expected metric prefixes
- § Failure modes — table: symptom → root cause → fix (no GPU driver, NVML unreachable, RBAC denied, dcgm-exporter pod CrashLoop, prometheusreceiver scrape timeout)
DO NOT touch the OTTL transform — PR #267 owns that.
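As a rough illustration of the "expected metric prefixes" check in the verify section, the sketch below scans a Prometheus text-exposition body for metric names that start with `DCGM_FI_`, the prefix used in the mapping table from #266. The function name `dcgm_metric_names` is hypothetical and exists only for this example. The doc itself may settle on a plain `curl` + `grep` instead.

```python
def dcgm_metric_names(exposition_text):
    """Return the set of metric names starting with DCGM_FI_ found in a
    Prometheus text-exposition body (e.g. the output of curl /metrics)."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith("DCGM_FI_"):
            names.add(name)
    return names
```

An empty result for the body returned by the port-forwarded `/metrics` endpoint would suggest that dcgm-exporter is running but not exporting GPU metrics, which is the situation the failure modes table is meant to help diagnose.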
Acceptance
- prometheus-scrape.md is the single source of truth for the DCGM-exporter → tracecore wire path.
- Operators can helm install dcgm-exporter + wire tracecore by following ONE doc.
- No new file under docs/integrations/.
Out of scope