feat(recipe): pattern-2 IB link flap OTTL stanza (#393) #415
Project node_exporter --collector.infiniband's node_infiniband_port_state_id Gauge onto the customer-stable hw.network.ib.* namespace (hw.network.ib.port.state int + hw.network.ib.device + hw.network.ib.port.num) so pattern #2's IBLinkFlapDetector consumes the same wire shape regardless of whether the underlying source is node_exporter, a Mellanox exporter, or the mlx5_core journald stream.

The detector library + processor wiring already shipped (PR #391, closes #300); only the metric-side OTTL projection was missing. The metrics-to-logs bridge emitter (RFC-0014 PR-B) is still upstream-blocked and shared with patterns #3/#4/#5/#10; the bridge log-record schema for pattern #2 is pinned in the same section of the recipe so PR-B has no per-pattern reconstruction work to do.

Closes #393.

- docs/integrations/examples/prometheus-scrape.yaml: new transform/ib_to_hw_semconv stanza; wired into metrics/scrape pipeline. Validates with tracecore validate (exit 0).
- docs/integrations/prometheus-scrape.md: new "Pattern #2" projection section + bridge log-record contract subsection.
- docs/patterns/pattern-2-ib-link-flap.md: flipped "Integration gap" prose to "Integration recipe" pointing at the shipped stanza; updated "Why node_exporter sees it" prose to drop "pending" wording.
- module/processor/patterndetectorprocessor/ib_link_flap_recipe_test.go: new recipe-pin gate. TestRecipe_IBLinkFlap_StanzaPinsWireContract asserts the recipe YAML carries the load-bearing tokens (source-metric name, three hw.network.ib.* attrs, port-label Int() cast, transform name, pipeline wiring) so a rename in either direction is caught at unit-test time, not at deploy. TestRecipe_IBLinkFlap_RoundTripFiresVerdict simulates the end-to-end path with recipe-shaped log records and asserts a flap verdict is emitted.

Mutation-verified: dropping any pinned token from the recipe fails TestRecipe_IBLinkFlap_StanzaPinsWireContract with the remediation message naming the missing identifier.
Signed-off-by: Tri Lam <tree@lumalabs.ai>
APPROVED. Independent review complete. Top findings
SHIP
## What this PR does

Closes #427 via the reviewer-recommended **option B**: keep current pattern OTTL stanza placement (`docs/integrations/examples/<target>.yaml`), fix the misleading PR-title convention going forward, and add an automated gate so it doesn't recur.

Audit issue #421 flagged that PR #406 ("feat(recipes): OTTL stanzas + bridge for pattern #7") and PR #415 ("feat(recipe): pattern-2 IB link flap OTTL stanza") imply a top-level `recipes/pattern-N/{ottl.yaml,README.md}` directory layout, but `find . -maxdepth 2 -type d -name 'recipes'` returns nothing. The actual placement is `docs/integrations/examples/<target>.yaml`. Option A (migrate) would rot cross-links across six-plus docs; option B (fix the convention) costs near-zero discoverability.

## Linked issue(s)

Closes #427. Refs #406 #415 #421.

## Changes

- **`STYLE.md` §"Commits"**: new bullet pointing PR titles / commit subjects at the real path and naming the three accepted subject shapes:
  - `feat(integrations/examples): pattern-N OTTL stanza`
  - `feat(<target>): pattern-N ...`
  - `feat(pattern-N): OTTL stanza in docs/integrations/examples/`
- **`.github/PULL_REQUEST_TEMPLATE.md`**: brief reminder pointing at STYLE.md §"Commits" and issue #427 for context.
- **`scripts/recipes-path-check.sh` + `_test.sh`**: TDD-driven gate. Two rules:
  1. Literal path `recipes/pattern-N` (with negative lookahead so `internal/recipes/`, `module/recipes/`, etc. pass).
  2. Bare `feat(recipes):` / `feat(recipe):` scope paired with a `pattern N` mention, the exact shape #421 flagged on PRs #406 / #415.

  Test fixture: 5 reject cases (including the two historical PR titles verbatim) + 6 accept cases (real placement, per-target scope, `pattern-N` scope, unrelated subjects, the Go `./internal/recipes/...` package path from `HARDWARE-TESTING.md`, empty input).
- **`Makefile`**: `recipes-path-check` target wired into `ci-fast` and `ci-full`. Runs the regression test suite, not a tree scan; the gate itself operates on a subject string passed as `$1`.
- **`.github/workflows/pr-lint.yml`**: new step invokes `scripts/recipes-path-check.sh` with `${{ github.event.pull_request.title }}` and fails the workflow on a forbidden shape.

## Audit results

- `grep -rn 'recipes/pattern-' docs/`: only hit is `./internal/recipes/...` in `docs/HARDWARE-TESTING.md` (a Go package path, not a docs path). The gate accepts it. No stale references to fix.
- `grep -rn 'recipes/' docs/ .github/ CONTRIBUTING.md PRINCIPLES.md README.md | grep -v 'docs/integrations/examples'`: same single hit. Clean.

## Hard rules honoured

- **No files migrated.** Option B is documentation + lint only.
- **No bureaucratic process docs.** One bullet added to STYLE.md, one block in PR template, one TDD-tested gate script. Total diff: 6 files, +159 / -3.

## Release notes

```release-notes
NONE
```

## Test plan

- [x] `bash scripts/recipes-path-check_test.sh`: 11/11 fixtures pass (5 reject + 6 accept).
- [x] `make recipes-path-check`: runs the regression test via the new make target.
- [x] `make ci-fast`: passes end-to-end including the new gate (lint + vet + mod-verify + attribute-namespace-check + doc-check + recipes-path-check).
- [x] `go tool actionlint .github/workflows/pr-lint.yml`: clean.
- [x] `make zizmor`: no findings; 28 suppressed, 32 ignored (baseline unchanged).
- [x] Verified the gate rejects both historical titles verbatim: `feat(recipes): OTTL stanzas + bridge for pattern #7 (#364 #365)` and `feat(recipe): pattern-2 IB link flap OTTL stanza (#393)`.

## Checklist

- [x] Tests added or updated (TDD: test written before gate; both ship in this PR).
- [x] `make ci-fast` passes.
- [x] Commits are signed off.
- [x] PR title and Summary reflect the current diff.

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
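For readers who want to see the two gate rules in executable form, here is a minimal Go sketch. It is not the shipped `scripts/recipes-path-check.sh`: the function names (`violation`, `nestedUnderDir`) and the exact regular expressions are assumptions made for illustration. Go's `regexp` package has no lookaround, so the "nested under another directory" exemption is checked in code after the match instead of inside the pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Rule 1: a literal recipes/pattern-N path (N as a placeholder or a number).
	recipesPath = regexp.MustCompile(`recipes/pattern-(\d+|N)`)
	// Rule 2: bare feat(recipe): or feat(recipes): scope ...
	recipeScope = regexp.MustCompile(`^feat\(recipes?\):`)
	// ... paired with a pattern mention such as "pattern #7" or "pattern-2".
	patternMention = regexp.MustCompile(`(?i)pattern[ -]#?\d+`)
)

// nestedUnderDir reports whether the text before a match ends in a real
// parent directory segment, e.g. "internal/" or "module/". A bare "./" or no
// slash at all counts as top-level.
func nestedUnderDir(prefix string) bool {
	if !strings.HasSuffix(prefix, "/") {
		return false
	}
	parent := strings.TrimSuffix(prefix, "/")
	seg := parent[strings.LastIndexAny(parent, " \t/")+1:]
	return seg != "" && seg != "."
}

// violation returns a short reason when the subject has a forbidden shape,
// or the empty string when the subject is acceptable.
func violation(subject string) string {
	for _, loc := range recipesPath.FindAllStringIndex(subject, -1) {
		if !nestedUnderDir(subject[:loc[0]]) {
			return "rule 1: top-level recipes/pattern-N path"
		}
	}
	if recipeScope.MatchString(subject) && patternMention.MatchString(subject) {
		return "rule 2: feat(recipe[s]) scope with a pattern mention"
	}
	return ""
}

func main() {
	subjects := []string{
		"feat(recipe): pattern-2 IB link flap OTTL stanza (#393)",
		"feat(integrations/examples): pattern-2 OTTL stanza",
		"go test ./internal/recipes/...",
	}
	for _, s := range subjects {
		if reason := violation(s); reason != "" {
			fmt.Printf("REJECT %q (%s)\n", s, reason)
		} else {
			fmt.Printf("ACCEPT %q\n", s)
		}
	}
}
```

The sketch takes the subject as a plain string, mirroring the description that the gate operates on a subject passed as `$1` and does not scan the tree.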
Summary
Closes #393. Ships the metric-side OTTL projection from `node_exporter --collector.infiniband`'s `node_infiniband_port_state_id` Gauge onto the customer-stable `hw.network.ib.*` namespace (`hw.network.ib.port.state` int + `hw.network.ib.device` + `hw.network.ib.port.num`) so pattern #2's `IBLinkFlapDetector` consumes the same vendor-neutral wire shape regardless of whether the underlying source is node_exporter, a Mellanox exporter, or the `mlx5_core` journald stream.

Detector library + processor wiring already shipped in #391 (closed #300). Only the metric-side input recipe was missing, so pattern #2 was configured-but-quiet on real deployments. This PR closes that gap.
Wire contract (node_exporter raw → hw.network.ib.*)
The future RFC-0014 PR-B metrics→logs bridge emitter (shared with patterns #3/#4/#5/#10) will lift these three attributes onto a log record at emit time. The bridge log-record schema for pattern #2 is pinned in `docs/integrations/prometheus-scrape.md` §"Pattern #2 — hw.network.ib.port.state (issue #393)" so PR-B has no per-pattern reconstruction work to do.
The companion series `node_infiniband_state{state="<name>"}` (string label) is intentionally NOT mapped: the detector (`module/processor/patterndetectorprocessor/ib_link_flap.go`) compares `state.Int()` against `patterns.IBPortState*` integer constants, so the string variant would round-trip wrong.
No detector code change required
The detector reads three attribute names off a log record: `hw.network.ib.port.state`, `hw.network.ib.device`, `hw.network.ib.port.num`. The recipe stanza stamps the exact same three names on the metric datapoint. The wire format `port.Int()` expects (the projector at `module/processor/patterndetectorprocessor/ib_link_flap.go` line 39 calls `int(port.Int())`) is satisfied because the OTTL `Int()` cast on the Prometheus `port` string label produces a pdata int Value. Confirmed by the new `TestRecipe_IBLinkFlap_RoundTripFiresVerdict` test.
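To make the string-to-int point concrete, here is a minimal standard-library sketch of the projection. It does not use the collector's pdata or OTTL packages; `promSample`, `hwAttrs`, and `project` are hypothetical names, the `device` label name and the sample values (`mlx5_0`, port `1`, state `4`) are illustrative assumptions, and `strconv.ParseInt` stands in for the OTTL `Int()` cast.

```go
package main

import (
	"fmt"
	"strconv"
)

// promSample is a hypothetical stand-in for one scraped datapoint of
// node_infiniband_port_state_id. Prometheus label values are always strings
// and the sample value is a float64.
type promSample struct {
	Labels map[string]string
	Value  float64
}

// hwAttrs mirrors the three hw.network.ib.* attributes named in this PR.
type hwAttrs struct {
	Device    string // hw.network.ib.device
	PortNum   int64  // hw.network.ib.port.num
	PortState int64  // hw.network.ib.port.state
}

// project converts the raw sample into the integer-typed shape the detector
// reads. The port label must be parsed, because a string-typed value would
// not satisfy an integer read on the consuming side.
func project(s promSample) (hwAttrs, error) {
	port, err := strconv.ParseInt(s.Labels["port"], 10, 64)
	if err != nil {
		return hwAttrs{}, fmt.Errorf("port label %q is not an integer: %w", s.Labels["port"], err)
	}
	return hwAttrs{
		Device:    s.Labels["device"],
		PortNum:   port,
		PortState: int64(s.Value),
	}, nil
}

func main() {
	got, err := project(promSample{
		Labels: map[string]string{"device": "mlx5_0", "port": "1"},
		Value:  4,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", got) // prints {Device:mlx5_0 PortNum:1 PortState:4}
}
```

A non-numeric port label surfaces as an error here; how the real stanza behaves on a malformed label is not stated in this PR.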
Root cause + scope
metrics→logs bridge emitter. Upstream-blocked at OTel-contrib v0.130: `transformprocessor`'s `metric_statements` cannot reference `log.*` paths and no contrib connector emits log records from a metrics pipeline (per RFC-0014).
The recipe doc explicitly documents this gating relationship; PR-B is shared with patterns #3/#4/#5/#10 and lands the bridge once.
Files changed
- `docs/integrations/examples/prometheus-scrape.yaml`: new `transform/ib_to_hw_semconv` processor; wired into the `metrics/scrape` pipeline. Validates with `./_build/tracecore validate` (exit 0).
- `docs/integrations/prometheus-scrape.md`: new "Pattern #2 — InfiniBand link flap" projection section, intro updated from "Two" to "Three OTTL transforms", and bridge log-record contract subsection added under the "Metrics-to-logs bridge contract" section.
- `docs/patterns/pattern-2-ib-link-flap.md`: deleted "Integration gap" section, replaced with "Integration recipe" pointing at the shipped stanza; updated "Why node_exporter sees it" prose to drop the "pending" hedge.
- `module/processor/patterndetectorprocessor/ib_link_flap_recipe_test.go`: new file.
  - `TestRecipe_IBLinkFlap_StanzaPinsWireContract` parses the example YAML and asserts every load-bearing token is present (source metric name, three `hw.network.ib.*` attrs, the `Int()` cast on the port label, the transform name, and the pipeline wiring).
  - `TestRecipe_IBLinkFlap_RoundTripFiresVerdict` simulates the end-to-end path: builds `plog.Logs` with the exact attribute shape the recipe stamps and asserts the processor emits a flap verdict.
Test plan
- `./_build/tracecore validate --config=docs/integrations/examples/prometheus-scrape.yaml` → exit 0
- `bash scripts/validator-recipe.sh` → 9 validated, 3 skipped (non-linux host)
- `bash scripts/doc-check.sh` → clean (no orphan test refs)
- `go test ./module/processor/patterndetectorprocessor/... -count=1` → PASS (incl. the two new tests + all 5 existing IB tests)
- `go build ./...` and `go vet ./...` → clean
- Mutation-verified: dropping `hw.network.ib.port.state` from the recipe yaml fails `TestRecipe_IBLinkFlap_StanzaPinsWireContract` with the expected remediation message naming the missing identifier