feat(recipe): pattern-2 IB link flap OTTL stanza (#393) #415

Merged
trilamsr merged 1 commit into main from feat/pattern2-ottl-recipe-393
Jun 1, 2026

trilamsr commented Jun 1, 2026

Summary

Closes #393. Ships the metric-side OTTL projection from
node_exporter --collector.infiniband's node_infiniband_port_state_id
Gauge onto the customer-stable hw.network.ib.* namespace
(hw.network.ib.port.state int + hw.network.ib.device +
hw.network.ib.port.num) so pattern #2's IBLinkFlapDetector
consumes the same vendor-neutral wire shape regardless of whether the
underlying source is node_exporter, a Mellanox exporter, or the
mlx5_core journald stream.

Detector library + processor wiring already shipped in #391 (closed
#300). Only the metric-side input recipe was missing — pattern #2 was
configured-but-quiet on real deployments. This PR closes that gap.

Wire contract (node_exporter raw → hw.network.ib.*)

node_infiniband_port_state_id{device="mlx5_0", port="1"} = 4
                                                          (IBA phys_state ID)
            ↓ transform/ib_to_hw_semconv
Gauge metric "hw.network.ib.port.state" with datapoint attrs:
  hw.network.ib.device  = "mlx5_0"        (str, from `device` label)
  hw.network.ib.port.num = 1              (int, from Int(`port` label))
  value                  = 4              (int, the phys_state ID)
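
The mapping above can be modelled in a few lines of Go. This is a minimal sketch for illustration only, not the shipped OTTL stanza: the ibPortState type and the projectIBPortState function are hypothetical names, and strconv.ParseInt stands in for the OTTL Int() cast on the port label.

```go
package main

import (
	"fmt"
	"strconv"
)

// ibPortState is a hypothetical stand-in for one projected datapoint.
type ibPortState struct {
	Device  string // hw.network.ib.device
	PortNum int64  // hw.network.ib.port.num
	State   int64  // datapoint value of hw.network.ib.port.state
}

// projectIBPortState models the projection: copy the device label, parse the
// port label into an integer, and carry the state ID through unchanged.
func projectIBPortState(labels map[string]string, stateID int64) (ibPortState, error) {
	device, ok := labels["device"]
	if !ok {
		return ibPortState{}, fmt.Errorf("missing %q label", "device")
	}
	portLabel, ok := labels["port"]
	if !ok {
		return ibPortState{}, fmt.Errorf("missing %q label", "port")
	}
	portNum, err := strconv.ParseInt(portLabel, 10, 64)
	if err != nil {
		return ibPortState{}, fmt.Errorf("port label %q is not an integer: %w", portLabel, err)
	}
	return ibPortState{Device: device, PortNum: portNum, State: stateID}, nil
}

func main() {
	// Input taken from the wire-contract example above.
	p, err := projectIBPortState(map[string]string{"device": "mlx5_0", "port": "1"}, 4)
	if err != nil {
		fmt.Println("projection failed:", err)
		return
	}
	fmt.Printf("%s=%q %s=%d value=%d\n",
		"hw.network.ib.device", p.Device, "hw.network.ib.port.num", p.PortNum, p.State)
}
```

The sketch reports a non-numeric port label as an error; how the real stanza treats such a datapoint is not described in this PR.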

The future RFC-0014 PR-B metrics→logs bridge emitter (shared with
patterns #3/#4/#5/#10) will lift these three attributes onto a log
record at emit time. The bridge log-record schema for pattern #2 is
pinned in docs/integrations/prometheus-scrape.md §Pattern #2 — hw.network.ib.port.state (issue #393) so PR-B has no per-pattern
reconstruction work to do.

The companion series node_infiniband_state{state="<name>"} (string
label) is intentionally NOT mapped: the detector
(module/processor/patterndetectorprocessor/ib_link_flap.go) compares
state.Int() against the patterns.IBPortState* integer constants, so
a string-valued state would not compare correctly against them.

No detector code change required

The detector reads three attribute names off a log record:
hw.network.ib.port.state, hw.network.ib.device,
hw.network.ib.port.num. The recipe stanza stamps the exact same
three names on the metric datapoint. The projector at
module/processor/patterndetectorprocessor/ib_link_flap.go line 39
calls int(port.Int()), so it needs an int-typed port value; the OTTL
Int() cast on the Prometheus port string label produces a pdata int
Value, which satisfies that. Confirmed by the new
TestRecipe_IBLinkFlap_RoundTripFiresVerdict test.

Root cause + scope

Files changed

  • docs/integrations/examples/prometheus-scrape.yaml — new
    transform/ib_to_hw_semconv processor; wired into the
    metrics/scrape pipeline. Validates with ./_build/tracecore validate (exit 0).
  • docs/integrations/prometheus-scrape.md — new "Pattern #2
    InfiniBand link flap" projection section, intro updated from "Two"
    to "Three OTTL transforms", and bridge log-record contract
    subsection added under the "Metrics-to-logs bridge contract"
    section.
  • docs/patterns/pattern-2-ib-link-flap.md — deleted "Integration
    gap" section, replaced with "Integration recipe" pointing at the
    shipped stanza; updated "Why node_exporter sees it" prose to drop
    the "pending" hedge.
  • module/processor/patterndetectorprocessor/ib_link_flap_recipe_test.go
    — new file. TestRecipe_IBLinkFlap_StanzaPinsWireContract parses
    the example YAML and asserts every load-bearing token is present
    (source metric name, three hw.network.ib.* attrs, the Int()
    cast on the port label, the transform name, and the pipeline
    wiring). TestRecipe_IBLinkFlap_RoundTripFiresVerdict simulates
    the end-to-end path: builds plog.Logs with the exact attribute
    shape the recipe stamps and asserts the processor emits a flap
    verdict.

Test plan

  • ./_build/tracecore validate --config=docs/integrations/examples/prometheus-scrape.yaml → exit 0
  • bash scripts/validator-recipe.sh → 9 validated, 3 skipped (non-linux host)
  • bash scripts/doc-check.sh → clean (no orphan test refs)
  • go test ./module/processor/patterndetectorprocessor/... -count=1 → PASS (incl. the two new tests + all 5 existing IB tests)
  • go build ./... and go vet ./... → clean
  • Pre-commit hooks: golangci-lint 0 issues, go mod verify, attribute-namespace-check 100/100
  • Mutation-verified: dropping hw.network.ib.port.state from the recipe yaml fails TestRecipe_IBLinkFlap_StanzaPinsWireContract with the expected remediation message naming the missing identifier
  • CI on the PR (waiting on push)

feat(recipe): InfiniBand link-flap OTTL stanza projecting node_exporter's `node_infiniband_port_state_id` onto the tracecore-canonical `hw.network.ib.*` namespace (`hw.network.ib.port.state` int + `hw.network.ib.device` + `hw.network.ib.port.num`). Pattern #2's `IBLinkFlapDetector` now has its metric-side input wired; metrics→logs bridge emitter remains gated on RFC-0014 PR-B (#260).

Project node_exporter --collector.infiniband's
node_infiniband_port_state_id Gauge onto the customer-stable
hw.network.ib.* namespace (hw.network.ib.port.state int +
hw.network.ib.device + hw.network.ib.port.num) so pattern #2's
IBLinkFlapDetector consumes the same wire shape regardless of
whether the underlying source is node_exporter, a Mellanox
exporter, or the mlx5_core journald stream.

The detector library + processor wiring already shipped (PR #391,
closes #300); only the metric-side OTTL projection was missing.
The metrics-to-logs bridge emitter (RFC-0014 PR-B) is still
upstream-blocked and shared with patterns #3/#4/#5/#10 — the
bridge log-record schema for pattern #2 is pinned in the same
section of the recipe so PR-B has no per-pattern reconstruction
work to do.

Closes #393.

- docs/integrations/examples/prometheus-scrape.yaml: new
  transform/ib_to_hw_semconv stanza; wired into metrics/scrape
  pipeline. Validates with tracecore validate (exit 0).
- docs/integrations/prometheus-scrape.md: new "Pattern #2"
  projection section + bridge log-record contract subsection.
- docs/patterns/pattern-2-ib-link-flap.md: flipped "Integration
  gap" prose to "Integration recipe" pointing at the shipped
  stanza; updated "Why node_exporter sees it" prose to drop
  "pending" wording.
- module/processor/patterndetectorprocessor/ib_link_flap_recipe_test.go:
  new recipe-pin gate. TestRecipe_IBLinkFlap_StanzaPinsWireContract
  asserts the recipe YAML carries the load-bearing tokens
  (source-metric name, three hw.network.ib.* attrs, port-label
  Int() cast, transform name, pipeline wiring) so a rename in
  either direction is caught at unit-test time, not at deploy.
  TestRecipe_IBLinkFlap_RoundTripFiresVerdict simulates the
  end-to-end path with recipe-shaped log records and asserts a
  flap verdict is emitted.

Mutation-verified: dropping any pinned token from the recipe
fails TestRecipe_IBLinkFlap_StanzaPinsWireContract with the
remediation message naming the missing identifier.

Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr commented Jun 1, 2026

APPROVED. Independent review complete.

Top findings

  1. Recipe placement: CONSISTENT. Uses docs/integrations/examples/prometheus-scrape.yaml, matching #406 (pattern-7 OTTL recipe). No drift from audit #421.

  2. Attribute cross-check: VERIFIED. OTTL stanza stamps hw.network.ib.port.state (int) + hw.network.ib.device (str) + hw.network.ib.port.num (int cast via Int()), an exact match to the pattern-2 walkthrough (#391) and the detector reads (module/processor/patterndetectorprocessor/ib_link_flap.go::projectIBPortStateRecord).

  3. Test coverage: SOUND. Two new tests pin the contract end-to-end:

    • TestRecipe_IBLinkFlap_StanzaPinsWireContract — semantic gate on recipe YAML; mutation-verified (PR body confirms dropping any attribute fails the test).
    • TestRecipe_IBLinkFlap_RoundTripFiresVerdict — round-trip from metric-shaped attributes → detector verdict. Uses the existing extractIBLinkFlapVerdicts helper rather than one written for this test.
  4. RFC-0014 PR-B blocker: HONESTLY NAMED. PR body, docs, and code comments all pin the metrics→logs emitter as out-of-scope (separate PR-B, OTel-contrib gated). Bridge log-record schema pinned in prometheus-scrape.md §Pattern #2 bridge contract so PR-B has no per-pattern reconstruction work.

  5. Verdict fixture alignment (#398): CORRECT. Pattern ID = "2" (not "21"). Detector emits PatternIDIBLinkFlap as built from patterns.IBPortStateRecord projector output. All prior wire-format tests in ib_link_flap_test.go continue to pass unchanged.

SHIP

trilamsr merged commit bc0f6fc into main Jun 1, 2026
12 checks passed
trilamsr deleted the feat/pattern2-ottl-recipe-393 branch June 1, 2026 23:11
trilamsr added a commit that referenced this pull request Jun 2, 2026
## What this PR does

Closes #427 via the reviewer-recommended **option B**: keep current
pattern OTTL stanza placement
(`docs/integrations/examples/<target>.yaml`), fix the misleading
PR-title convention going forward, and add an automated gate so it
doesn't recur.

Audit issue #421 flagged that PR #406 ("feat(recipes): OTTL stanzas +
bridge for pattern #7") and PR #415 ("feat(recipe): pattern-2 IB link
flap OTTL stanza") imply a top-level
`recipes/pattern-N/{ottl.yaml,README.md}` directory layout — but `find .
-maxdepth 2 -type d -name 'recipes'` returns nothing. The actual
placement is `docs/integrations/examples/<target>.yaml`. Option A
(migrate) would rot cross-links across six-plus docs; option B (fix the
convention) costs near-zero discoverability.

## Linked issue(s)

Closes #427. Refs #406 #415 #421.

## Changes

- **`STYLE.md` §"Commits"** — new bullet pointing PR titles / commit
subjects at the real path and naming the three accepted subject shapes:
  - `feat(integrations/examples): pattern-N OTTL stanza`
  - `feat(<target>): pattern-N ...`
  - `feat(pattern-N): OTTL stanza in docs/integrations/examples/`
- **`.github/PULL_REQUEST_TEMPLATE.md`** — brief reminder pointing at
STYLE.md §"Commits" and issue #427 for context.
- **`scripts/recipes-path-check.sh` + `_test.sh`** — TDD-driven gate.
Two rules:
1. Literal path `recipes/pattern-N` (with negative lookahead so
`internal/recipes/`, `module/recipes/`, etc. pass).
2. Bare `feat(recipes):` / `feat(recipe):` scope paired with a `pattern
N` mention — the exact shape #421 flagged on PRs #406 / #415.

Test fixture: 5 reject cases (including the two historical PR titles
verbatim) + 6 accept cases (real placement, per-target scope,
`pattern-N` scope, unrelated subjects, the Go `./internal/recipes/...`
package path from `HARDWARE-TESTING.md`, empty input).
- **`Makefile`** — `recipes-path-check` target wired into `ci-fast` and
`ci-full`. Runs the regression test suite, not a tree scan; the gate
itself operates on a subject string passed as `$1`.
- **`.github/workflows/pr-lint.yml`** — new step invokes
`scripts/recipes-path-check.sh` with `${{
github.event.pull_request.title }}` and fails the workflow on a
forbidden shape.
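
The two rules of the gate described above can be sketched in Go. This is a hedged illustration, not a port of `scripts/recipes-path-check.sh`: `checkSubject` and the three regular expressions are hypothetical, and because Go's `regexp` package has no lookaround support, rule 1 uses a leading character class to let paths such as `internal/recipes/` through.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	// Rule 1: a bare "recipes/pattern-N" path. The leading group only matches at
	// the start of the subject or after a non-path character, so a longer path
	// like internal/recipes/pattern-3 is not flagged.
	barePath = regexp.MustCompile(`(^|[^A-Za-z0-9_./-])recipes/pattern-[0-9]+`)
	// Rule 2: a feat(recipe) or feat(recipes) scope combined with a pattern mention.
	recipeScope    = regexp.MustCompile(`^feat\(recipes?\):`)
	patternMention = regexp.MustCompile(`(?i)pattern[ -]?#?[0-9]+`)
)

// checkSubject returns an error when a PR title or commit subject has one of
// the two forbidden shapes, and nil otherwise.
func checkSubject(subject string) error {
	if barePath.MatchString(subject) {
		return errors.New("subject names a recipes/pattern-N path that does not exist")
	}
	if recipeScope.MatchString(subject) && patternMention.MatchString(subject) {
		return errors.New("feat(recipe[s]) scope paired with a pattern mention")
	}
	return nil
}

func main() {
	subjects := []string{
		"feat(recipe): pattern-2 IB link flap OTTL stanza (#393)",
		"feat(integrations/examples): pattern-2 OTTL stanza",
		"feat(pattern-2): OTTL stanza in docs/integrations/examples/",
	}
	for _, s := range subjects {
		if err := checkSubject(s); err != nil {
			fmt.Printf("REJECT %q: %v\n", s, err)
			continue
		}
		fmt.Printf("ACCEPT %q\n", s)
	}
}
```

In this sketch an empty subject is accepted, matching the "empty input" accept fixture listed above.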

## Audit results

- `grep -rn 'recipes/pattern-' docs/` — only hit is
`./internal/recipes/...` in `docs/HARDWARE-TESTING.md` (a Go package
path, not a docs path). The gate accepts it. No stale references to fix.
- `grep -rn 'recipes/' docs/ .github/ CONTRIBUTING.md PRINCIPLES.md
README.md | grep -v 'docs/integrations/examples'` — same single hit.
Clean.

## Hard rules honoured

- **No files migrated.** Option B is documentation + lint only.
- **No bureaucratic process docs.** One bullet added to STYLE.md, one
block in PR template, one TDD-tested gate script. Total diff: 6 files,
+159 / -3.

## Release notes

```release-notes
NONE
```

## Test plan

- [x] `bash scripts/recipes-path-check_test.sh` — 11/11 fixtures pass (5
reject + 6 accept).
- [x] `make recipes-path-check` — runs the regression test via the new
make target.
- [x] `make ci-fast` — passes end-to-end including the new gate (lint +
vet + mod-verify + attribute-namespace-check + doc-check +
recipes-path-check).
- [x] `go tool actionlint .github/workflows/pr-lint.yml` — clean.
- [x] `make zizmor` — no findings; 28 suppressed, 32 ignored (baseline
unchanged).
- [x] Verified the gate rejects both historical titles verbatim:
`feat(recipes): OTTL stanzas + bridge for pattern #7 (#364 #365)` and
`feat(recipe): pattern-2 IB link flap OTTL stanza (#393)`.

## Checklist

- [x] Tests added or updated (TDD: test written before gate; both ship
in this PR).
- [x] `make ci-fast` passes.
- [x] Commits are signed off.
- [x] PR title and Summary reflect the current diff.

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
Development

Successfully merging this pull request may close these issues.

ottl(pattern-2): node_infiniband -> hw.network.ib.* recipe
