test(patterndetector): cover SetDoubleValue + reverse-order cross-stream by trilamsr · Pull Request #467 · TraceCoreAI/tracecore

trilamsr · 2026-06-02T03:12:32Z

Summary

Closes #463. Adds two test cases flagged by the reviewer of #461 (#437) as reachable-but-uncovered branches in the metrics-path consumer.

Root cause

PR #461 shipped the metrics-path consumer + cross-stream join for cuda_oom with 10 projection tests. Two branches were reachable but unexercised:

projectFBMemoryFromMetrics uses int64(dp.DoubleValue()) at metrics.go:344 — the SetDoubleValue truncation path. All 10 existing tests used SetIntValue. A regression that swapped truncation for rounding (or returned 0) would have shipped silently.
TestMetricsProcessor_CrossStreamJoinFragmentationVerdict fed FB-then-OOM only. The reverse order (OOM arrives before the next dcgm scrape) was an implicit assumption — the detector's empty-buffer branch + the buffer-survives-across-calls invariant were both uncovered for the metrics-path wiring.

Changes

Two new tests in module/processor/patterndetectorprocessor/metrics_test.go:

TestProjectFBMemoryFromMetrics_DoubleValueTruncatesToInt64 — feeds a SetDoubleValue(42.7) free + SetDoubleValue(80 GiB) total; asserts FreeBytes == 42 (truncate, not round to 43; not zero).
TestMetricsProcessor_CrossStreamReverseOrderPartialThenJoin — OOM log first (buffer empty) → asserts partial verdict kind=unknown, Confidence=Partial. Then FB metric arrives, retry OOM log → asserts full verdict kind=fragmentation (locks the buffer-survives-across-calls invariant).

Test-only; no production-code changes.

Mutation results

Each test verified to fail when its target code is broken (then restored):

Mutation	Test caught it?
`numberDataPointIntValue` Double branch -> `return 0`	yes (`expected: 42, actual: 0`)
`numberDataPointIntValue` Double branch -> rounding `(x + 0.5)`	yes (`expected: 42, actual: 43`)
`runCUDAOOMDetector` buffer-drain skipped (`if false && buffer != nil`)	yes (retry OOM falls back to `unknown` instead of `fragmentation`)
Drop all partial verdicts in `runCUDAOOMDetector`	yes (first OOM emits 0 verdicts instead of 1)

A+ exhaustiveness audit (non-blocking notes)

Other projection branches checked:

Gauge / Sum / non-numeric metric types — covered (HappyPathGaugeAndSum, DropsNonNumericDataPoints, IgnoresNonFBMetrics).
Missing gpu.id (resource + DP) — covered (DropsRecordsMissingGPUID).
Empty-value DPs — covered (DropsNonNumericDataPoints).
DP-level gpu.id overriding resource — covered (MultiGPUSplit).
Histogram/Summary/ExponentialHistogram metric types named hw.gpu.memory.* (the default branch in numberDataPoints) — technically untested but the customer-stable scrape format pins these to Gauge/Sum; not load-bearing enough to file.

Tests-only: cover `SetDoubleValue` truncation + reverse-order cross-stream join in the cuda_oom metrics-path consumer (closes #463).

Test plan

go test -race -count=1 ./processor/patterndetectorprocessor/... — green
Each new test fails when its target is mutated; passes when restored (table above)
Pre-commit hooks: golangci-lint, go vet, go mod verify, attribute-namespace-check — green

Closes #463. Two reviewer-flagged gaps from #461 (#437): 1. projectFBMemoryFromMetrics SetDoubleValue branch (metrics.go:344) was reachable but uncovered — all ten existing projection tests used SetIntValue. A regression that swapped truncation for rounding (or returned 0) would have shipped silently. New test pins 42.7 -> int64(42) truncate-toward-zero. 2. Cross-stream reverse-order: TestMetricsProcessor_CrossStreamJoin* fed FB-then-OOM. The reverse path (OOM arrives before the next dcgm scrape lands) was implicit. New test feeds OOM first (asserts partial verdict kind=unknown), then FB metric + retry OOM (asserts the buffer survives across ConsumeLogs/Metrics calls and the retry correlates to a full fragmentation verdict). Test-only. Mutation-verified: - numberDataPointIntValue Double branch returning 0 -> caught - ...+0.5 rounding -> caught - buffer-drain skip in runCUDAOOMDetector -> caught - partial-verdict suppression -> caught Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr · 2026-06-02T03:17:11Z

Review: PR #467 (SetDoubleValue + reverse-order cross-stream tests)

Grade: A+

B/A/A+ Criteria

✓ Scope: Test-only, 2 new tests for reachable-but-uncovered branches in PR #461 metrics-path consumer.

✓ Risk assessment: Low. No production code changes. Branches already exist; tests lock them against regression.

✓ Test proportionality: 2 tests for 2 orthogonal branches.

Test 1: SetDoubleValue truncation path (48 lines; 2 data points; 5 surgical assertions)
Test 2: reverse-order cross-stream contract (102 lines; 3-part sequence; 10+ assertions locking buffer invariant)

✓ Mutation coverage: All 4 claimed mutations verified reachable and would be caught:

numberDataPointIntValue returning 0 on Double → Test 1 asserts FreeBytes == 42
numberDataPointIntValue rounding Double → Test 1 asserts 42.7 → 42 (not 43)
Buffer drain skipped → Test 2 L632 asserts retry OOM yields kind=fragmentation (not unknown)
Partial verdicts dropped → Test 2 L579 asserts first OOM emits 1 verdict

✓ Exhaustiveness audit (A+ defensible):

Existing tests cover Gauge + Sum shapes, missing gpu.id, empty values, non-FB metrics, multi-GPU split, bounded buffer eviction, time-bound stale eviction.
Histogram/Summary/ExponentialHistogram omitted — production code explicitly documents these are "not FB-Counter shapes"; customer-stable scrape format (dcgm-exporter) pins to Gauge/Sum. Not load-bearing.

✓ Production code untouched: Diff shows only metrics_test.go changed.

Simplification Sweep

Test 1 (DoubleValue): No bloat.

Comments justify why Double appears (OTTL rewrite artifact), cite issue test(patterndetector): cover SetDoubleValue + reverse-order cross-stream cases (#461 follow-ups) #463 and PR feat(patterndetector): metrics-path consumer for hw.gpu.memory.* (#437) #461, explain truncation-vs-rounding risk.
Fixture: 2 data points (one fractional to test the boundary, one large to pin the path).
Assertions: Focused on GPUID, Node, truncation (42 not 43), total, timestamp.

Test 2 (ReverseOrder): No bloat.

Comments explain the contract (OOM-before-FB → partial verdict; FB-after-OOM → full verdict), why the implicit assumption needs to be locked, what emit-partial config regression would surface.
Fixture: 3-part sequence necessary to lock the cross-stream invariant (buffer-empty OOM → metrics arrive → retry OOM correlates).
Assertions: Surgical — Len checks catch missing verdicts; Kind/Confidence assertions lock the contract branches.

Test Execution

✓ go test -race ./module/processor/patterndetectorprocessor/...
  - TestProjectFBMemoryFromMetrics_DoubleValueTruncatesToInt64: PASS
  - TestMetricsProcessor_CrossStreamReverseOrderPartialThenJoin: PASS
  - All 10 existing tests: PASS

Release notes

Fenced ```release-notes block, concise, test-only, closes #463. ✓

Verdict

Ship. A+. Tests are minimal, well-commented, mutation-verified, and lock reachable-but-uncovered branches. No simplification needed. No auto-merge required (manual sign-off fine).

trilamsr enabled auto-merge (squash) June 2, 2026 03:17

trilamsr merged commit ea0494b into main Jun 2, 2026
12 checks passed

trilamsr deleted the test/463-metrics-projection-coverage branch June 2, 2026 03:21

trilamsr mentioned this pull request Jun 2, 2026

audit(wave-2026-06-02): autonomous-wave cross-cut review #488

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(patterndetector): cover SetDoubleValue + reverse-order cross-stream#467

test(patterndetector): cover SetDoubleValue + reverse-order cross-stream#467
trilamsr merged 1 commit into
mainfrom
test/463-metrics-projection-coverage

trilamsr commented Jun 2, 2026

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

trilamsr commented Jun 2, 2026

Summary

Root cause

Changes

Mutation results

A+ exhaustiveness audit (non-blocking notes)

Test plan

Uh oh!

trilamsr commented Jun 2, 2026

Review: PR #467 (SetDoubleValue + reverse-order cross-stream tests)

B/A/A+ Criteria

Simplification Sweep

Test Execution

Release notes

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant