Skip to content

test(patterndetector): cover SetDoubleValue + reverse-order cross-stream#467

Merged
trilamsr merged 1 commit into
mainfrom
test/463-metrics-projection-coverage
Jun 2, 2026
Merged

test(patterndetector): cover SetDoubleValue + reverse-order cross-stream#467
trilamsr merged 1 commit into
mainfrom
test/463-metrics-projection-coverage

Conversation

@trilamsr

@trilamsr trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #463. Adds two test cases flagged by the reviewer of #461 (#437) as reachable-but-uncovered branches in the metrics-path consumer.

Root cause

PR #461 shipped the metrics-path consumer + cross-stream join for cuda_oom with 10 projection tests. Two branches were reachable but unexercised:

  1. projectFBMemoryFromMetrics uses int64(dp.DoubleValue()) at metrics.go:344 — the SetDoubleValue truncation path. All 10 existing tests used SetIntValue. A regression that swapped truncation for rounding (or returned 0) would have shipped silently.
  2. TestMetricsProcessor_CrossStreamJoinFragmentationVerdict fed FB-then-OOM only. The reverse order (OOM arrives before the next dcgm scrape) was an implicit assumption — the detector's empty-buffer branch + the buffer-survives-across-calls invariant were both uncovered for the metrics-path wiring.

Changes

Two new tests in module/processor/patterndetectorprocessor/metrics_test.go:

  • TestProjectFBMemoryFromMetrics_DoubleValueTruncatesToInt64 — feeds a SetDoubleValue(42.7) free + SetDoubleValue(80 GiB) total; asserts FreeBytes == 42 (truncate, not round to 43; not zero).
  • TestMetricsProcessor_CrossStreamReverseOrderPartialThenJoin — OOM log first (buffer empty) → asserts partial verdict kind=unknown, Confidence=Partial. Then FB metric arrives, retry OOM log → asserts full verdict kind=fragmentation (locks the buffer-survives-across-calls invariant).

Test-only; no production-code changes.

Mutation results

Each test verified to fail when its target code is broken (then restored):

Mutation Test caught it?
numberDataPointIntValue Double branch -> return 0 yes (expected: 42, actual: 0)
numberDataPointIntValue Double branch -> rounding (x + 0.5) yes (expected: 42, actual: 43)
runCUDAOOMDetector buffer-drain skipped (if false && buffer != nil) yes (retry OOM falls back to unknown instead of fragmentation)
Drop all partial verdicts in runCUDAOOMDetector yes (first OOM emits 0 verdicts instead of 1)

A+ exhaustiveness audit (non-blocking notes)

Other projection branches checked:

  • Gauge / Sum / non-numeric metric types — covered (HappyPathGaugeAndSum, DropsNonNumericDataPoints, IgnoresNonFBMetrics).
  • Missing gpu.id (resource + DP) — covered (DropsRecordsMissingGPUID).
  • Empty-value DPs — covered (DropsNonNumericDataPoints).
  • DP-level gpu.id overriding resource — covered (MultiGPUSplit).
  • Histogram/Summary/ExponentialHistogram metric types named hw.gpu.memory.* (the default branch in numberDataPoints) — technically untested but the customer-stable scrape format pins these to Gauge/Sum; not load-bearing enough to file.
Tests-only: cover `SetDoubleValue` truncation + reverse-order cross-stream join in the cuda_oom metrics-path consumer (closes #463).

Test plan

  • go test -race -count=1 ./processor/patterndetectorprocessor/... — green
  • Each new test fails when its target is mutated; passes when restored (table above)
  • Pre-commit hooks: golangci-lint, go vet, go mod verify, attribute-namespace-check — green

Closes #463. Two reviewer-flagged gaps from #461 (#437):

1. projectFBMemoryFromMetrics SetDoubleValue branch (metrics.go:344)
   was reachable but uncovered — all ten existing projection tests
   used SetIntValue. A regression that swapped truncation for
   rounding (or returned 0) would have shipped silently. New test
   pins 42.7 -> int64(42) truncate-toward-zero.

2. Cross-stream reverse-order: TestMetricsProcessor_CrossStreamJoin*
   fed FB-then-OOM. The reverse path (OOM arrives before the next
   dcgm scrape lands) was implicit. New test feeds OOM first
   (asserts partial verdict kind=unknown), then FB metric + retry
   OOM (asserts the buffer survives across ConsumeLogs/Metrics calls
   and the retry correlates to a full fragmentation verdict).

Test-only. Mutation-verified:
- numberDataPointIntValue Double branch returning 0 -> caught
- ...+0.5 rounding -> caught
- buffer-drain skip in runCUDAOOMDetector -> caught
- partial-verdict suppression -> caught

Signed-off-by: Tri Lam <tree@lumalabs.ai>
@trilamsr

trilamsr commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Review: PR #467 (SetDoubleValue + reverse-order cross-stream tests)

Grade: A+

B/A/A+ Criteria

Scope: Test-only, 2 new tests for reachable-but-uncovered branches in PR #461 metrics-path consumer.

Risk assessment: Low. No production code changes. Branches already exist; tests lock them against regression.

Test proportionality: 2 tests for 2 orthogonal branches.

  • Test 1: SetDoubleValue truncation path (48 lines; 2 data points; 5 surgical assertions)
  • Test 2: reverse-order cross-stream contract (102 lines; 3-part sequence; 10+ assertions locking buffer invariant)

Mutation coverage: All 4 claimed mutations verified reachable and would be caught:

  • numberDataPointIntValue returning 0 on Double → Test 1 asserts FreeBytes == 42
  • numberDataPointIntValue rounding Double → Test 1 asserts 42.7 → 42 (not 43)
  • Buffer drain skipped → Test 2 L632 asserts retry OOM yields kind=fragmentation (not unknown)
  • Partial verdicts dropped → Test 2 L579 asserts first OOM emits 1 verdict

Exhaustiveness audit (A+ defensible):

  • Existing tests cover Gauge + Sum shapes, missing gpu.id, empty values, non-FB metrics, multi-GPU split, bounded buffer eviction, time-bound stale eviction.
  • Histogram/Summary/ExponentialHistogram omitted — production code explicitly documents these are "not FB-Counter shapes"; customer-stable scrape format (dcgm-exporter) pins to Gauge/Sum. Not load-bearing.

Production code untouched: Diff shows only metrics_test.go changed.

Simplification Sweep

Test 1 (DoubleValue): No bloat.

Test 2 (ReverseOrder): No bloat.

  • Comments explain the contract (OOM-before-FB → partial verdict; FB-after-OOM → full verdict), why the implicit assumption needs to be locked, what emit-partial config regression would surface.
  • Fixture: 3-part sequence necessary to lock the cross-stream invariant (buffer-empty OOM → metrics arrive → retry OOM correlates).
  • Assertions: Surgical — Len checks catch missing verdicts; Kind/Confidence assertions lock the contract branches.

Test Execution

✓ go test -race ./module/processor/patterndetectorprocessor/...
  - TestProjectFBMemoryFromMetrics_DoubleValueTruncatesToInt64: PASS
  - TestMetricsProcessor_CrossStreamReverseOrderPartialThenJoin: PASS
  - All 10 existing tests: PASS

Release notes

Fenced ```release-notes block, concise, test-only, closes #463. ✓

Verdict

Ship. A+. Tests are minimal, well-commented, mutation-verified, and lock reachable-but-uncovered branches. No simplification needed. No auto-merge required (manual sign-off fine).

@trilamsr trilamsr enabled auto-merge (squash) June 2, 2026 03:17
@trilamsr trilamsr merged commit ea0494b into main Jun 2, 2026
12 checks passed
@trilamsr trilamsr deleted the test/463-metrics-projection-coverage branch June 2, 2026 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test(patterndetector): cover SetDoubleValue + reverse-order cross-stream cases (#461 follow-ups)

1 participant