[ci] cpusteal_test: relax hang-sentinel bounds (flake-pattern audit) by trilamsr · Pull Request #83 · TraceCoreAI/tracecore

trilamsr · 2026-05-19T08:23:18Z

Summary

Flake-pattern audit follow-up to PR #76 + #78. Two assertions in tools/failure-inject/cpusteal/cpusteal_test.go match the same shape we fixed in TestReceiver_SLIBudget and TestReceiver_SetDegraded: hard absolute upper bound on observed timing, calibrated to fast-runner expectations.

| Before | After | What changed |
| --- | --- | --- |
| `require.Less(elapsed, 500ms)` for 100ms request | `require.Less(elapsed, 2s)` | Hang sentinel, not perf bound: busy-loop scheduler delay under contention can run a 100ms request to 300-400ms |
| `require.Less(elapsed, 250ms)` for cancel response | `require.Less(elapsed, 2s)` | Same: context-cancellation latency varies by an order of magnitude under contention |

The lower-bound assertion on TestRun_HonorsDuration (elapsed >= 95ms) still pins the real contract (busy-loop runs for the requested time). The upper bounds only catch "never returned." This matches the lesson landed in AGENTS.md via PR #81 — match perf-budget assertions by the invariant only.

Test plan

Local: go test -race -count=3 -v ./tools/failure-inject/cpusteal/ — all 4 tests PASS each iteration.
make lint clean.
make vet clean.
Audit completeness verified: broader grep sweep (require.Less.*Millisecond, assert.Less.*Millisecond, elapsed > N*time.X, WithinDuration, Budget callsites, isRaceBuild callsites) found no other instances of the same shape outside the kernelevents SLI test we already covered.
CI on this PR.
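For reference, the grep sweep in the test plan could be approximated in Go along these lines. The exact expressions used in the audit are not given here, so the regexes below (in particular the one for `elapsed > N*time.X`) are illustrative guesses, and `suspectPatterns` and `isSuspect` are hypothetical names. The `Budget` and `isRaceBuild` callsite checks are omitted.

```go
package main

import (
	"fmt"
	"regexp"
)

// suspectPatterns approximates the sweep: each expression flags an absolute
// timing assertion that may be calibrated to fast-runner expectations.
var suspectPatterns = []*regexp.Regexp{
	regexp.MustCompile(`require\.Less.*Millisecond`),
	regexp.MustCompile(`assert\.Less.*Millisecond`),
	regexp.MustCompile(`elapsed\s*>\s*\d+\s*\*\s*time\.\w+`),
	regexp.MustCompile(`WithinDuration`),
}

// isSuspect reports whether a source line matches any of the patterns.
func isSuspect(line string) bool {
	for _, re := range suspectPatterns {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	lines := []string{
		`require.Less(t, elapsed, 500*time.Millisecond)`,
		`require.GreaterOrEqual(t, elapsed, 95*time.Millisecond)`,
		`if elapsed > 2*time.Second {`,
	}
	for _, l := range lines {
		fmt.Printf("%-5v %s\n", isSuspect(l), l)
	}
}
```

A match is only a candidate for review: a flagged line still needs a human decision on whether the bound is a contract or a sentinel.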

Rollback

A single edit restores the original numeric bounds. There are no dependents; the bounds are local to two test functions.

NONE — test stability only. Relaxes two absolute-time assertions in cpusteal's test to hang sentinels rather than performance bounds, matching the pattern landed in PR #76 and #78. No production behavior change.

Caught by the flake-pattern audit (FOLLOWUPS entry post-PR #76). `require.Less(elapsed, 500ms)` for a 100ms request and `require.Less(elapsed, 250ms)` for a context-cancel were both calibrated to fast-runner expectations, the same shape as the SLI and SetDegraded flakes already fixed this session. Under GH Actions runner contention, scheduler delays on a busy-loop or context-cancellation latency can exceed those bounds without any regression in the receiver under test.

Relaxed both upper bounds to 2s as hang sentinels rather than perf bounds. The lower-bound assertion on `TestRun_HonorsDuration` (`elapsed >= 95ms`) still pins the real contract (busy-loop ran for the requested time); the upper bound just catches "never returned". Same fix shape applied to `TestRun_HonorsContextCancellation`.

Local: 3 isolated runs under -race, all 4 cpusteal tests PASS. `make lint` clean, `make vet` clean.

Anchor for the audit: `AGENTS.md` lesson "Match perf-budget assertions by the invariant only" (PR #81); FOLLOWUPS § "CI flake hygiene".

Signed-off-by: Tri Lam <trilamsr@gmail.com>

trilamsr enabled auto-merge (squash) May 19, 2026 08:23

trilamsr merged commit 7d39606 into main May 19, 2026
12 checks passed

trilamsr deleted the ci/audit-flake-regex-pattern branch May 19, 2026 08:27

trilamsr mentioned this pull request May 19, 2026

[chore] tag-protection: A+ uplift (manifest + idempotency hardening) — follow-up to #85 #87

Merged

trilamsr merged 1 commit into main from ci/audit-flake-regex-pattern
