[ci] cpusteal_test: relax hang-sentinel bounds (flake-pattern audit) #83
Merged
Caught by the flake-pattern audit (FOLLOWUPS entry post-PR #76). `require.Less(elapsed, 500ms)` for a 100ms request and `require.Less(elapsed, 250ms)` for a context-cancel were both calibrated to fast-runner expectations, the same shape as the SLI and SetDegraded flakes already fixed this session. Under GH Actions runner contention, scheduler delays on a busy-loop or context-cancellation latency can exceed those bounds without any regression in the receiver under test.

Relaxed both upper bounds to 2s as hang sentinels rather than perf bounds. The lower-bound assertion on `TestRun_HonorsDuration` (`elapsed >= 95ms`) still pins the real contract (busy-loop ran for the requested time); the upper bound just catches "never returned". Same fix shape applied to `TestRun_HonorsContextCancellation`.

Local: 3 isolated runs under -race, all 4 cpusteal tests PASS. `make lint` clean, `make vet` clean.

Anchor for the audit: `AGENTS.md` lesson "Match perf-budget assertions by the invariant only" (PR #81); FOLLOWUPS § "CI flake hygiene".

Signed-off-by: Tri Lam <trilamsr@gmail.com>
Summary

Flake-pattern audit follow-up to PR #76 + #78. Two assertions in `tools/failure-inject/cpusteal/cpusteal_test.go` match the same shape we fixed in `TestReceiver_SLIBudget` and `TestReceiver_SetDegraded`: a hard absolute upper bound on observed timing, calibrated to fast-runner expectations.

| Before | After |
| --- | --- |
| `require.Less(elapsed, 500ms)` for 100ms request | `require.Less(elapsed, 2s)` |
| `require.Less(elapsed, 250ms)` for cancel response | `require.Less(elapsed, 2s)` |

The lower-bound assertion on `TestRun_HonorsDuration` (`elapsed >= 95ms`) still pins the real contract (busy-loop runs for the requested time). The upper bounds only catch "never returned." This matches the lesson landed in `AGENTS.md` via PR #81: match perf-budget assertions by the invariant only.

Test plan

- `go test -race -count=3 -v ./tools/failure-inject/cpusteal/`: all 4 tests PASS each iteration.
- `make lint` clean.
- `make vet` clean.
- Pattern search (`require.Less.*Millisecond`, `assert.Less.*Millisecond`, `elapsed > N*time.X`, `WithinDuration`, `Budget` callsites, `isRaceBuild` callsites) found no other instances of the same shape outside the kernelevents SLI test we already covered.

Rollback

Single edit to restore the original numeric bounds. No dependents; the bounds are local to two test functions.