[ci] selftelemetry: relax degraded-counter regex to accept any positive value#78
Merged
Merged
Conversation
…ve value TestReceiver_SetDegraded asserted the degraded_seconds_total counter matched `0\.0[0-9]+` — an upper-bounded shape requiring the value to fall in `[0.0, 0.1)`. The test's stated intent (per its comment "Value must be >0 (>= ~0.05s)") is just "counter advanced past zero", so the upper bound is brittle artifact, not load-bearing. Caught on PR #76's rerun, where a slow GH Actions runner observed 0.126s of degraded time — well past the regex's implicit 0.1s ceiling, but well within the test's actual intent. Same flake shape as the SLI deflake landed in PR #76: a CI-timing assertion calibrated to fast-runner expectations failing on a slow runner. Fix: regex becomes `\d+\.[0-9]*[1-9]` — match any value with at least one non-zero digit after the decimal point. Accepts 0.05, 0.126, 1.5, etc.; correctly rejects 0, 0.0, 0.000. Verified locally: 10 isolated runs PASS; 20 iterations under GOMAXPROCS=2 + full package PASS. Signed-off-by: Tri Lam <trilamsr@gmail.com>
5 tasks
trilamsr
added a commit
that referenced
this pull request
May 19, 2026
…83) ## Summary Flake-pattern audit follow-up to PR #76 + #78. Two assertions in `tools/failure-inject/cpusteal/cpusteal_test.go` match the same shape we fixed in `TestReceiver_SLIBudget` and `TestReceiver_SetDegraded`: hard absolute upper bound on observed timing, calibrated to fast-runner expectations. | Before | After | What changed | |---|---|---| | `require.Less(elapsed, 500ms)` for 100ms request | `require.Less(elapsed, 2s)` | Hang sentinel, not perf bound — busy-loop scheduler delay under contention can run a 100ms request to 300-400ms | | `require.Less(elapsed, 250ms)` for cancel response | `require.Less(elapsed, 2s)` | Same — context-cancellation latency varies by an order of magnitude under contention | The lower-bound assertion on `TestRun_HonorsDuration` (`elapsed >= 95ms`) still pins the real contract (busy-loop runs for the requested time). The upper bounds only catch "never returned." This matches the lesson landed in `AGENTS.md` via PR #81 — *match perf-budget assertions by the invariant only*. ## Test plan - [x] Local: `go test -race -count=3 -v ./tools/failure-inject/cpusteal/` — all 4 tests PASS each iteration. - [x] `make lint` clean. - [x] `make vet` clean. - [x] Audit completeness verified: broader grep sweep (`require.Less.*Millisecond`, `assert.Less.*Millisecond`, `elapsed > N*time.X`, `WithinDuration`, `Budget` callsites, `isRaceBuild` callsites) found no other instances of the same shape outside the kernelevents SLI test we already covered. - [ ] CI on this PR. ## Rollback Single Edit to restore the original numeric bounds. No dependents; the bounds are local to two test functions. ```release-notes NONE — test stability only. Relaxes two absolute-time assertions in cpusteal's test to hang sentinels rather than performance bounds, matching the pattern landed in PR #76 and #78. No production behavior change. ``` Signed-off-by: Tri Lam <trilamsr@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TestReceiver_SetDegradedasserted thedegraded_seconds_totalcounter matched0\.0[0-9]+— an implicit upper bound of[0.0, 0.1). The test's stated intent (per its comment, "Value must be >0") is just "counter advanced past zero"; the upper bound is brittle artifact.Caught on PR #76's rerun, where a slow GitHub Actions runner observed 0.126s of degraded time — well past the regex's implicit 0.1s ceiling, but well within the test's actual intent. Same flake shape as the SLI deflake landed in PR #76: a CI-timing assertion calibrated to fast-runner expectations failing on a slow runner.
Fix: regex becomes
\d+\.[0-9]*[1-9]— match any value with at least one non-zero digit after the decimal point. Accepts0.05,0.126,1.5, etc.; correctly rejects0,0.0,0.000.Test plan
TestReceiver_SetDegradedunder-race: all PASS.GOMAXPROCS=2+ fullinternal/selftelemetrypackage: all PASS (2.554s).require.Regexpcalls in the file inspected (lines 72, 75, 92, 110, 154): all assert exact integer counter values (2,1,8, etc.) or histogram counts — none have the same upper-bound-on-zero shape. No further changes needed.verify-testpasses on this PR.