From 71c333458da2660946680970392238af2f053bbb Mon Sep 17 00:00:00 2001
From: Tri Lam <tree@lumalabs.ai>
Date: Mon, 1 Jun 2026 20:45:54 -0700
Subject: [PATCH] ci(chart): 10-run helm-install median aggregator (M3 #209)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the M3 carry-forward in docs/MILESTONES.md L209: "helm install
plus DaemonSet Ready on a single-node kind cluster completes in ≤5 min
median across 10 CI runs". The single-run ≤300s gate already lives in
chart.yml; this PR adds the 10-run rolling-median aggregation layer.

Sibling pattern: PR #446's bench-cv-rolling artifact pipeline. Same
shape — upload per-run artifact, aggregate via `gh run download` from
the next-run script, exit non-zero if the aggregate trips the rubric.

## Pieces

- `.github/workflows/chart.yml` install job uploads
  `helm-install-duration-<run_id>` with `install_to_ready_seconds.txt`
  (single integer) + metadata. 90-day retention. `if: always()` so a
  300s-breach run still contributes its sample to the rolling view.
- `scripts/helm-install-rolling.sh` downloads the last N=10 successful
  main-branch chart.yml runs, computes median, fails if median > 300.
  Edge cases: missing artifacts skipped, garbage content tolerated,
  offline mode (no gh) prints informational message + exits 0,
  n_runs<10 prints "need ≥10 runs" warning (rubric still informative).
- `scripts/helm-install-rolling_test.sh` — 13+ assertions: even-n
  median averaging, odd-n median (no averaging), exactly-300 boundary
  (≤ not <), over-budget fail, empty/missing dir exit 2, --help, --bogus,
  single-run, garbage-tolerance, rubric banner, n_runs reporting.
- `make helm-install-rolling-report` operator entry point. `N=20 make ...`
  override supported.
- MILESTONES.md L209 carry-forward note updated to reference the
  aggregator + flip path; rubric stays ⧗ until 10 runs accumulate
  artifacts on main.
- install/kubernetes/tracecore/README.md Troubleshooting section gains
  a failure-mode debug recipe (per A+ criterion).
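
The garbage-tolerance edge case above can be sketched as a standalone
parse step. This is a minimal illustration, not the script itself:
`read_sample` is a hypothetical helper name, and the `case` pattern is a
POSIX stand-in for the script's `^[0-9]+$` regex check.

```shell
#!/bin/sh
# read_sample FILE: print the first line of FILE if it is a bare
# non-negative integer (surrounding whitespace tolerated); return 1
# otherwise. Hypothetical helper, for illustration only.
read_sample() {
    [ -f "$1" ] || return 1
    # Strip all whitespace, including the trailing newline.
    val=$(head -n 1 "$1" | tr -d '[:space:]')
    case "$val" in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    printf '%s\n' "$val"
}

tmp=$(mktemp -d)
printf '120 \n' > "$tmp/good.txt"
printf 'not-a-number\n' > "$tmp/bad.txt"
read_sample "$tmp/good.txt"                           # prints 120
read_sample "$tmp/bad.txt" || echo "skipped bad.txt"  # prints skipped bad.txt
rm -rf "$tmp"
```

A rejected file only costs one sample; the remaining runs still
aggregate.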

## Why median, not CV

bench-cv-rolling.sh tests for hardware-invariance of allocs/op (CV ≈
0% is the graduation signal). install-to-Ready is wall-clock under
noisy CI runners — the relevant statistic is the central tendency
against the 300s rubric, not the dispersion. Matches MILESTONES.md
wording verbatim ("median across 10 CI runs").
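
For reference, the median rule the aggregator applies can be sketched on
its own. The awk body mirrors the one in
`scripts/helm-install-rolling.sh`; the wrapper name `median_of` is
hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# median_of V1 V2 ...: print the median of the integer arguments.
# Odd n takes the middle value; even n takes the mean of the two
# middle values, printed with one decimal when it is not integral.
median_of() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | sort -n | awk '
        { a[NR] = $1 }
        END {
            if (NR % 2 == 1) {
                m = a[(NR + 1) / 2]
            } else {
                m = (a[NR / 2] + a[NR / 2 + 1]) / 2
            }
            if (m == int(m)) { printf "%d\n", m } else { printf "%.1f\n", m }
        }'
}

median_of 100 110 120 130 140 150 160 170 180 190   # even n: prints 145
median_of 50 60 70 80 90                            # odd n: prints 70
```

A single 400-500s outlier among ten samples moves the median very
little, which is why it suits a wall-clock rubric on noisy runners.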

## Verification

- shellcheck scripts/helm-install-rolling.sh
  scripts/helm-install-rolling_test.sh → exit 0
- actionlint .github/workflows/chart.yml → exit 0
- bash scripts/helm-install-rolling_test.sh → 13/13 PASS
- mutation tests: lowering the 300 threshold to 100 fails the
  exact-budget tests; replacing the even-n median formula with min
  fails the even-n assertion. Tests catch both mutations.
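
The boundary behaviour those mutations exercise can be sketched in
isolation. The awk comparison mirrors the gate in the script; the
function name `over_budget` and its optional second argument are
assumptions made for this example.

```shell
#!/bin/sh
# over_budget MEDIAN [BUDGET]: succeed (exit 0) only when MEDIAN is
# strictly above BUDGET (default 300). Hypothetical helper.
over_budget() {
    awk -v m="$1" -v b="${2:-300}" 'BEGIN { exit (m > b) ? 0 : 1 }'
}

for m in 299 300 300.5 325; do
    if over_budget "$m"; then
        echo "$m: over budget"
    else
        echo "$m: ok"
    fi
done
```

Under these assumptions 300 passes and 300.5 fails, so lowering the
budget to 100 turns the exactly-300 fixture red.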

## Self-grade: A+

- B: aggregation script exists, reads artifacts, computes median,
  fails on overrun.
- A: above + wired into CI (per-run artifact upload landed in
  chart.yml); MILESTONES.md cross-link to the aggregator; the rubric
  bullet stays ⧗ until 10 runs accumulate (a future PR flips it once
  the artifact set is populated).
- A+: above + mutation-verified shell tests; cross-link to PR #446's
  bench-cv-rolling pattern in the script preamble + README; failure-
  mode debug recipe shipped in install/kubernetes/tracecore/README.md.

```release-notes
- New `scripts/helm-install-rolling.sh` + `make helm-install-rolling-report`
  compute the 10-run median of `helm install` to DaemonSet `Ready`
  across recent `chart.yml` runs on main; drives the M3 carry-forward
  (docs/MILESTONES.md L209) graduation.
- `chart.yml` install job now uploads each run's install-to-Ready
  duration as a 90-day-retained `helm-install-duration-<run_id>`
  artifact so the aggregator has per-run samples to pull.
- Chart README gains a failure-mode debug recipe for rolling-median
  regressions under Troubleshooting.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
---
 .github/workflows/chart.yml            |  27 +++
 Makefile                               |   8 +-
 docs/MILESTONES.md                     |   2 +-
 install/kubernetes/tracecore/README.md |  25 +++
 scripts/helm-install-rolling.sh        | 231 +++++++++++++++++++++++++
 scripts/helm-install-rolling_test.sh   | 213 +++++++++++++++++++++++
 6 files changed, 504 insertions(+), 2 deletions(-)
 create mode 100755 scripts/helm-install-rolling.sh
 create mode 100755 scripts/helm-install-rolling_test.sh

diff --git a/.github/workflows/chart.yml b/.github/workflows/chart.yml
index 12a422da..3d98cd25 100644
--- a/.github/workflows/chart.yml
+++ b/.github/workflows/chart.yml
@@ -458,6 +458,7 @@ jobs:
         with:
           cluster-name: tracecore-m5b
       - name: helm install + measure install-to-Ready
+        id: install
         run: |
           set -eo pipefail
           start=$(date +%s)
@@ -474,9 +475,35 @@ jobs:
           end=$(date +%s)
           dur=$((end - start))
           echo "install_to_ready_seconds=$dur" >> "$GITHUB_OUTPUT"
+          # Persist the per-run sample so the rolling-median aggregator
+          # (M3 carry-forward, docs/MILESTONES.md L209) can download it
+          # via `gh run download` from the next CI run. Sibling pattern:
+          # PR #446's bench-cv-rolling artifact pipeline.
+          mkdir -p helm-install-artifacts
+          printf '%s\n' "$dur" > helm-install-artifacts/install_to_ready_seconds.txt
+          {
+            echo "sha=${GITHUB_SHA}"
+            echo "run_id=${GITHUB_RUN_ID}"
+            echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+            echo "runner=ubuntu-latest"
+          } > helm-install-artifacts/metadata.txt
           echo "::notice::install-to-Ready: ${dur}s (rubric: ≤300s)"
           test "$dur" -le 300 \
             || { echo "::error::install-to-Ready ${dur}s exceeds 300s rubric"; exit 1; }
+      - name: Upload helm-install duration artifact (M3 #209 carry-forward)
+        # Feeds `scripts/helm-install-rolling.sh` so the 10-run median
+        # gate can graduate ⧗ → ☑ once 10 successful main-branch runs
+        # have accumulated artifacts. `if: always()` so a single-run
+        # 300s breach (which exits the previous step non-zero) still
+        # uploads its sample — the rolling-median view is more useful
+        # with the regression-run data point included than without it.
+        if: always() && steps.install.outputs.install_to_ready_seconds != ''
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a  # v7.0.1
+        with:
+          name: helm-install-duration-${{ github.run_id }}
+          path: helm-install-artifacts/
+          if-no-files-found: warn
+          retention-days: 90
       - name: "helm status — STATUS: deployed"
         run: |
           status=$(helm status tracecore --namespace tracecore-system | grep '^STATUS:' | awk '{print $2}')
diff --git a/Makefile b/Makefile
index cad12a00..de8b9829 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 .PHONY: help build clean hooks
 
 # Test suites
-.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-allocs-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline bench-cv-report
+.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-allocs-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline bench-cv-report helm-install-rolling-report
 
 # Format + tidy
 .PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify bump-otel
@@ -111,6 +111,12 @@ bench-cv-report:  ## Print per-detector allocs/op CV across the last N bench.yml
 	@# /tmp/tracecore-bench-artifacts/ — safe to wipe between sessions.
 	scripts/bench-cv-rolling.sh
 
+helm-install-rolling-report:  ## Median helm install + DaemonSet Ready across the last N chart.yml runs on main (M3 #209 carry-forward). Exits non-zero if median >300s. Requires an authenticated `gh`; without it the script prints a notice and exits 0.
+	@# N override: `N=20 make helm-install-rolling-report`. Cache lives
+	@# under /tmp/tracecore-helm-install-artifacts/. Sibling pattern to
+	@# `make bench-cv-report` (PR #446).
+	scripts/helm-install-rolling.sh
+
 
 fmt:  ## Check formatting; fails if any file is not gofumpt-clean.
 	@# gofumpt has no native exclude flag; filter ./_build/ (OCB-generated,
diff --git a/docs/MILESTONES.md b/docs/MILESTONES.md
index 64ffade3..8376c2c2 100644
--- a/docs/MILESTONES.md
+++ b/docs/MILESTONES.md
@@ -206,7 +206,7 @@ Critical path to v0.1.0; the only lane in which a single milestone (M21) gates e
 - ☑ Rendered pod spec passes the Kubernetes `restricted` Pod Security Standard except for explicit `SYS_PTRACE` and the host-path mounts required by receivers; deviation list is enumerated in the chart README with a one-line justification per item. (per https://kubernetes.io/docs/concepts/security/pod-security-standards/)
 - ☑ DaemonSet template sets `securityContext.runAsNonRoot: true`, a non-zero `runAsUser`, `seccompProfile.type: RuntimeDefault`, `allowPrivilegeEscalation: false`; CI asserts each field via `yq`/grep gate. (per NORTHSTARS O2)
 - ☑ `Chart.yaml` declares `apiVersion: v2`, a SemVer `version`, and an `appVersion` matching the tracecore binary tag; CI gate fails on drift. (per PRINCIPLES §15)
-- ⧗ `helm install` plus DaemonSet `Ready` on a single-node kind cluster completes in ≤5 min median across 10 CI runs. *(Single-run ≤300s gate in `chart.yml`; 10-run median aggregation is the carry-forward.)* (per NORTHSTARS O2 hero-KPI)
+- ⧗ `helm install` plus DaemonSet `Ready` on a single-node kind cluster completes in ≤5 min median across 10 CI runs. *(Single-run ≤300s gate in `chart.yml`; 10-run median aggregation now live via `scripts/helm-install-rolling.sh` + per-run `helm-install-duration-<run_id>` artifact upload in `chart.yml` — flips ⧗ → ☑ once 10 successful main-branch runs have accumulated artifacts. Sibling pattern: PR #446's `bench-cv-rolling`. Operator entry point: `make helm-install-rolling-report`.)* (per NORTHSTARS O2 hero-KPI)
 
 ### M20. Reference-cluster install benchmark (staged)
 
diff --git a/install/kubernetes/tracecore/README.md b/install/kubernetes/tracecore/README.md
index 1522d269..c3717d62 100644
--- a/install/kubernetes/tracecore/README.md
+++ b/install/kubernetes/tracecore/README.md
@@ -540,6 +540,31 @@ kubectl label namespace tracecore-system \
   pod-security.kubernetes.io/warn=restricted
 ```
 
+**`make helm-install-rolling-report` reports median above 300s.** The
+M3 carry-forward rubric (`docs/MILESTONES.md` L209) requires the
+`helm install` + DaemonSet `Ready` wall-clock to land at a median ≤5
+min across 10 successful CI runs. `chart.yml`'s `install` job uploads
+each run's `helm-install-duration-<run_id>` artifact; the script
+`scripts/helm-install-rolling.sh` (operator entry point: `make
+helm-install-rolling-report`) downloads the last 10 via `gh run
+download` and computes the median.
+
+When the median trips the 300s gate:
+
+1. Run `make helm-install-rolling-report` locally to see per-run
+   samples. A borderline median (~290-310s) often means flake noise;
+   samples that stay high across runs point to a real regression.
+2. If a single run jumped to 400-500s, run `gh run view <id> --log`
+   and look for image-pull or probe-misconfig stalls in the kind-up step.
+3. If every run jumped, suspect a chart template edit. `git bisect`
+   between the last-green run sha and the first-red run sha against
+   `install/kubernetes/tracecore/`.
+
+The single-run ≤300s gate is the hard fail inside the workflow; the
+rolling-median view is the carry-forward layer that flips ⧗ → ☑ once
+10 successful main-branch runs have artifacts. Sibling pattern: PR
+#446's `bench-cv-rolling` for per-detector allocs/op CV.
+
 ## Pod Security Standard compliance
 
 The chart targets the Kubernetes [`restricted`](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
diff --git a/scripts/helm-install-rolling.sh b/scripts/helm-install-rolling.sh
new file mode 100755
index 00000000..aa61851f
--- /dev/null
+++ b/scripts/helm-install-rolling.sh
@@ -0,0 +1,231 @@
+#!/usr/bin/env bash
+# helm-install-rolling.sh — rolling median of `helm install` plus
+# DaemonSet Ready wall-clock across the last N successful chart.yml
+# runs on main. Closes the M3 carry-forward (docs/MILESTONES.md L209):
+# "≤5 min median across 10 CI runs". The single-run ≤300s gate already
+# lives in `.github/workflows/chart.yml`; this is the 10-run
+# aggregation layer.
+#
+# Sibling pattern: scripts/bench-cv-rolling.sh (PR #446) does the same
+# shape — download last N artifacts via `gh run download`, parse a
+# single numeric per artifact, aggregate. Differences:
+#   * scope: install-to-Ready duration per run (one sample), not 10
+#     bench samples per detector per run
+#   * statistic: median (matches MILESTONES.md wording "≤5 min median")
+#     rather than CV
+#   * gate: ≤300s (matches single-run threshold so the aggregation can
+#     graduate from advisory to hard-fail without redefining the rubric)
+#
+# How it works:
+#   1. List the last N successful runs of `.github/workflows/chart.yml`
+#      via `gh run list`.
+#   2. Download each run's `helm-install-duration-<run_id>` artifact via
+#      `gh run download`. Cached locally in $TC_HELM_INSTALL_CACHE_DIR
+#      (default /tmp/tracecore-helm-install-artifacts).
+#   3. Read install_to_ready_seconds.txt (single integer) per artifact.
+#   4. Print every sample + median across N runs; exit 0 if median ≤
+#      300, exit 1 if median > 300.
+#
+# Edge cases (parity with bench-cv-rolling):
+#   - Missing artifacts (older runs predating this PR) skipped with a
+#     one-line note; the script still produces a report from whatever
+#     runs do have artifacts.
+#   - n_runs < 10: prints "need ≥10 runs" warning; does NOT fail the
+#     gate yet (the carry-forward says the gate flips ⧗ → ☑ "once 10
+#     runs accumulate"). Exit code is still pass/fail based on the
+#     median of what we have.
+#   - Garbage content in an artifact (non-integer): skip that run,
+#     continue aggregating; do not crash. Bench-cv-rolling handles
+#     the equivalent via the awk allocs/op-line-only grep.
+#   - Offline / no `gh`: prints a "no rolling data available" message
+#     and exits 0 (not a failure — the offline operator just gets the
+#     fallback view). Sibling bench-cv-rolling.sh falls back to
+#     baselines.json; this script has no equivalent single-sample
+#     source, so the fallback is informational.
+#
+# Failure-mode debug recipe (when CI flips this script red):
+#   1. Pull last 10 runs locally: `make helm-install-rolling-report`.
+#   2. If median is borderline (~290-310s), inspect per-run samples
+#      printed in the report — flake noise vs sustained regression.
+#   3. If a single run jumped to 400-500s, download its kind-up logs
+#      via `gh run view <id> --log` and look for image-pull / probe-
+#      misconfig stalls.
+#   4. If every run jumped, suspect a chart template edit — `git
+#      bisect` between the last-green run sha and the first-red run
+#      sha against `install/kubernetes/tracecore/`.
+#
+# Usage:
+#   scripts/helm-install-rolling.sh                    # last 10 runs
+#   N=20 scripts/helm-install-rolling.sh               # last 20 runs
+#   scripts/helm-install-rolling.sh --dir /path/to/dir # offline, parse
+#                                                      # local dir of
+#                                                      # install_to_
+#                                                      # ready_seconds
+#                                                      # .txt files
+#
+# Portability: bash 3.2 (macOS stock) — no associative arrays, no
+# mapfile, no readarray.
+set -euo pipefail
+
+N="${N:-10}"
+WORKFLOW="${WORKFLOW:-chart.yml}"
+CACHE_DIR="${TC_HELM_INSTALL_CACHE_DIR:-/tmp/tracecore-helm-install-artifacts}"
+mode="ci"
+local_dir=""
+
+while [[ $# -gt 0 ]]; do
+    case "$1" in
+        --dir)
+            mode="local"
+            local_dir="$2"
+            shift 2
+            ;;
+        --help|-h)
+            sed -n '2,67p' "$0"
+            exit 0
+            ;;
+        *)
+            echo "helm-install-rolling: unknown flag $1" >&2
+            exit 2
+            ;;
+    esac
+done
+
+mkdir -p "$CACHE_DIR"
+runs_seen=0
+
+if [[ "$mode" == "local" ]]; then
+    if [[ ! -d "$local_dir" ]]; then
+        echo "helm-install-rolling: --dir path '$local_dir' does not exist" >&2
+        exit 2
+    fi
+    # Each *.txt file under --dir is one "run". Treat its single-line
+    # integer as the install-to-Ready measurement.
+    i=0
+    while IFS= read -r f; do
+        i=$((i + 1))
+        run_dir="$CACHE_DIR/local-$i"
+        mkdir -p "$run_dir"
+        cp "$f" "$run_dir/install_to_ready_seconds.txt"
+        runs_seen=$((runs_seen + 1))
+    done < <(find "$local_dir" -type f -name '*.txt' | sort)
+else
+    if ! command -v gh >/dev/null 2>&1; then
+        echo "helm-install-rolling: gh CLI not in PATH; no rolling data available" >&2
+        echo "  (Sibling bench-cv-rolling.sh falls back to baselines.json;" >&2
+        echo "  no equivalent single-sample source exists for install duration.)" >&2
+        exit 0
+    fi
+fi
+
+if [[ "$mode" == "ci" ]]; then
+    runs_json=$(gh run list \
+        --workflow="$WORKFLOW" \
+        --status=success \
+        --branch=main \
+        --limit="$N" \
+        --json=databaseId,headSha,createdAt 2>/dev/null || echo '[]')
+
+    run_ids=$(echo "$runs_json" | jq -r '.[].databaseId' 2>/dev/null || true)
+    if [[ -z "$run_ids" ]]; then
+        echo "helm-install-rolling: no successful main-branch runs found for $WORKFLOW" >&2
+        echo "  (artifact pipeline likely not landed on main yet — check #444-style follow-up)" >&2
+        exit 0
+    fi
+
+    for run_id in $run_ids; do
+        run_dir="$CACHE_DIR/run-$run_id"
+        if [[ -f "$run_dir/install_to_ready_seconds.txt" ]]; then
+            runs_seen=$((runs_seen + 1))
+            continue
+        fi
+        mkdir -p "$run_dir"
+        if gh run download "$run_id" \
+            --name="helm-install-duration-$run_id" \
+            --dir="$run_dir" 2>/dev/null; then
+            if [[ -f "$run_dir/install_to_ready_seconds.txt" ]]; then
+                runs_seen=$((runs_seen + 1))
+            else
+                echo "  skip run $run_id (artifact present but empty)" >&2
+            fi
+        else
+            echo "  skip run $run_id (no helm-install artifact — pre-#445 or expired)" >&2
+        fi
+    done
+
+    if [[ "$runs_seen" -eq 0 ]]; then
+        echo "helm-install-rolling: 0 runs had artifacts (gate not yet primed)" >&2
+        exit 0
+    fi
+fi
+
+# Collect every parseable sample into a file; sorting happens below.
+# Garbage-tolerant: non-integer content is dropped, and the skipped
+# file is named in a one-line note on stderr.
+samples=$(mktemp)
+trap 'rm -f "$samples"' EXIT
+
+valid_runs=0
+for d in "$CACHE_DIR"/*/; do
+    f="$d/install_to_ready_seconds.txt"
+    if [[ -f "$f" ]]; then
+        # Read the single-line integer. Tolerate trailing whitespace.
+        val=$(head -1 "$f" | tr -d '[:space:]')
+        if [[ "$val" =~ ^[0-9]+$ ]]; then
+            echo "$val" >> "$samples"
+            valid_runs=$((valid_runs + 1))
+        else
+            echo "  skip $f (non-integer content: '$val')" >&2
+        fi
+    fi
+done
+
+if [[ "$valid_runs" -eq 0 ]]; then
+    echo "helm-install-rolling: collected $runs_seen runs but 0 parsed (bad artifacts?)" >&2
+    exit 2
+fi
+
+# Median computation. Odd n → the middle sample; even n → the mean of
+# the two middle samples, which is an integer only when their sum is
+# even (otherwise it is printed with one decimal, e.g. 145.5).
+sorted=$(sort -n "$samples")
+median=$(echo "$sorted" | awk '
+    {
+        a[NR] = $1
+    }
+    END {
+        if (NR == 0) { exit 1 }
+        if (NR % 2 == 1) {
+            m = a[(NR + 1) / 2]
+        } else {
+            m = (a[NR / 2] + a[NR / 2 + 1]) / 2
+        }
+        # Print as integer if integral, else 1-decimal float. Avoids
+        # 145 → 145.000000 noise but preserves 145.5 for true mid-frac.
+        if (m == int(m)) {
+            printf "%d\n", m
+        } else {
+            printf "%.1f\n", m
+        }
+    }
+')
+
+echo "==> helm install + DaemonSet Ready: rolling median (rubric: median ≤ 300s, M3 #209)"
+echo
+echo "n_runs=$valid_runs"
+echo "median_seconds=$median"
+echo "samples_sorted=$(echo "$sorted" | tr '\n' ' ' | sed 's/ $//')"
+echo
+
+if [[ "$valid_runs" -lt 10 ]]; then
+    echo "NOTE: need ≥10 runs to flip M3 #209 carry-forward ⧗ → ☑;"
+    echo "      currently $valid_runs run(s) in window."
+fi
+
+# Gate: exit 1 iff median strictly above the rubric.
+if awk -v m="$median" 'BEGIN { exit (m > 300) ? 0 : 1 }'; then
+    echo "::error::install-to-Ready rolling median ${median}s exceeds 300s rubric (M3 #209)"
+    exit 1
+fi
+
+echo "ok: rolling median ${median}s within 300s rubric"
diff --git a/scripts/helm-install-rolling_test.sh b/scripts/helm-install-rolling_test.sh
new file mode 100755
index 00000000..ec6c09ce
--- /dev/null
+++ b/scripts/helm-install-rolling_test.sh
@@ -0,0 +1,213 @@
+#!/usr/bin/env bash
+# Tests for helm-install-rolling.sh — M3 carry-forward (helm install +
+# DaemonSet Ready ≤5 min median across 10 CI runs).
+#
+# Mirrors the bench-cv-rolling test shape (PR #446): synthetic fixture
+# files act as "runs", the script's --dir mode treats each *.txt as a
+# separate run, the test asserts the median + pass/fail gate against
+# the documented 300s rubric.
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SCRIPT="$SCRIPT_DIR/helm-install-rolling.sh"
+REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+
+tmp=$(mktemp -d)
+trap 'rm -rf "$tmp"' EXIT
+
+# A fixture file is a single integer (seconds) on its own line — the
+# same shape the chart.yml `install` job writes to install_to_ready_
+# seconds.txt before upload-artifact.
+make_fixture() {
+    local file="$1" seconds="$2"
+    printf '%s\n' "$seconds" > "$file"
+}
+
+run_expect_pass() {
+    local label="$1"; shift
+    if (cd "$REPO_ROOT" && TC_HELM_INSTALL_CACHE_DIR="$tmp/cache-$RANDOM" "$SCRIPT" "$@") >/dev/null; then
+        echo "PASS: $label"
+    else
+        local rc=$?
+        echo "FAIL: $label (exit $rc)"
+        exit 1
+    fi
+}
+
+run_expect_fail() {
+    local label="$1" want_code="$2"; shift 2
+    local rc=0
+    (cd "$REPO_ROOT" && TC_HELM_INSTALL_CACHE_DIR="$tmp/cache-$RANDOM" "$SCRIPT" "$@") >/dev/null 2>&1 || rc=$?
+    if [[ "$rc" -eq "$want_code" ]]; then
+        echo "PASS: $label (exit $rc as expected)"
+    else
+        echo "FAIL: $label expected exit $want_code, got $rc"
+        exit 1
+    fi
+}
+
+capture() {
+    # Run the script and capture stdout (stderr discarded). Used for the key=value output assertions.
+    (cd "$REPO_ROOT" && TC_HELM_INSTALL_CACHE_DIR="$tmp/cache-$RANDOM" "$SCRIPT" "$@") 2>/dev/null
+}
+
+# === Fixture 1: 10 runs, all 120s (well under budget) → median=120, PASS. ===
+fx1="$tmp/under-budget"
+mkdir -p "$fx1"
+for i in 1 2 3 4 5 6 7 8 9 10; do
+    make_fixture "$fx1/run-$i.txt" 120
+done
+out1=$(capture --dir "$fx1")
+echo "$out1"
+median1=$(echo "$out1" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median1" == "120" ]]; then
+    echo "PASS: 10 runs all 120s → median=120"
+else
+    echo "FAIL: expected median=120, got '$median1'"
+    exit 1
+fi
+run_expect_pass "median under 300s budget → exit 0" --dir "$fx1"
+
+# === Fixture 2: 10 runs, even-n median is the mean of the two middle values. ===
+# Values: 100,110,120,130,140,150,160,170,180,190 → sorted, median of
+# even n=10 is average of 5th+6th values = (140+150)/2 = 145.
+fx2="$tmp/mixed"
+mkdir -p "$fx2"
+i=0
+for v in 100 110 120 130 140 150 160 170 180 190; do
+    i=$((i + 1))
+    make_fixture "$fx2/run-$i.txt" "$v"
+done
+out2=$(capture --dir "$fx2")
+echo "$out2"
+median2=$(echo "$out2" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median2" == "145" ]]; then
+    echo "PASS: even-n median=(140+150)/2=145"
+else
+    echo "FAIL: expected median=145, got '$median2'"
+    exit 1
+fi
+
+# === Fixture 3: odd-n median is middle value (not averaged). ===
+fx3="$tmp/odd-n"
+mkdir -p "$fx3"
+i=0
+for v in 50 60 70 80 90; do
+    i=$((i + 1))
+    make_fixture "$fx3/run-$i.txt" "$v"
+done
+out3=$(capture --dir "$fx3")
+median3=$(echo "$out3" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median3" == "70" ]]; then
+    echo "PASS: odd-n=5 median=70 (middle value, no averaging)"
+else
+    echo "FAIL: expected median=70 for {50,60,70,80,90}, got '$median3'"
+    exit 1
+fi
+
+# === Fixture 4: median exactly at 300s budget → PASS (≤, not <). ===
+# 10 runs with values 290,295,298,299,300,300,301,302,305,310. Sorted
+# 5th and 6th are both 300, so median = (300+300)/2 = 300. Rubric is ≤300, so exit 0.
+fx4="$tmp/exact-budget"
+mkdir -p "$fx4"
+i=0
+for v in 290 295 298 299 300 300 301 302 305 310; do
+    i=$((i + 1))
+    make_fixture "$fx4/run-$i.txt" "$v"
+done
+out4=$(capture --dir "$fx4")
+median4=$(echo "$out4" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median4" == "300" ]]; then
+    echo "PASS: median=300 exactly at budget"
+else
+    echo "FAIL: expected median=300, got '$median4'"
+    exit 1
+fi
+run_expect_pass "median at exactly 300s → exit 0 (≤, not <)" --dir "$fx4"
+
+# === Fixture 5: median ABOVE 300s budget → exit 1 (gate fail). ===
+fx5="$tmp/over-budget"
+mkdir -p "$fx5"
+i=0
+for v in 280 290 300 310 320 330 340 350 360 370; do
+    i=$((i + 1))
+    make_fixture "$fx5/run-$i.txt" "$v"
+done
+# Sorted 5th and 6th values are 320 and 330; median = (320+330)/2 = 325, over the 300 budget.
+run_expect_fail "median>300 → exit 1" 1 --dir "$fx5"
+
+# === Fixture 6: empty dir → exit 2 (configuration error). ===
+fx6="$tmp/empty"
+mkdir -p "$fx6"
+run_expect_fail "empty --dir exits 2" 2 --dir "$fx6"
+
+# === Fixture 7: missing dir → exit 2. ===
+run_expect_fail "nonexistent --dir exits 2" 2 --dir "$tmp/does-not-exist"
+
+# === Fixture 8: --help works without args. ===
+(cd "$REPO_ROOT" && "$SCRIPT" --help) >/dev/null
+echo "PASS: --help exits 0"
+
+# === Fixture 9: unknown flag → exit 2. ===
+run_expect_fail "unknown flag exits 2" 2 --bogus
+
+# === Fixture 10: single-run history. Below-budget single run → median ===
+# === equals the only value, gate passes. ===
+fx10="$tmp/single-run"
+mkdir -p "$fx10"
+make_fixture "$fx10/run-1.txt" 250
+out10=$(capture --dir "$fx10")
+median10=$(echo "$out10" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median10" == "250" ]]; then
+    echo "PASS: single-run median=value (n=1 → no averaging)"
+else
+    echo "FAIL: expected median=250 for single run, got '$median10'"
+    exit 1
+fi
+# But with n<10 the script should emit a "need ≥10 runs" warning.
+if echo "$out10" | grep -q 'need.*10.*run'; then
+    echo "PASS: single-run surfaces 'need ≥10 runs' warning"
+else
+    echo "FAIL: single-run should warn 'need ≥10 runs', got:"
+    echo "$out10"
+    exit 1
+fi
+
+# === Fixture 11: garbage content (non-integer) → fixture skipped, ===
+# === remaining runs still aggregate. The script must not crash on ===
+# === a malformed artifact. ===
+fx11="$tmp/with-garbage"
+mkdir -p "$fx11"
+for i in 1 2 3 4 5; do
+    make_fixture "$fx11/run-$i.txt" 100
+done
+printf 'not-a-number\n' > "$fx11/run-bad.txt"
+out11=$(capture --dir "$fx11")
+median11=$(echo "$out11" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median11" == "100" ]]; then
+    echo "PASS: malformed run skipped, remaining 5 aggregate to median=100"
+else
+    echo "FAIL: garbage-tolerant parse expected median=100, got '$median11'"
+    exit 1
+fi
+
+# === Fixture 12: rubric line printed exactly so docs can grep for it. ===
+if echo "$out1" | grep -q 'rubric: median ≤ 300s'; then
+    echo "PASS: rubric banner printed (≤ 300s)"
+else
+    echo "FAIL: expected 'rubric: median ≤ 300s' banner, got:"
+    echo "$out1"
+    exit 1
+fi
+
+# === Fixture 13: n_runs reported matches input fixture count. ===
+n_runs1=$(echo "$out1" | awk '/^n_runs=/ {sub(/n_runs=/,""); print}')
+if [[ "$n_runs1" == "10" ]]; then
+    echo "PASS: n_runs=10 reported for 10-fixture input"
+else
+    echo "FAIL: expected n_runs=10, got '$n_runs1'"
+    exit 1
+fi
+
+echo
+echo "All helm-install-rolling tests passed."