From 71c333458da2660946680970392238af2f053bbb Mon Sep 17 00:00:00 2001
From: Tri Lam
Date: Mon, 1 Jun 2026 20:45:54 -0700
Subject: [PATCH] ci(chart): 10-run helm-install median aggregator (M3 #209)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes the M3 carry-forward in docs/MILESTONES.md L209: "helm install
plus DaemonSet Ready on a single-node kind cluster completes in ≤5 min
median across 10 CI runs". The single-run ≤300s gate already lives in
chart.yml; this PR adds the 10-run rolling-median aggregation layer.

Sibling pattern: PR #446's bench-cv-rolling artifact pipeline. Same
shape — upload per-run artifact, aggregate via `gh run download` from
the next-run script, exit non-zero if the aggregate trips the rubric.

## Pieces

- `.github/workflows/chart.yml` install job uploads
  `helm-install-duration-` with `install_to_ready_seconds.txt` (single
  integer) + metadata. 90-day retention. `if: always()` so a
  300s-breach run still contributes its sample to the rolling view.
- `scripts/helm-install-rolling.sh` downloads the last N=10 successful
  main-branch chart.yml runs, computes the median, and fails if the
  median > 300. Edge cases: missing artifacts skipped, garbage content
  tolerated, offline mode (no gh) prints an informational message and
  exits 0, n_runs<10 prints a "need ≥10 runs" warning (rubric still
  informative).
- `scripts/helm-install-rolling_test.sh` — 13+ assertions: even-n
  median averaging, odd-n median (no averaging), exactly-300 boundary
  (≤ not <), over-budget fail, empty/missing dir exit 2, --help,
  --bogus, single-run, garbage-tolerance, rubric banner, n_runs
  reporting.
- `make helm-install-rolling-report` operator entry point.
  `N=20 make ...` override supported.
- MILESTONES.md L209 carry-forward note updated to reference the
  aggregator + flip path; rubric stays ⧗ until 10 runs accumulate
  artifacts on main.
- install/kubernetes/tracecore/README.md Troubleshooting section gains
  a failure-mode debug recipe (per A+ criterion).
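The aggregation step in the script bullet reduces to three operations:
1. Sort the per-run samples.
2. Take the middle value, or the mean of the two middle values when n is even.
3. Pass if and only if that median is ≤ 300.

Below is a minimal standalone sketch. It assumes samples arrive one integer per line on stdin. `median_gate` is a hypothetical helper for illustration, not part of this patch. It also prints the arithmetic mean, to show how differently one slow runner affects the two statistics:

```shell
#!/usr/bin/env bash
# Hypothetical illustration only; not scripts/helm-install-rolling.sh.
set -euo pipefail

median_gate() {
  # Reads one integer (seconds) per line on stdin.
  # Prints "<median> <mean> <pass|fail>", gating the median at <= 300s.
  sort -n | awk '
    { a[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { exit 2 }  # no samples: configuration error
      if (NR % 2 == 1) { m = a[(NR + 1) / 2] }        # odd n: middle value
      else { m = (a[NR / 2] + a[NR / 2 + 1]) / 2 }    # even n: mean of middles
      printf "%g %g %s\n", m, sum / NR, ((m <= 300) ? "pass" : "fail")
    }'
}

# Made-up window: nine runs near 200s plus one 1200s stall.
printf '%s\n' 190 195 198 200 200 202 205 208 210 1200 | median_gate
```

On this made-up window the median is 201 while the mean lands at 300.8. A mean-based gate would trip on that single stalled run; the median gate stays comfortably inside the rubric.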
## Why median, not CV

bench-cv-rolling.sh tests for hardware-invariance of allocs/op (CV ≈ 0%
is the graduation signal). install-to-Ready is wall-clock under noisy
CI runners — the relevant statistic is the central tendency against the
300s rubric, not the dispersion. Matches MILESTONES.md wording verbatim
("median across 10 CI runs").

## Verification

- shellcheck scripts/helm-install-rolling.sh scripts/helm-install-rolling_test.sh → exit 0
- actionlint .github/workflows/chart.yml → exit 0
- bash scripts/helm-install-rolling_test.sh → 13/13 PASS
- mutation tests: lowering the 300 threshold to 100 fails the
  exact-budget tests; replacing the even-n median formula with min
  fails the even-n assertion. Tests catch both mutations.

## Self-grade: A+

- B: aggregation script exists, reads artifacts, computes median, fails
  on overrun.
- A: above + wired into CI (per-run artifact upload landed in
  chart.yml); MILESTONES.md cross-link to the aggregator; the rubric
  bullet stays ⧗ until 10 runs accumulate (a future PR flips it once
  the artifact set is populated).
- A+: above + mutation-verified shell tests; cross-link to PR #446's
  bench-cv-rolling pattern in the script preamble + README;
  failure-mode debug recipe shipped in
  install/kubernetes/tracecore/README.md.

```release-notes
- New `scripts/helm-install-rolling.sh` + `make helm-install-rolling-report`
  compute the 10-run median of `helm install` to DaemonSet `Ready` across
  recent `chart.yml` runs on main; drives the M3 carry-forward
  (docs/MILESTONES.md L209) graduation.
- `chart.yml` install job now uploads each run's install-to-Ready duration
  as a 90-day-retained `helm-install-duration-` artifact so the aggregator
  has per-run samples to pull.
- Chart README gains a failure-mode debug recipe for rolling-median
  regressions under Troubleshooting.
```

Signed-off-by: Tri Lam
---
 .github/workflows/chart.yml            |  27 +++
 Makefile                               |   8 +-
 docs/MILESTONES.md                     |   2 +-
 install/kubernetes/tracecore/README.md |  25 +++
 scripts/helm-install-rolling.sh        | 231 +++++++++++++++++++++++++
 scripts/helm-install-rolling_test.sh   | 213 +++++++++++++++++++++++
 6 files changed, 504 insertions(+), 2 deletions(-)
 create mode 100755 scripts/helm-install-rolling.sh
 create mode 100755 scripts/helm-install-rolling_test.sh

diff --git a/.github/workflows/chart.yml b/.github/workflows/chart.yml
index 12a422da..3d98cd25 100644
--- a/.github/workflows/chart.yml
+++ b/.github/workflows/chart.yml
@@ -458,6 +458,7 @@ jobs:
         with:
           cluster-name: tracecore-m5b
       - name: helm install + measure install-to-Ready
+        id: install
         run: |
           set -eo pipefail
           start=$(date +%s)
@@ -474,9 +475,35 @@ jobs:
           end=$(date +%s)
           dur=$((end - start))
           echo "install_to_ready_seconds=$dur" >> "$GITHUB_OUTPUT"
+          # Persist the per-run sample so the rolling-median aggregator
+          # (M3 carry-forward, docs/MILESTONES.md L209) can download it
+          # via `gh run download` from the next CI run. Sibling pattern:
+          # PR #446's bench-cv-rolling artifact pipeline.
+          mkdir -p helm-install-artifacts
+          printf '%s\n' "$dur" > helm-install-artifacts/install_to_ready_seconds.txt
+          {
+            echo "sha=${GITHUB_SHA}"
+            echo "run_id=${GITHUB_RUN_ID}"
+            echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+            echo "runner=ubuntu-latest"
+          } > helm-install-artifacts/metadata.txt
           echo "::notice::install-to-Ready: ${dur}s (rubric: ≤300s)"
           test "$dur" -le 300 \
             || { echo "::error::install-to-Ready ${dur}s exceeds 300s rubric"; exit 1; }
+      - name: Upload helm-install duration artifact (M3 #209 carry-forward)
+        # Feeds `scripts/helm-install-rolling.sh` so the 10-run median
+        # gate can graduate ⧗ → ☑ once 10 successful main-branch runs
+        # have accumulated artifacts. `if: always()` so a single-run
+        # 300s breach (which exits the previous step non-zero) still
+        # uploads its sample — the rolling-median view is more useful
+        # with the regression-run data point included than without it.
+        if: always() && steps.install.outputs.install_to_ready_seconds != ''
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: helm-install-duration-${{ github.run_id }}
+          path: helm-install-artifacts/
+          if-no-files-found: warn
+          retention-days: 90
       - name: "helm status — STATUS: deployed"
         run: |
           status=$(helm status tracecore --namespace tracecore-system | grep '^STATUS:' | awk '{print $2}')
diff --git a/Makefile b/Makefile
index cad12a00..de8b9829 100644
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 .PHONY: help build clean hooks
 
 # Test suites
-.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-allocs-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline bench-cv-report
+.PHONY: test test-extras test-extras-sustained test-extras-fuzz test-extras-fuzz-kmsg test-extras-fuzz-journald test-extras-fuzz-nccl-fr test-extras-race bench bench-check bench-allocs-check bench-baseline bench-detectors bench-detectors-check bench-detectors-baseline bench-cv-report helm-install-rolling-report
 
 # Format + tidy
 .PHONY: fmt fmt-fix vet lint lint-fix tidy tidy-check mod-verify bump-otel
@@ -111,6 +111,12 @@ bench-cv-report: ## Print per-detector allocs/op CV across the last N bench.yml
 	@# /tmp/tracecore-bench-artifacts/ — safe to wipe between sessions.
 	scripts/bench-cv-rolling.sh
 
+helm-install-rolling-report: ## Median helm install + DaemonSet Ready across the last N chart.yml runs on main (M3 #209 carry-forward). Exits non-zero if median >300s. Requires `gh` auth; offline when unauthed.
+	@# N override: `N=20 make helm-install-rolling-report`. Cache lives
+	@# under /tmp/tracecore-helm-install-artifacts/. Sibling pattern to
+	@# `make bench-cv-report` (PR #446).
+	scripts/helm-install-rolling.sh
+
 fmt: ## Check formatting; fails if any file is not gofumpt-clean.
 	@# gofumpt has no native exclude flag; filter ./_build/ (OCB-generated,
diff --git a/docs/MILESTONES.md b/docs/MILESTONES.md
index 64ffade3..8376c2c2 100644
--- a/docs/MILESTONES.md
+++ b/docs/MILESTONES.md
@@ -206,7 +206,7 @@ Critical path to v0.1.0; the only lane in which a single milestone (M21) gates e
 - ☑ Rendered pod spec passes the Kubernetes `restricted` Pod Security Standard except for explicit `SYS_PTRACE` and the host-path mounts required by receivers; deviation list is enumerated in the chart README with a one-line justification per item. (per https://kubernetes.io/docs/concepts/security/pod-security-standards/)
 - ☑ DaemonSet template sets `securityContext.runAsNonRoot: true`, a non-zero `runAsUser`, `seccompProfile.type: RuntimeDefault`, `allowPrivilegeEscalation: false`; CI asserts each field via `yq`/grep gate. (per NORTHSTARS O2)
 - ☑ `Chart.yaml` declares `apiVersion: v2`, a SemVer `version`, and an `appVersion` matching the tracecore binary tag; CI gate fails on drift. (per PRINCIPLES §15)
-- ⧗ `helm install` plus DaemonSet `Ready` on a single-node kind cluster completes in ≤5 min median across 10 CI runs. *(Single-run ≤300s gate in `chart.yml`; 10-run median aggregation is the carry-forward.)* (per NORTHSTARS O2 hero-KPI)
+- ⧗ `helm install` plus DaemonSet `Ready` on a single-node kind cluster completes in ≤5 min median across 10 CI runs. *(Single-run ≤300s gate in `chart.yml`; 10-run median aggregation now live via `scripts/helm-install-rolling.sh` + per-run `helm-install-duration-` artifact upload in `chart.yml` — flips ⧗ → ☑ once 10 successful main-branch runs have accumulated artifacts. Sibling pattern: PR #446's `bench-cv-rolling`. Operator entry point: `make helm-install-rolling-report`.)* (per NORTHSTARS O2 hero-KPI)
 
 ### M20. Reference-cluster install benchmark (staged)
 
diff --git a/install/kubernetes/tracecore/README.md b/install/kubernetes/tracecore/README.md
index 1522d269..c3717d62 100644
--- a/install/kubernetes/tracecore/README.md
+++ b/install/kubernetes/tracecore/README.md
@@ -540,6 +540,31 @@ kubectl label namespace tracecore-system \
   pod-security.kubernetes.io/warn=restricted
 ```
+**`make helm-install-rolling-report` reports median above 300s.** The
+M3 carry-forward rubric (`docs/MILESTONES.md` L209) requires the
+`helm install` + DaemonSet `Ready` wall-clock to land at a median ≤5
+min across 10 successful CI runs. `chart.yml`'s `install` job uploads
+each run's `helm-install-duration-` artifact; the script
+`scripts/helm-install-rolling.sh` (operator entry point: `make
+helm-install-rolling-report`) downloads the last 10 via `gh run
+download` and computes the median.
+
+When the median trips the 300s gate:
+
+1. Run `make helm-install-rolling-report` locally to see per-run
+   samples. Borderline (~290-310s) often means flake noise; sustained
+   means real regression.
+2. If a single run jumped to 400-500s, `gh run view --log` and
+   look for image-pull or probe-misconfig stalls in the kind-up step.
+3. If every run jumped, suspect a chart template edit. `git bisect`
+   between the last-green run sha and the first-red run sha against
+   `install/kubernetes/tracecore/`.
+
+The single-run ≤300s gate is the hard fail inside the workflow; the
+rolling-median view is the carry-forward layer that flips ⧗ → ☑ once
+10 successful main-branch runs have artifacts. Sibling pattern: PR
+#446's `bench-cv-rolling` for per-detector allocs/op CV.
+
 
 ## Pod Security Standard compliance
 
 The chart targets the Kubernetes [`restricted`](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
diff --git a/scripts/helm-install-rolling.sh b/scripts/helm-install-rolling.sh
new file mode 100755
index 00000000..aa61851f
--- /dev/null
+++ b/scripts/helm-install-rolling.sh
@@ -0,0 +1,231 @@
+#!/usr/bin/env bash
+# helm-install-rolling.sh — rolling median of `helm install` plus
+# DaemonSet Ready wall-clock across the last N successful chart.yml
+# runs on main. Closes the M3 carry-forward (docs/MILESTONES.md L209):
+# "≤5 min median across 10 CI runs". The single-run ≤300s gate already
+# lives in `.github/workflows/chart.yml`; this is the 10-run
+# aggregation layer.
+#
+# Sibling pattern: scripts/bench-cv-rolling.sh (PR #446) does the same
+# shape — download last N artifacts via `gh run download`, parse a
+# single numeric per artifact, aggregate. Differences:
+#   * scope: install-to-Ready duration per run (one sample), not 10
+#     bench samples per detector per run
+#   * statistic: median (matches MILESTONES.md wording "≤5 min median")
+#     rather than CV
+#   * gate: ≤300s (matches single-run threshold so the aggregation can
+#     graduate from advisory to hard-fail without redefining the rubric)
+#
+# How it works:
+#   1. List the last N successful runs of `.github/workflows/chart.yml`
+#      via `gh run list`.
+#   2. Download each run's `helm-install-duration-` artifact via
+#      `gh run download`. Cached locally in $TC_HELM_INSTALL_CACHE_DIR
+#      (default /tmp/tracecore-helm-install-artifacts).
+#   3. Read install_to_ready_seconds.txt (single integer) per artifact.
+#   4. Print every sample + median across N runs; exit 0 if median ≤
+#      300, exit 1 if median > 300.
+#
+# Edge cases (parity with bench-cv-rolling):
+#   - Missing artifacts (older runs predating this PR) skipped with a
+#     one-line note; the script still produces a report from whatever
+#     runs do have artifacts.
+#   - n_runs < 10: prints "need ≥10 runs" warning; does NOT fail the
+#     gate yet (the carry-forward says the gate flips ⧗ → ☑ "once 10
+#     runs accumulate"). Exit code is still pass/fail based on the
+#     median of what we have.
+#   - Garbage content in an artifact (non-integer): skip that run,
+#     continue aggregating; do not crash. Bench-cv-rolling handles
+#     the equivalent via the awk allocs/op-line-only grep.
+#   - Offline / no `gh`: prints a "no rolling data available" message
+#     and exits 0 (not a failure — the offline operator just gets the
+#     fallback view). Sibling bench-cv-rolling.sh falls back to
+#     baselines.json; this script has no equivalent single-sample
+#     source, so the fallback is informational.
+#
+# Failure-mode debug recipe (when CI flips this script red):
+#   1. Pull last 10 runs locally: `make helm-install-rolling-report`.
+#   2. If median is borderline (~290-310s), inspect per-run samples
+#      printed in the report — flake noise vs sustained regression.
+#   3. If a single run jumped to 400-500s, download its kind-up logs
+#      via `gh run view --log` and look for image-pull / probe-
+#      misconfig stalls.
+#   4. If every run jumped, suspect a chart template edit — `git
+#      bisect` between the last-green run sha and the first-red run
+#      sha against `install/kubernetes/tracecore/`.
+#
+# Usage:
+#   scripts/helm-install-rolling.sh                      # last 10 runs
+#   N=20 scripts/helm-install-rolling.sh                 # last 20 runs
+#   scripts/helm-install-rolling.sh --dir /path/to/dir   # offline, parse
+#                                                        # local dir of
+#                                                        # install_to_
+#                                                        # ready_seconds
+#                                                        # .txt files
+#
+# Portability: bash 3.2 (macOS stock) — no associative arrays, no
+# mapfile, no readarray.
+set -euo pipefail
+
+N="${N:-10}"
+WORKFLOW="${WORKFLOW:-chart.yml}"
+CACHE_DIR="${TC_HELM_INSTALL_CACHE_DIR:-/tmp/tracecore-helm-install-artifacts}"
+mode="ci"
+local_dir=""
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --dir)
+      mode="local"
+      local_dir="$2"
+      shift 2
+      ;;
+    --help|-h)
+      sed -n '2,67p' "$0"
+      exit 0
+      ;;
+    *)
+      echo "helm-install-rolling: unknown flag $1" >&2
+      exit 2
+      ;;
+  esac
+done
+
+mkdir -p "$CACHE_DIR"
+runs_seen=0
+
+if [[ "$mode" == "local" ]]; then
+  if [[ ! -d "$local_dir" ]]; then
+    echo "helm-install-rolling: --dir path '$local_dir' does not exist" >&2
+    exit 2
+  fi
+  # Each *.txt file under --dir is one "run". Treat its single-line
+  # integer as the install-to-Ready measurement.
+  i=0
+  while IFS= read -r f; do
+    i=$((i + 1))
+    run_dir="$CACHE_DIR/local-$i"
+    mkdir -p "$run_dir"
+    cp "$f" "$run_dir/install_to_ready_seconds.txt"
+    runs_seen=$((runs_seen + 1))
+  done < <(find "$local_dir" -type f -name '*.txt' | sort)
+else
+  if ! command -v gh >/dev/null 2>&1; then
+    echo "helm-install-rolling: gh CLI not in PATH; no rolling data available" >&2
+    echo " (Sibling bench-cv-rolling.sh falls back to baselines.json;" >&2
+    echo " no equivalent single-sample source exists for install duration.)" >&2
+    exit 0
+  fi
+fi
+
+if [[ "$mode" == "ci" ]]; then
+  runs_json=$(gh run list \
+    --workflow="$WORKFLOW" \
+    --status=success \
+    --branch=main \
+    --limit="$N" \
+    --json=databaseId,headSha,createdAt 2>/dev/null || echo '[]')
+
+  run_ids=$(echo "$runs_json" | jq -r '.[].databaseId' 2>/dev/null || true)
+  if [[ -z "$run_ids" ]]; then
+    echo "helm-install-rolling: no successful main-branch runs found for $WORKFLOW" >&2
+    echo " (artifact pipeline likely not landed on main yet — check #444-style follow-up)" >&2
+    exit 0
+  fi
+
+  for run_id in $run_ids; do
+    run_dir="$CACHE_DIR/run-$run_id"
+    if [[ -f "$run_dir/install_to_ready_seconds.txt" ]]; then
+      runs_seen=$((runs_seen + 1))
+      continue
+    fi
+    mkdir -p "$run_dir"
+    if gh run download "$run_id" \
+      --name="helm-install-duration-$run_id" \
+      --dir="$run_dir" 2>/dev/null; then
+      if [[ -f "$run_dir/install_to_ready_seconds.txt" ]]; then
+        runs_seen=$((runs_seen + 1))
+      else
+        echo " skip run $run_id (artifact present but empty)" >&2
+      fi
+    else
+      echo " skip run $run_id (no helm-install artifact — pre-#445 or expired)" >&2
+    fi
+  done
+
+  if [[ "$runs_seen" -eq 0 ]]; then
+    echo "helm-install-rolling: 0 runs had artifacts (gate not yet primed)" >&2
+    exit 0
+  fi
+fi
+
+# Collect every parseable sample into a temp file (sorted below).
+# Garbage-tolerant: non-integer content is dropped with a one-line
+# stderr note instead of crashing the run.
+samples=$(mktemp)
+trap 'rm -f "$samples"' EXIT
+
+valid_runs=0
+for d in "$CACHE_DIR"/*/; do
+  f="$d/install_to_ready_seconds.txt"
+  if [[ -f "$f" ]]; then
+    # Read the single-line integer. Tolerate trailing whitespace.
+    val=$(head -1 "$f" | tr -d '[:space:]')
+    if [[ "$val" =~ ^[0-9]+$ ]]; then
+      echo "$val" >> "$samples"
+      valid_runs=$((valid_runs + 1))
+    else
+      echo " skip $f (non-integer content: '$val')" >&2
+    fi
+  fi
+done
+
+if [[ "$valid_runs" -eq 0 ]]; then
+  echo "helm-install-rolling: collected $runs_seen runs but 0 parsed (bad artifacts?)" >&2
+  exit 2
+fi
+
+# Median computation. awk handles integer + float. n=odd → the middle
+# value; n=even → mean of the two middles, which is an integer only
+# when their sum is even (otherwise the median ends in .5).
+sorted=$(sort -n "$samples")
+median=$(echo "$sorted" | awk '
+  {
+    a[NR] = $1
+  }
+  END {
+    if (NR == 0) { exit 1 }
+    if (NR % 2 == 1) {
+      m = a[(NR + 1) / 2]
+    } else {
+      m = (a[NR / 2] + a[NR / 2 + 1]) / 2
+    }
+    # Print as integer if integral, else 1-decimal float. Avoids
+    # 145 → 145.000000 noise but preserves 145.5 for true mid-frac.
+    if (m == int(m)) {
+      printf "%d\n", m
+    } else {
+      printf "%.1f\n", m
+    }
+  }
+')
+
+echo "==> helm install + DaemonSet Ready: rolling median (rubric: median ≤ 300s, M3 #209)"
+echo
+echo "n_runs=$valid_runs"
+echo "median_seconds=$median"
+echo "samples_sorted=$(echo "$sorted" | tr '\n' ' ' | sed 's/ $//')"
+echo
+
+if [[ "$valid_runs" -lt 10 ]]; then
+  echo "NOTE: need ≥10 runs to flip M3 #209 carry-forward ⧗ → ☑;"
+  echo " currently $valid_runs run(s) in window."
+fi
+
+# Gate: exit 1 iff median strictly above the rubric.
+if awk -v m="$median" 'BEGIN { exit (m > 300) ? 0 : 1 }'; then
+  echo "::error::install-to-Ready rolling median ${median}s exceeds 300s rubric (M3 #209)"
+  exit 1
+fi
+
+echo "ok: rolling median ${median}s within 300s rubric"
diff --git a/scripts/helm-install-rolling_test.sh b/scripts/helm-install-rolling_test.sh
new file mode 100755
index 00000000..ec6c09ce
--- /dev/null
+++ b/scripts/helm-install-rolling_test.sh
@@ -0,0 +1,213 @@
+#!/usr/bin/env bash
+# Tests for helm-install-rolling.sh — M3 carry-forward (helm install +
+# DaemonSet Ready ≤5 min median across 10 CI runs).
+#
+# Mirrors the bench-cv-rolling test shape (PR #446): synthetic fixture
+# files act as "runs", the script's --dir mode treats each *.txt as a
+# separate run, the test asserts the median + pass/fail gate against
+# the documented 300s rubric.
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SCRIPT="$SCRIPT_DIR/helm-install-rolling.sh"
+REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+
+tmp=$(mktemp -d)
+trap 'rm -rf "$tmp"' EXIT
+
+# A fixture file is a single integer (seconds) on its own line — the
+# same shape the chart.yml `install` job writes to install_to_ready_
+# seconds.txt before upload-artifact.
+make_fixture() {
+  local file="$1" seconds="$2"
+  printf '%s\n' "$seconds" > "$file"
+}
+
+run_expect_pass() {
+  local label="$1"; shift
+  if (cd "$REPO_ROOT" && TC_HELM_INSTALL_CACHE_DIR="$tmp/cache-$RANDOM" "$SCRIPT" "$@") >/dev/null; then
+    echo "PASS: $label"
+  else
+    local rc=$?
+    echo "FAIL: $label (exit $rc)"
+    exit 1
+  fi
+}
+
+run_expect_fail() {
+  local label="$1" want_code="$2"; shift 2
+  local rc=0
+  (cd "$REPO_ROOT" && TC_HELM_INSTALL_CACHE_DIR="$tmp/cache-$RANDOM" "$SCRIPT" "$@") >/dev/null 2>&1 || rc=$?
+  if [[ "$rc" -eq "$want_code" ]]; then
+    echo "PASS: $label (exit $rc as expected)"
+  else
+    echo "FAIL: $label expected exit $want_code, got $rc"
+    exit 1
+  fi
+}
+
+capture() {
+  # Run the script and capture stdout. Used for table assertions.
+  (cd "$REPO_ROOT" && TC_HELM_INSTALL_CACHE_DIR="$tmp/cache-$RANDOM" "$SCRIPT" "$@") 2>/dev/null
+}
+
+# === Fixture 1: 10 runs, all 120s (well under budget) → median=120, PASS. ===
+fx1="$tmp/under-budget"
+mkdir -p "$fx1"
+for i in 1 2 3 4 5 6 7 8 9 10; do
+  make_fixture "$fx1/run-$i.txt" 120
+done
+out1=$(capture --dir "$fx1")
+echo "$out1"
+median1=$(echo "$out1" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median1" == "120" ]]; then
+  echo "PASS: 10 runs all 120s → median=120"
+else
+  echo "FAIL: expected median=120, got '$median1'"
+  exit 1
+fi
+run_expect_pass "median under 300s budget → exit 0" --dir "$fx1"
+
+# === Fixture 2: 10 runs, even-count median averages the two middle values. ===
+# Values: 100,110,120,130,140,150,160,170,180,190 → sorted, median of
+# even n=10 is average of 5th+6th values = (140+150)/2 = 145.
+fx2="$tmp/mixed"
+mkdir -p "$fx2"
+i=0
+for v in 100 110 120 130 140 150 160 170 180 190; do
+  i=$((i + 1))
+  make_fixture "$fx2/run-$i.txt" "$v"
+done
+out2=$(capture --dir "$fx2")
+echo "$out2"
+median2=$(echo "$out2" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median2" == "145" ]]; then
+  echo "PASS: even-n median=(140+150)/2=145"
+else
+  echo "FAIL: expected median=145, got '$median2'"
+  exit 1
+fi
+
+# === Fixture 3: odd-n median is middle value (not averaged). ===
+fx3="$tmp/odd-n"
+mkdir -p "$fx3"
+i=0
+for v in 50 60 70 80 90; do
+  i=$((i + 1))
+  make_fixture "$fx3/run-$i.txt" "$v"
+done
+out3=$(capture --dir "$fx3")
+median3=$(echo "$out3" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median3" == "70" ]]; then
+  echo "PASS: odd-n=5 median=70 (middle value, no averaging)"
+else
+  echo "FAIL: expected median=70 for {50,60,70,80,90}, got '$median3'"
+  exit 1
+fi
+
+# === Fixture 4: median exactly at 300s budget → PASS (≤, not <). ===
+# 10 runs with values 290,295,298,299,300,300,301,302,305,310. Sorted
+# 5th+6th are (300+300)/2 = 300. Rubric is ≤300, so exit 0.
+fx4="$tmp/exact-budget"
+mkdir -p "$fx4"
+i=0
+for v in 290 295 298 299 300 300 301 302 305 310; do
+  i=$((i + 1))
+  make_fixture "$fx4/run-$i.txt" "$v"
+done
+out4=$(capture --dir "$fx4")
+median4=$(echo "$out4" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median4" == "300" ]]; then
+  echo "PASS: median=300 exactly at budget"
+else
+  echo "FAIL: expected median=300, got '$median4'"
+  exit 1
+fi
+run_expect_pass "median at exactly 300s → exit 0 (≤, not <)" --dir "$fx4"
+
+# === Fixture 5: median ABOVE 300s budget → exit 1 (gate fail). ===
+fx5="$tmp/over-budget"
+mkdir -p "$fx5"
+i=0
+for v in 280 290 300 310 320 330 340 350 360 370; do
+  i=$((i + 1))
+  make_fixture "$fx5/run-$i.txt" "$v"
+done
+# 5th+6th sorted = (320+330)/2 = 325 — over the 300 budget.
+run_expect_fail "median>300 → exit 1" 1 --dir "$fx5"
+
+# === Fixture 6: empty dir → exit 2 (configuration error). ===
+fx6="$tmp/empty"
+mkdir -p "$fx6"
+run_expect_fail "empty --dir exits 2" 2 --dir "$fx6"
+
+# === Fixture 7: missing dir → exit 2. ===
+run_expect_fail "nonexistent --dir exits 2" 2 --dir "$tmp/does-not-exist"
+
+# === Fixture 8: --help works without args. ===
+(cd "$REPO_ROOT" && "$SCRIPT" --help) >/dev/null
+echo "PASS: --help exits 0"
+
+# === Fixture 9: unknown flag → exit 2. ===
+run_expect_fail "unknown flag exits 2" 2 --bogus
+
+# === Fixture 10: single-run history. Below-budget single run → median ===
+# === equals the only value, gate passes. ===
+fx10="$tmp/single-run"
+mkdir -p "$fx10"
+make_fixture "$fx10/run-1.txt" 250
+out10=$(capture --dir "$fx10")
+median10=$(echo "$out10" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median10" == "250" ]]; then
+  echo "PASS: single-run median=value (n=1 → no averaging)"
+else
+  echo "FAIL: expected median=250 for single run, got '$median10'"
+  exit 1
+fi
+# But with n<10 the script should emit a "need ≥10 runs" warning.
+if echo "$out10" | grep -q 'need.*10.*run'; then
+  echo "PASS: single-run surfaces 'need ≥10 runs' warning"
+else
+  echo "FAIL: single-run should warn 'need ≥10 runs', got:"
+  echo "$out10"
+  exit 1
+fi
+
+# === Fixture 11: garbage content (non-integer) → fixture skipped, ===
+# === remaining runs still aggregate. The script must not crash on ===
+# === a malformed artifact. ===
+fx11="$tmp/with-garbage"
+mkdir -p "$fx11"
+for i in 1 2 3 4 5; do
+  make_fixture "$fx11/run-$i.txt" 100
+done
+printf 'not-a-number\n' > "$fx11/run-bad.txt"
+out11=$(capture --dir "$fx11")
+median11=$(echo "$out11" | awk '/^median_seconds=/ {sub(/median_seconds=/,""); print}')
+if [[ "$median11" == "100" ]]; then
+  echo "PASS: malformed run skipped, remaining 5 aggregate to median=100"
+else
+  echo "FAIL: garbage-tolerant parse expected median=100, got '$median11'"
+  exit 1
+fi
+
+# === Fixture 12: rubric line printed exactly so docs can grep for it. ===
+if echo "$out1" | grep -q 'rubric: median ≤ 300s'; then
+  echo "PASS: rubric banner printed (≤ 300s)"
+else
+  echo "FAIL: expected 'rubric: median ≤ 300s' banner, got:"
+  echo "$out1"
+  exit 1
+fi
+
+# === Fixture 13: n_runs reported matches input fixture count. ===
+n_runs1=$(echo "$out1" | awk '/^n_runs=/ {sub(/n_runs=/,""); print}')
+if [[ "$n_runs1" == "10" ]]; then
+  echo "PASS: n_runs=10 reported for 10-fixture input"
+else
+  echo "FAIL: expected n_runs=10, got '$n_runs1'"
+  exit 1
+fi
+
+echo
+echo "All helm-install-rolling tests passed."