ci(policy): kind matrix (PSA + Kyverno + Gatekeeper) (closes #138) by trilamsr · Pull Request #289 · TraceCoreAI/tracecore

trilamsr · 2026-06-01T05:19:19Z

Summary

A new policy-matrix workflow gates the tracecore Helm chart against three enterprise-shape admission engines on every chart edit (kind cluster + helm install --dry-run=server).
Three matrix rows adopt upstream curated bundles verbatim — no hand-rolled Rego or Kyverno YAML:
- psa-restricted — Pod Security Admission via namespace labels (KEP-2579, GA in 1.25)
- kyverno-baseline-restricted — kyverno/policies pod-security baseline + restricted (flipped to Enforce)
- gatekeeper-restricted — open-policy-agent/gatekeeper-library PSP constraint templates (7-template minimum that exercises tracecore's pod shape)
Bundle versions are pinned in scripts/policy-matrix-smoke.sh so reproductions are byte-identical to CI; bumping is an explicit code change.
fail-fast: false — operators want to know which gate failed, not "we failed".
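Pinning only guarantees byte-identical reproductions if the pinned value cannot move. The POSIX-sh sketch below is a guard the smoke script could run before fetching bundles. It assumes the refs arrive in `KYVERNO_POLICIES_REF` and `GATEKEEPER_LIBRARY_REF`, the variable names used in the follow-up commit on this PR. The default SHAs are the ones that commit refreshed. `require_sha_pin` is a hypothetical helper, not something this PR ships.

```shell
# Hypothetical pre-fetch guard: a bundle ref only reproduces byte-for-byte
# if it is an immutable commit SHA, not a moving branch like 'main'.
require_sha_pin() {
  name=$1
  ref=$2
  # A full git SHA-1 is exactly 40 characters...
  if [ "${#ref}" -ne 40 ]; then
    echo "error: $name='$ref' is not a 40-character commit SHA" >&2
    return 1
  fi
  # ...and contains only lowercase hex digits.
  case $ref in
    *[!0-9a-f]*)
      echo "error: $name='$ref' contains non-hex characters" >&2
      return 1
      ;;
  esac
}

# Defaults mirror the pins refreshed on 2026-05-31; env overrides must
# pass the same check.
KYVERNO_POLICIES_REF=${KYVERNO_POLICIES_REF:-76be98a25d49ae01278a94ecde8f50f9e08577ef}
GATEKEEPER_LIBRARY_REF=${GATEKEEPER_LIBRARY_REF:-53684fab133fd52d77aa42f632bc2ecd52f0447c}

require_sha_pin KYVERNO_POLICIES_REF "$KYVERNO_POLICIES_REF" || exit 1
require_sha_pin GATEKEEPER_LIBRARY_REF "$GATEKEEPER_LIBRARY_REF" || exit 1
```

With this guard in place, bumping a bundle means replacing one SHA with another. Overriding the env var with a branch name fails loudly instead of silently drifting.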

Closes #138.

Test plan

CI: all three matrix rows green on this PR (the whole point — chart should pass out of the box).
Local repro: POLICY_ENGINE=kyverno bash scripts/policy-matrix-smoke.sh against a pre-provisioned kind cluster admits the chart.
Mutation gate: temporarily flip containerSecurityContext.allowPrivilegeEscalation: true in values.yaml → at least one engine row must FAIL (verified locally before merge; reverted in this PR).
actionlint .github/workflows/policy-matrix.yml clean (verified locally).
zizmor .github/workflows/policy-matrix.yml produces no findings above the repo-wide artipacked baseline (verified locally).
shellcheck scripts/policy-matrix-smoke.sh clean (verified locally).
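The mutation check in the test plan inverts the usual pass condition. Once the value is flipped, the run only counts as a success if at least one engine row rejects the chart. The POSIX-sh sketch below implements that inversion, under these assumptions:

- `mutation_caught` is hypothetical.
- `psa`, `kyverno` and `gatekeeper` are the `POLICY_ENGINE` values. Only `kyverno` actually appears in this PR.
- The per-engine command is passed in as an argument, so it can be a wrapper around `scripts/policy-matrix-smoke.sh` or a test stub.

```shell
# Hypothetical mutation-check driver. "$@" is the per-engine command; it runs
# once per engine with POLICY_ENGINE set. Returns 0 only if at least one
# engine rejected (non-zero exit). Every engine runs even after a rejection,
# in the spirit of fail-fast: false, so the log records each verdict.
mutation_caught() {
  caught=""
  for engine in psa kyverno gatekeeper; do   # assumed POLICY_ENGINE values
    if POLICY_ENGINE=$engine "$@"; then
      echo "$engine: admitted (mutation NOT caught)"
    else
      echo "$engine: rejected"
      caught="$caught $engine"
    fi
  done
  [ -n "$caught" ]
}

# Usage sketch, assuming a wrapper that provisions the matching kind cluster
# and then calls the smoke script:
#   mutation_caught ./run-engine-smoke.sh   # hypothetical wrapper
```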

ci: kind-cluster matrix asserts the tracecore Helm chart admits cleanly under PSA-restricted, Kyverno baseline+restricted, and Gatekeeper PSP constraint templates on every chart edit.

Adopt upstream curated policy bundles to gate the tracecore Helm chart against three enterprise-shape admission engines on every chart edit:

- PSA-restricted via namespace labels (KEP-2579 GA)
- kyverno/policies pod-security baseline + restricted (Enforce mode)
- open-policy-agent/gatekeeper-library PSP constraint templates

The assertion is `helm install --dry-run=server`: the kind API server runs each engine's admission webhook(s) against the rendered chart and rejects it on policy violation. No hand-rolled Rego or Kyverno YAML; bundle versions are pinned in the helper script.

Signed-off-by: Tri Lam <tri@maydow.com>

Per fresh-context review of #289:

- KYVERNO_POLICIES_REF and GATEKEEPER_LIBRARY_REF pointed at the 'main'/'master' branches. Upstream has no tagged releases, so byte-reproducibility required commit-SHA pins. Refreshed 2026-05-31:
  - kyverno/policies: 76be98a25d49ae01278a94ecde8f50f9e08577ef
  - gatekeeper-library: 53684fab133fd52d77aa42f632bc2ecd52f0447c
- The heredoc was missing constraint resources for K8sPSPHostFilesystem and K8sPSPAllowedUsers: the templates were wait-applied, but no constraints fired against them. Added both with deny enforcement, plus parameters that gate the chart's existing hostPath-free, runAsNonRoot + non-root-group posture. Strictly more regression coverage; the chart already conforms.

Signed-off-by: Tri Lam <tri@maydow.com>
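The sketch below shows what the two added constraints could look like when emitted from a heredoc. Three things come from the commit: the constraint kinds, `enforcementAction: deny`, and the hostPath-free / non-root intent. The rest is assumed:

- The parameter keys (`allowedHostPaths`, `runAsUser`, `runAsGroup`) follow the gatekeeper-library template parameters.
- The `match` scope, the group range, the resource names and `emit_missing_constraints` are hypothetical.

```shell
# Hypothetical sketch of the two missing constraints. A constraint template
# alone evaluates nothing; each constraint below instantiates one.
emit_missing_constraints() {
  cat <<'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostFilesystem
metadata:
  name: psp-host-filesystem
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]        # assumed match scope
  parameters:
    allowedHostPaths: []      # empty list: no hostPath volume is allowed
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPAllowedUsers
metadata:
  name: psp-allowed-users
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]        # assumed match scope
  parameters:
    runAsUser:
      rule: MustRunAsNonRoot
    runAsGroup:
      rule: MustRunAs
      ranges:
        - min: 1              # assumed: any non-root group
          max: 65535
EOF
}

# Usage sketch: emit_missing_constraints | kubectl apply -f -
```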

## Summary

Completes the **A+ deliverables** for issue #138 that were deferred when PR #289 landed. #138 is already closed, so this PR is a follow-up that closes the remaining gaps from the original task brief, not a reopen.

What #289 shipped (B/A grade):

- `policy-matrix.yml` workflow with three engines (PSA-restricted, Kyverno baseline+restricted, Gatekeeper PSP constraint templates) running `helm install --dry-run=server` against the chart with **default values only**.
- Manual mutation test marked as a test-plan checkbox.
- No README cross-link.

What this PR adds:

1. **Production-values matrix dimension.** Every engine now runs against **both** the chart defaults **and** `install/kubernetes/tracecore/values-production.yaml` (the v1.0-rc1 cut-criteria-10 preset: NetworkPolicy + PDB + ServiceMonitor + hardened gracePeriod + pinned image policy). The original #138 task brief was explicit: validate production values against real policy engines, not just defaults. The matrix grows from 3 rows to 6.
2. **`policy-matrix-mutation` job: automated falsifier.** The job applies the existing conftest testdata fixture `bad-allowprivilegeescalation.yaml` via `kubectl apply --dry-run=server` against a namespace governed by each engine. It then asserts that the API server rejects it. Without this gate, a no-op policy bundle would let every `policy-matrix` row pass green and ship false confidence. Examples: Enforce mode forgotten on Kyverno, Gatekeeper constraints never applied, or the PSA namespace label missing.
   - We bypass `helm` for the mutation because the chart's `values.schema.json` pins `containerSecurityContext.allowPrivilegeEscalation: const false`, so helm itself would reject the values before the API server saw the manifest. The point of the mutation gate is to exercise the **API server's** policy chain, not the chart schema (the conftest gate in `chart.yml` already covers that).
3. **Chart README cross-link.** A new "Live-cluster policy validation" subsection under "Pod Security Standard compliance" documents the workflow, the engine/bundle versions and the mutation gate. It also gives a local repro recipe for the failure-mode debug path.
4. **Smoke script env knobs.** `VALUES_FILE` (production overlay) and `SKIP_SMOKE` (engine-only provision, used by the mutation job).

## Test plan

- [x] `actionlint .github/workflows/policy-matrix.yml`: exit 0 (clean).
- [x] `shellcheck scripts/policy-matrix-smoke.sh`: exit 0 (clean).
- [x] `zizmor .github/workflows/policy-matrix.yml`: same `artipacked` low-confidence baseline as the rest of the repo, no new findings.
- [x] `bash -n scripts/policy-matrix-smoke.sh`: syntax clean.
- [x] Pre-commit (golangci-lint, go vet, go mod verify, attribute-namespace-check): clean.
- [ ] CI: all 6 `policy-matrix` rows green (3 engines × {default, production} values profiles).
- [ ] CI: all 3 `policy-matrix-mutation` rows green (each engine rejects `bad-allowprivilegeescalation.yaml` with `allowPrivilegeEscalation` in the denial).

## Root cause (why this is needed)

Issue #138's acceptance criteria included: "passes today; fails on the daemonsets if they violate a policy". PR #289 implemented the "passes today" half against defaults. It did **not** implement two things:

- (a) the production-preset coverage the rc1 cut now requires;
- (b) an automated mutation falsifier (the test plan listed the mutation as a manual checkbox).

Both are load-bearing for the gate's falsifiability claim. This PR closes that gap.

## Hard rules respected

- Did **not** enable auto-merge.
- Did **not** modify `chart.yml`'s install-to-Ready M5b gate (touched zero install/upgrade-job lines).
- Skipped the OSS-Fuzz lane per `docs/followups/opportunistic.md` ("premature").
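The POSIX-sh sketch below shows the falsifier's core check. It succeeds only if two things hold: the server-side dry-run apply of the bad fixture fails, and the denial names the violated field. The second condition stops a rejection for an unrelated reason, such as a missing namespace, from passing as enforcement. `expect_rejection`, `POLICY_NS` and `FIXTURE` are hypothetical names. The field-name check mirrors the test-plan item that expects `allowPrivilegeEscalation` in the denial.

```shell
# Hypothetical falsifier step: run the given command (a server-side dry-run
# apply of a known-bad manifest) and succeed only if it FAILED and its output
# names the violated field.
expect_rejection() {
  field=$1
  shift
  out=$("$@" 2>&1)
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "FAIL: manifest was admitted; policy bundle is not enforcing" >&2
    return 1
  fi
  case $out in
    *"$field"*)
      echo "ok: rejected, denial mentions $field"
      ;;
    *)
      echo "FAIL: rejected for an unrelated reason:" >&2
      printf '%s\n' "$out" >&2
      return 1
      ;;
  esac
}

# Usage sketch (namespace and fixture path are placeholders):
#   expect_rejection allowPrivilegeEscalation \
#     kubectl apply --dry-run=server -n "$POLICY_NS" -f "$FIXTURE"
```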
```release-notes
ci(policy-matrix): production-values matrix dimension + automated mutation gate prove the live-cluster policy bundles (PSA-restricted, Kyverno baseline+restricted, Gatekeeper PSP) actually enforce against the v1.0-rc1 production preset.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
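The POSIX-sh sketch below shows one plausible shape for the two smoke-script knobs:

- `VALUES_FILE` layers an overlay such as `values-production.yaml` onto the Helm arguments.
- `SKIP_SMOKE` stops after engine provisioning, which is what the mutation job needs.

Several details are assumptions, not the script's actual interface: the function names, the `SKIP_SMOKE=1` convention, the release name `tracecore`, and the chart path (inferred from the `values-production.yaml` location).

```shell
# Hypothetical: print the helm arguments for one smoke run, one per line.
# VALUES_FILE, when set, layers the overlay on top of the chart defaults.
smoke_helm_args() {
  printf '%s\n' install tracecore "${CHART_DIR:-install/kubernetes/tracecore}" \
    --dry-run=server
  if [ -n "${VALUES_FILE:-}" ]; then
    printf '%s\n' -f "$VALUES_FILE"
  fi
}

# Hypothetical: decide whether the smoke step runs after provisioning.
# SKIP_SMOKE=1 means "provision the engine only" (assumed convention).
should_run_smoke() {
  [ "${SKIP_SMOKE:-0}" != 1 ]
}

# Usage sketch (bash 4+, so paths with spaces survive):
#   mapfile -t args < <(smoke_helm_args); helm "${args[@]}"
```

Emitting one argument per line keeps the default and production rows on the same code path. The only difference between the two profiles is whether the `-f` pair is appended.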

Tri Lam added 2 commits May 31, 2026 22:18

trilamsr enabled auto-merge (squash) June 1, 2026 05:25

trilamsr merged commit 09b83f2 into main Jun 1, 2026
14 checks passed

trilamsr deleted the ci/policy-matrix-kind branch June 1, 2026 05:33

trilamsr mentioned this pull request Jun 2, 2026

ci(policy-matrix): production values + mutation gate (#138 A+) #475 (Merged, 7 tasks)
