ci(policy): kind matrix (PSA + Kyverno + Gatekeeper) (closes #138) #289
Merged
added 2 commits
May 31, 2026 22:18
Adopt upstream curated policy bundles to gate the tracecore Helm chart against three enterprise-shape admission engines on every chart edit:

- PSA-restricted via namespace labels (KEP-2579 GA)
- kyverno/policies pod-security baseline + restricted (Enforce mode)
- open-policy-agent/gatekeeper-library PSP constraint templates

The assertion is `helm install --dry-run=server`: the kind API server runs each engine's admission webhook(s) against the rendered chart and rejects it on policy violation. No hand-rolled Rego or Kyverno YAML; bundle versions are pinned in the helper script.

Signed-off-by: Tri Lam <tri@maydow.com>
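The assertion described in this commit can be sketched as a small shell function. This is a minimal illustration, not the contents of the helper script: the function name, the `RELEASE` and `CHART_DIR` variables, and their defaults are assumptions made for the example.

```shell
# Sketch: treat a server-side dry-run install as the policy assertion.
# A non-zero exit means the API server (including any admission engine
# in front of it) rejected the rendered chart.
# RELEASE and CHART_DIR are hypothetical knobs, not from the PR.
assert_chart_admitted() {
  namespace="$1"
  helm install "${RELEASE:-tracecore}" \
    "${CHART_DIR:-install/kubernetes/tracecore}" \
    --namespace "$namespace" --dry-run=server
}
```

Calling `assert_chart_admitted policy-test` against a kind cluster with an engine installed would then pass or fail with helm's own exit code; nothing is actually deployed because of `--dry-run=server`.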
Per fresh-context review of #289:

- KYVERNO_POLICIES_REF + GATEKEEPER_LIBRARY_REF were 'main'/'master' branches; upstream has no tagged releases, so byte-reproducibility required commit-SHA pins. Refreshed 2026-05-31:
  - kyverno/policies: 76be98a25d49ae01278a94ecde8f50f9e08577ef
  - gatekeeper-library: 53684fab133fd52d77aa42f632bc2ecd52f0447c
- Heredoc was missing constraint resources for K8sPSPHostFilesystem and K8sPSPAllowedUsers: the templates were wait-applied but no constraints fired against them. Added both with deny enforcement + parameters that gate the chart's existing hostPath-free, runAsNonRoot+nonRoot-group posture. Strictly more regression coverage; chart already conforms.

Signed-off-by: Tri Lam <tri@maydow.com>
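One way to keep a branch name from creeping back into the pins is a guard that accepts only full commit SHAs. The sketch below is an assumption about how such a check could look; `is_commit_sha` and `require_pinned_ref` are hypothetical helpers, and only the two variable names and SHA values come from the commit message.

```shell
# Pins as stated in the commit message above.
KYVERNO_POLICIES_REF=76be98a25d49ae01278a94ecde8f50f9e08577ef
GATEKEEPER_LIBRARY_REF=53684fab133fd52d77aa42f632bc2ecd52f0447c

# Succeed only for a 40-character lowercase hex string (a full SHA-1).
is_commit_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;  # any non-hex character disqualifies the ref
  esac
  [ "${#1}" -eq 40 ]
}

# Hypothetical guard: print a diagnostic and fail on a floating ref.
require_pinned_ref() {
  if ! is_commit_sha "$2"; then
    echo "$1 must be a full commit SHA, got '$2'" >&2
    return 1
  fi
}
```

A script could call `require_pinned_ref KYVERNO_POLICIES_REF "$KYVERNO_POLICIES_REF"` before fetching anything, so a value like `main` fails early instead of silently changing what CI tests.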
trilamsr added a commit that referenced this pull request Jun 2, 2026
## Summary

Completes the **A+ deliverables** for issue #138 that were deferred when PR #289 landed. #138 is already closed, so this PR is a follow-up that closes the remaining gaps from the original task brief, not a reopen.

What #289 shipped (B/A grade):

- `policy-matrix.yml` workflow with three engines (PSA-restricted, Kyverno baseline+restricted, Gatekeeper PSP constraint templates) running `helm install --dry-run=server` against the chart with **default values only**.
- Manual mutation test marked as a test-plan checkbox.
- No README cross-link.

What this PR adds:

1. **Production-values matrix dimension.** Every engine now runs against **both** the chart defaults **and** `install/kubernetes/tracecore/values-production.yaml` (the v1.0-rc1 cut-criteria-10 preset: NetworkPolicy + PDB + ServiceMonitor + hardened gracePeriod + pinned image policy). The original #138 task brief was explicit: validate production values against real policy engines, not just defaults. Matrix grows from 3 rows to 6.
2. **`policy-matrix-mutation` job: automated falsifier.** Applies the existing conftest testdata fixture `bad-allowprivilegeescalation.yaml` via `kubectl apply --dry-run=server` against a namespace governed by each engine, then asserts the API server rejects it. Without this gate, a no-op policy bundle (forgot Enforce mode on Kyverno, forgot to apply Gatekeeper constraints, forgot the PSA namespace label) would let every `policy-matrix` row pass green and ship false confidence.
   - We bypass `helm` for the mutation because the chart's `values.schema.json` pins `containerSecurityContext.allowPrivilegeEscalation: const false`, so helm itself would reject the values before the API server saw the manifest. The point of the mutation gate is to exercise the **API server's** policy chain, not the chart schema (the conftest gate in `chart.yml` already covers that).
3. **Chart README cross-link.** New "Live-cluster policy validation" subsection under "Pod Security Standard compliance" documents the workflow, the engine/bundle versions, the mutation gate, and a local repro recipe for the failure-mode debug path.
4. **Smoke script env knobs.** `VALUES_FILE` (production overlay) and `SKIP_SMOKE` (engine-only provision, used by the mutation job).

## Test plan

- [x] `actionlint .github/workflows/policy-matrix.yml`: exit 0 (clean).
- [x] `shellcheck scripts/policy-matrix-smoke.sh`: exit 0 (clean).
- [x] `zizmor .github/workflows/policy-matrix.yml`: same `artipacked` low-confidence baseline as the rest of the repo, no new findings.
- [x] `bash -n scripts/policy-matrix-smoke.sh`: syntax clean.
- [x] Pre-commit (golangci-lint, go vet, go mod verify, attribute-namespace-check): clean.
- [ ] CI: all 6 `policy-matrix` rows green (3 engines × {default, production} values profiles).
- [ ] CI: all 3 `policy-matrix-mutation` rows green (each engine rejects `bad-allowprivilegeescalation.yaml` with `allowPrivilegeEscalation` in the denial).

## Root cause (why this is needed)

Issue #138 acceptance criteria included: "passes today; fails on the daemonsets if they violate a policy". PR #289 implemented the "passes today" half against defaults. It did **not** implement (a) the production-preset coverage the rc1 cut now requires or (b) an automated mutation falsifier; the test plan listed the mutation as a manual checkbox. Both are load-bearing for the falsifiability claim of the gate. This PR closes that gap.

## Hard rules respected

- Did **not** enable auto-merge.
- Did **not** modify `chart.yml`'s install-to-Ready M5b gate (touched zero install/upgrade-job lines).
- Skipped OSS-Fuzz lane per `docs/followups/opportunistic.md` "premature".

```release-notes
ci(policy-matrix): production-values matrix dimension + automated mutation gate prove the live-cluster policy bundles (PSA-restricted, Kyverno baseline+restricted, Gatekeeper PSP) actually enforce against the v1.0-rc1 production preset.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
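The mutation gate inverts the usual success condition: the step passes only when the API server refuses the manifest and the denial names the offending field. A minimal sketch of that inversion follows; `expect_rejection` is a hypothetical helper written for illustration, not the job's actual implementation.

```shell
# Sketch: run a command that must be denied. Succeed only if it exits
# non-zero AND its combined output mentions the expected field name.
expect_rejection() {
  needle="$1"
  shift
  if output=$("$@" 2>&1); then
    echo "FAIL: command was admitted: $*" >&2
    return 1
  fi
  case "$output" in
    *"$needle"*) return 0 ;;
    *)
      echo "FAIL: rejected, but the denial did not mention $needle" >&2
      return 1
      ;;
  esac
}
```

Under these assumptions, a mutation row might run `expect_rejection allowPrivilegeEscalation kubectl apply --dry-run=server -n "$ns" -f bad-allowprivilegeescalation.yaml`, where `$ns` is a namespace governed by the engine under test and the fixture's directory is whatever the repository uses. Checking the message as well as the exit code guards against a rejection caused by something unrelated, such as a malformed manifest.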
Summary
- `policy-matrix` workflow gates the tracecore Helm chart against three enterprise-shape admission engines on every chart edit (kind cluster + `helm install --dry-run=server`).
  - `psa-restricted`: Pod Security Admission via namespace labels (KEP-2579, GA in 1.25)
  - `kyverno-baseline-restricted`: `kyverno/policies` `pod-security` `baseline` + `restricted` (flipped to Enforce)
  - `gatekeeper-restricted`: `open-policy-agent/gatekeeper-library` PSP constraint templates (7-template minimum that exercises tracecore's pod shape)
- Bundle versions are pinned in `scripts/policy-matrix-smoke.sh` so reproductions are byte-identical to CI; bumping is an explicit code change.
- `fail-fast: false`: operators want to know which gate failed, not "we failed".

Closes #138.
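For the `psa-restricted` row, enforcement comes entirely from a label on the target namespace. The sketch below shows one plausible way to provision such a namespace; the function name is hypothetical, while the `pod-security.kubernetes.io/enforce` label key is the standard Pod Security Admission label.

```shell
# Sketch: create a namespace and opt it into the "restricted" Pod
# Security Standard in enforce mode, so violating pods are rejected.
provision_psa_namespace() {
  kubectl create namespace "$1" &&
    kubectl label --overwrite namespace "$1" \
      pod-security.kubernetes.io/enforce=restricted
}
```

If the label is missing, the API server admits everything in that namespace, which is exactly the kind of silent no-op a green matrix row would hide.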
Test plan
- `POLICY_ENGINE=kyverno bash scripts/policy-matrix-smoke.sh` against a pre-provisioned `kind` cluster admits the chart.
- Manual mutation: `containerSecurityContext.allowPrivilegeEscalation: true` in `values.yaml` → at least one engine row must FAIL (verified locally before merge; reverted in this PR).
- `actionlint .github/workflows/policy-matrix.yml` clean (verified locally).
- `zizmor .github/workflows/policy-matrix.yml` produces no findings above the repo-wide `artipacked` baseline (verified locally).
- `shellcheck scripts/policy-matrix-smoke.sh` clean (verified locally).
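To reproduce every row locally in the spirit of `fail-fast: false`, a wrapper can run each engine in turn and report which ones failed rather than stopping at the first failure. This is a sketch under assumptions: only `POLICY_ENGINE=kyverno` appears in the test plan, so any other engine value passed in is a guess, and `run_matrix` and `SMOKE_SCRIPT` are hypothetical names.

```shell
# Sketch: run the smoke script once per engine, keep going on failure,
# and return non-zero at the end if any engine failed.
# SMOKE_SCRIPT is a hypothetical override for the script path.
run_matrix() {
  failed=""
  for engine in "$@"; do
    if POLICY_ENGINE="$engine" bash "${SMOKE_SCRIPT:-scripts/policy-matrix-smoke.sh}"; then
      echo "PASS $engine"
    else
      echo "FAIL $engine"
      failed="$failed $engine"
    fi
  done
  [ -z "$failed" ]
}
```

For example, `run_matrix kyverno` exercises the one engine value the test plan confirms; further engine names can be appended once their accepted `POLICY_ENGINE` values are checked against the script.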