
ci(policy): kind matrix (PSA + Kyverno + Gatekeeper) (closes #138) #289

Merged: trilamsr merged 2 commits into main from ci/policy-matrix-kind on Jun 1, 2026

trilamsr commented Jun 1, 2026

Summary

  • New policy-matrix workflow gates the tracecore Helm chart against three enterprise-shape admission engines on every chart edit (kind cluster + helm install --dry-run=server).
  • Three matrix rows adopt upstream curated bundles verbatim — no hand-rolled Rego or Kyverno YAML:
    • psa-restricted: Pod Security Admission via namespace labels (KEP-2579, GA in Kubernetes 1.25)
    • kyverno-baseline-restricted: kyverno/policies pod-security baseline + restricted (flipped to Enforce)
    • gatekeeper-restricted: open-policy-agent/gatekeeper-library PSP constraint templates (a minimal set of 7 templates that exercises tracecore's pod shape)
  • Bundle versions are pinned in scripts/policy-matrix-smoke.sh so reproductions are byte-identical to CI; bumping is an explicit code change.
  • fail-fast: false — operators want to know which gate failed, not "we failed".
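A minimal sketch of what pinning can look like inside the smoke script, assuming `POLICY_ENGINE` takes the values `psa`, `kyverno`, and `gatekeeper` and that bundles are fetched as GitHub archive tarballs. The two `*_REF` names and SHAs are the ones recorded in the follow-up commit further down; the `bundle_url` helper and the tarball approach are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: resolve a pinned upstream bundle per engine.
# Ref names/SHAs come from the follow-up commit; bundle_url is hypothetical.
set -euo pipefail

readonly KYVERNO_POLICIES_REF=76be98a25d49ae01278a94ecde8f50f9e08577ef
readonly GATEKEEPER_LIBRARY_REF=53684fab133fd52d77aa42f632bc2ecd52f0447c

# Print the pinned tarball URL for an engine's bundle (empty for PSA).
bundle_url() {
  case "$1" in
    psa)
      # Built into the API server; enabled per namespace with
      # kubectl label namespace <ns> pod-security.kubernetes.io/enforce=restricted
      ;;
    kyverno)
      echo "https://github.com/kyverno/policies/archive/${KYVERNO_POLICIES_REF}.tar.gz"
      ;;
    gatekeeper)
      echo "https://github.com/open-policy-agent/gatekeeper-library/archive/${GATEKEEPER_LIBRARY_REF}.tar.gz"
      ;;
    *)
      echo "unknown POLICY_ENGINE: $1" >&2
      return 1
      ;;
  esac
}
```

A caller would then run `url=$(bundle_url "$POLICY_ENGINE")` and, when `url` is non-empty, fetch it with `curl -fsSL "$url" | tar -xz`. Pinning to a SHA rather than a branch makes every bump a one-line, reviewable diff.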

Closes #138.

Test plan

  • CI: all three matrix rows green on this PR (the whole point — chart should pass out of the box).
  • Local repro: POLICY_ENGINE=kyverno bash scripts/policy-matrix-smoke.sh against a pre-provisioned kind cluster admits the chart.
  • Mutation gate: temporarily flip containerSecurityContext.allowPrivilegeEscalation: true in values.yaml → at least one engine row must FAIL (verified locally before merge; reverted in this PR).
  • actionlint .github/workflows/policy-matrix.yml clean (verified locally).
  • zizmor .github/workflows/policy-matrix.yml produces no findings above the repo-wide artipacked baseline (verified locally).
  • shellcheck scripts/policy-matrix-smoke.sh clean (verified locally).
Release note: ci: kind-cluster matrix asserts the tracecore Helm chart admits cleanly under PSA-restricted, Kyverno baseline+restricted, and Gatekeeper PSP constraint templates on every chart edit.

Tri Lam added 2 commits May 31, 2026 22:18
Adopt upstream curated policy bundles to gate the tracecore Helm chart
against three enterprise-shape admission engines on every chart edit:

  - PSA-restricted via namespace labels (KEP-2579 GA)
  - kyverno/policies pod-security baseline + restricted (Enforce mode)
  - open-policy-agent/gatekeeper-library PSP constraint templates

The assertion is `helm install --dry-run=server`: the kind API server
runs its admission chain (the built-in PodSecurity admission plugin, or
the Kyverno and Gatekeeper validating webhooks) against the rendered
chart and rejects it on policy violation. No hand-rolled Rego or
Kyverno YAML; bundle versions are pinned in the helper script.

Signed-off-by: Tri Lam <tri@maydow.com>

Per fresh-context review of #289:

- KYVERNO_POLICIES_REF + GATEKEEPER_LIBRARY_REF were 'main'/'master'
  branches; upstream has no tagged releases, so byte-reproducibility
  required commit-SHA pins. Refreshed 2026-05-31:
  - kyverno/policies: 76be98a25d49ae01278a94ecde8f50f9e08577ef
  - gatekeeper-library: 53684fab133fd52d77aa42f632bc2ecd52f0447c

- Heredoc was missing constraint resources for K8sPSPHostFilesystem
  and K8sPSPAllowedUsers — the templates were wait-applied but no
  constraints fired against them. Added both with deny enforcement
  + parameters that gate the chart's existing hostPath-free,
  runAsNonRoot+nonRoot-group posture. Strictly more regression
  coverage; chart already conforms.

Signed-off-by: Tri Lam <tri@maydow.com>
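For reference, a sketch of the two constraints the second commit describes, written as a shell function that prints the manifests so they can be piped to `kubectl apply -f -` once the matching ConstraintTemplates are established. The resource names, the `Pod`-only match block, and the user/group ranges are assumptions modeled on the gatekeeper-library examples, not the script's actual heredoc. A `Pod`-only match sees DaemonSet pod templates only if Gatekeeper workload expansion is configured; the real constraints may match workload kinds instead.

```shell
#!/usr/bin/env bash
# Sketch only: names, match block, and ranges are assumptions.
set -euo pipefail

# Print both constraints as one multi-document YAML stream.
gatekeeper_extra_constraints() {
  cat <<'EOF'
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPHostFilesystem
metadata:
  name: psp-host-filesystem
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    # An empty list blocks every hostPath volume.
    allowedHostPaths: []
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPAllowedUsers
metadata:
  name: psp-pods-allowed-user-ranges
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    runAsUser:
      rule: MustRunAsNonRoot
    runAsGroup:
      rule: MustRunAs
      ranges:
        - min: 1
          max: 65535
EOF
}
```

Usage: `gatekeeper_extra_constraints | kubectl apply -f -`. Applying before the templates' CRDs exist fails, which is why the templates are wait-applied first.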
trilamsr enabled auto-merge (squash) June 1, 2026 05:25
trilamsr merged commit 09b83f2 into main Jun 1, 2026
14 checks passed
trilamsr deleted the ci/policy-matrix-kind branch June 1, 2026 05:33
trilamsr added a commit that referenced this pull request Jun 2, 2026
## Summary

Completes the **A+ deliverables** for issue #138 that were deferred when
PR #289 landed. #138 is already closed, so this PR is a follow-up that
closes the remaining gaps from the original task brief, not a reopen.

What #289 shipped (B/A grade):

- `policy-matrix.yml` workflow with three engines (PSA-restricted,
Kyverno baseline+restricted, Gatekeeper PSP constraint templates)
running `helm install --dry-run=server` against the chart with **default
values only**.
- Manual mutation test marked as a test-plan checkbox.
- No README cross-link.

What this PR adds:

1. **Production-values matrix dimension.** Every engine now runs against
**both** the chart defaults **and**
`install/kubernetes/tracecore/values-production.yaml` (the v1.0-rc1
cut-criteria-10 preset — NetworkPolicy + PDB + ServiceMonitor + hardened
gracePeriod + pinned image policy). The original #138 task brief was
explicit: validate production values against real policy engines, not
just defaults. Matrix grows from 3 rows to 6.
2. **`policy-matrix-mutation` job — automated falsifier.** Applies the
existing conftest testdata fixture `bad-allowprivilegeescalation.yaml`
via `kubectl apply --dry-run=server` against a namespace governed by
each engine, then asserts the API server rejects it. Without this gate,
a no-op policy bundle (forgot Enforce mode on Kyverno, forgot to apply
Gatekeeper constraints, forgot the PSA namespace label) would let every
`policy-matrix` row pass green and ship false confidence.
   - We bypass `helm` for the mutation because the chart's
     `values.schema.json` pins
     `containerSecurityContext.allowPrivilegeEscalation: const false`, so
     helm itself would reject the values before the API server saw the
     manifest. The point of the mutation gate is to exercise the **API
     server's** policy chain, not the chart schema (the conftest gate in
     `chart.yml` already covers that).
3. **Chart README cross-link.** New "Live-cluster policy validation"
subsection under "Pod Security Standard compliance" documents the
workflow, the engine/bundle versions, the mutation gate, and a local
repro recipe for the failure-mode debug path.
4. **Smoke script env knobs.** `VALUES_FILE` (production overlay) and
`SKIP_SMOKE` (engine-only provision, used by the mutation job).
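The two knobs and the 3 × 2 row arithmetic can be sketched as follows. The release name `tracecore`, the `NAMESPACE` default, and all helper names are hypothetical; the chart directory is inferred from the location of `values-production.yaml`. Only the Helm flags `--namespace`, `--dry-run=server`, and `--values` are taken as given.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, release name, and namespace are hypothetical.
set -euo pipefail

readonly CHART_DIR=install/kubernetes/tracecore

# Print the helm arguments for the smoke step, one per line.
smoke_helm_args() {
  local -a args=(install tracecore "$CHART_DIR"
                 --namespace "${NAMESPACE:-tracecore}" --dry-run=server)
  # Production profile: layer the overlay on top of the chart defaults.
  if [[ -n "${VALUES_FILE:-}" ]]; then
    args+=(--values "$VALUES_FILE")
  fi
  printf '%s\n' "${args[@]}"
}

# SKIP_SMOKE=1 means provision the engine only (used by the mutation job).
should_run_smoke() {
  [[ "${SKIP_SMOKE:-0}" != 1 ]]
}

# Enumerate policy-matrix rows: 3 engines x {default, production} = 6.
matrix_rows() {
  local engine profile
  for engine in psa kyverno gatekeeper; do
    for profile in default production; do
      echo "$engine/$profile"
    done
  done
}
```

In CI each row would export `POLICY_ENGINE` and, for the production profile, `VALUES_FILE=install/kubernetes/tracecore/values-production.yaml`; the mutation job would set `SKIP_SMOKE=1` so the script provisions the engine and exits before `helm` runs.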

## Test plan

- [x] `actionlint .github/workflows/policy-matrix.yml` — exit 0 (clean).
- [x] `shellcheck scripts/policy-matrix-smoke.sh` — exit 0 (clean).
- [x] `zizmor .github/workflows/policy-matrix.yml` — same `artipacked`
low-confidence baseline as the rest of the repo, no new findings.
- [x] `bash -n scripts/policy-matrix-smoke.sh` — syntax clean.
- [x] Pre-commit (golangci-lint, go vet, go mod verify,
attribute-namespace-check) — clean.
- [ ] CI: all 6 `policy-matrix` rows green (3 engines × {default,
production} values profiles).
- [ ] CI: all 3 `policy-matrix-mutation` rows green (each engine rejects
`bad-allowprivilegeescalation.yaml` with `allowPrivilegeEscalation` in
the denial).
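The mutation rows invert the usual pass condition: a zero exit from the server-side dry run is the failure. A minimal sketch, with hypothetical function names, that also requires the denial to mention `allowPrivilegeEscalation`, so an unrelated error (for example, an unreachable API server) cannot masquerade as a policy rejection:

```shell
#!/usr/bin/env bash
# Sketch only: assert_rejected and run_mutation are hypothetical names.
set -euo pipefail

# Usage: assert_rejected <exit-code> <combined stdout+stderr>
assert_rejected() {
  local rc=$1 output=$2
  if [[ "$rc" -eq 0 ]]; then
    echo "FAIL: API server admitted the fixture; policy bundle is a no-op" >&2
    return 1
  fi
  if ! grep -q 'allowPrivilegeEscalation' <<<"$output"; then
    echo "FAIL: rejected, but not for allowPrivilegeEscalation" >&2
    return 1
  fi
  echo "PASS: rejected with allowPrivilegeEscalation in the denial"
}

# Dry-run the bad fixture into an engine-governed namespace.
run_mutation() {
  local ns=$1 fixture=$2 out rc=0
  out=$(kubectl apply --dry-run=server -n "$ns" -f "$fixture" 2>&1) || rc=$?
  assert_rejected "$rc" "$out"
}
```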

## Root cause (why this is needed)

Issue #138 acceptance criteria included: "passes today; fails on the
daemonsets if they violate a policy". PR #289 implemented the "passes
today" half against defaults. It did **not** implement (a) the
production-preset coverage the rc1 cut now requires or (b) an automated
mutation falsifier — the test plan listed the mutation as a manual
checkbox. Both are load-bearing for the falsifiability claim of the
gate. This PR closes that gap.

## Hard rules respected

- Did **not** enable auto-merge.
- Did **not** modify `chart.yml`'s install-to-Ready M5b gate (touched
zero install/upgrade-job lines).
- Skipped the OSS-Fuzz lane, which `docs/followups/opportunistic.md`
marks as "premature".

```release-notes
ci(policy-matrix): production-values matrix dimension + automated mutation gate prove the live-cluster policy bundles (PSA-restricted, Kyverno baseline+restricted, Gatekeeper PSP) actually enforce against the v1.0-rc1 production preset.
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>

Linked issue: [followup] Live-cluster policy-engine validation for example daemonsets
