fix(chart): flip AppArmor default to opt-in (#492) (refs #481) #493
Merged
PR #481 shipped securityHardening.appArmorProfile.enabled: true as the default in install/kubernetes/tracecore/values.yaml. Kubelet rejects pod-create when the appArmorProfile field references a profile the host cannot resolve, which breaks installs on AppArmor-less nodes, including the ubuntu-latest GitHub Actions runner image (post-2024) and RHEL/SELinux hosts. install-bench + downstream PRs (#491, #484, #479, #431) all regressed.

Flip the default to false in values.yaml so the chart installs on a vanilla cluster. values-production.yaml retains enabled: true since production Linux clusters package the RuntimeDefault profile via containerd / CRI-O. Operators who want the hardening either layer values-production.yaml or set the flag explicitly.

Chart mutation tests in .github/workflows/chart.yml updated:
- T1, T2 now assert default render emits NEITHER the structured appArmorProfile field nor the legacy container.apparmor.security.beta.kubernetes.io annotation on K8s 1.30 or 1.28 (regression-prevent for #492).
- T3, T4 cover the opt-in path (--set enabled=true) and pin the pre-#492 behaviour for production-preset users.
- T7, T8 explicitly pass --set enabled=true so the Localhost-profile contract is still exercised under the new default.
- Production-preset assertion at line ~509 is untouched and still asserts appArmorProfile.type=RuntimeDefault.

README defense-in-depth section clarifies opt-in posture and points operators at values-production.yaml.

Verified locally:
- helm lint: 0 warnings.
- conftest: 52 default / 91 production tests pass.
- All 8 mutation gates green.

Signed-off-by: Tri Lam <tree@lumalabs.ai>
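The T1/T2 regression gate reduces to one rule: the default render contains neither AppArmor surface. A minimal POSIX sh sketch of that assertion follows. It is hypothetical and not the actual `chart.yml` step; the function name `assert_no_apparmor` is invented here, and the sketch assumes the rendered manifest arrives on stdin as plain YAML text.

```sh
#!/bin/sh
# Hypothetical T1/T2-style gate (not the real chart.yml step): read a rendered
# manifest on stdin and fail if it carries either AppArmor surface:
#   - structured field:  securityContext.appArmorProfile (K8s 1.30+)
#   - legacy annotation: container.apparmor.security.beta.kubernetes.io/<name>
assert_no_apparmor() {
  if grep -Eq 'appArmorProfile:|container\.apparmor\.security\.beta\.kubernetes\.io/'; then
    echo "FAIL: render emits an AppArmor field or annotation" >&2
    return 1
  fi
  echo "ok: no AppArmor field or annotation"
}
```

Usage would mirror the test plan's grep, for example `helm template ... --kube-version 1.28.0 --show-only templates/daemonset.yaml | assert_no_apparmor`. Matching the two concrete surfaces, rather than `grep -i apparmor`, keeps an unrelated mention of the word from tripping the gate.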
This was referenced Jun 2, 2026
trilamsr added a commit that referenced this pull request on Jun 2, 2026
…) (#496)

## Summary

`policy-matrix.yml` workflow has been failing on every chart-touching PR (blocked #476, #481, #493) since #475 landed. The chart's production preset (`values-production.yaml`) flips `serviceMonitor.enabled=true`, which renders a `monitoring.coreos.com/v1 ServiceMonitor` resource. Kind clusters don't ship the prometheus-operator CRDs, so `helm install --dry-run=server` exits 1 with:

```
no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
ensure CRDs are installed first
```

## Root cause

Kind ships only the core Kubernetes API set. `monitoring.coreos.com/v1` is supplied by prometheus-operator, which the policy-matrix kind cluster never installs. The chart's `templates/servicemonitor.yaml` is gated by `.Values.serviceMonitor.enabled`, default `false` (chart stays first-install-compatible on bare clusters), but the production preset enables it (kube-prometheus-stack convention). The policy-matrix gate exercises both default and production presets across PSA / Kyverno / Gatekeeper, so the production rows hit the missing CRD on every run.

## Fix

Issue #494 recommended option (a): install the missing CRD prereq. This PR adds a single workflow step after kind cluster spin-up but before the smoke script:

```yaml
- name: Install prometheus-operator ServiceMonitor CRD (issue #494)
  run: |
    kubectl apply -f \
      "https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.91.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml"
    kubectl wait --for=condition=established crd/servicemonitors.monitoring.coreos.com --timeout=60s
```

Design choices:

- **Slim CRD (ServiceMonitor only)** vs full prometheus-operator bundle. The chart's production preset references no other `monitoring.coreos.com` kinds. The slim install (~700 lines of YAML) avoids pulling the Prometheus, Alertmanager, ThanosRuler, PodMonitor, Probe, and PrometheusRule CRDs, which we don't exercise.
- **Applied to every matrix row** (not just production).
  A future flip of the default `serviceMonitor.enabled` toggle cannot silently re-break this gate.
- **Pinned to `v0.91.0`** (latest stable, published 2026-05-05). Matches the existing `KYVERNO_POLICIES_REF` / `GATEKEEPER_VERSION` pin convention in `scripts/policy-matrix-smoke.sh`. Bumping is a reviewed code change; the pin never tracks `main`.
- **`kubectl wait --for=condition=established`** before the helm dry-run, so the apiserver has registered the CRD by the time the chart template reaches the admission chain (avoids a race where the dry-run runs before discovery refreshes).

## Gatekeeper CRD timing

Re-audited: `install_gatekeeper()` in `scripts/policy-matrix-smoke.sh` already polls `kubectl get crd ...constraints.gatekeeper.sh` (lines 143-149) and the constraint `byPod[*].enforced` field (lines 270-276) before the smoke step exits. The `kubectl get constraints -A || true` in the failure-collection step is diagnostic only and already tolerates absent CRDs. No timing fix needed there.

## Why not install-bench / chart.yml

- `install-bench.yml` uses `bench/install/tracecore-values.yaml`, which doesn't enable serviceMonitor, so the same failure shape doesn't apply.
- `chart.yml`'s `install` and `upgrade` jobs install with default values (`serviceMonitor.enabled=false`); the `render` job's production-preset check is `helm template` only (no cluster), so no API discovery runs.

## Test plan

- [x] `actionlint .github/workflows/policy-matrix.yml`: exit 0
- [x] `actionlint` across all `.github/workflows/`: exit 0
- [x] Pre-push hook suite passed locally (golangci-lint, vet, mod verify, attribute-namespace-check, zizmor, doc-check, alert-check, chart-appversion-check, rfc-status-check, slo-rules-check, deprecation-check, no-autoupdate-check, test-flake-audit)
- [ ] policy-matrix workflow runs green on this PR: all 6 matrix rows (psa × default, psa × production, kyverno × default, kyverno × production, gatekeeper × default, gatekeeper × production) plus all 3 mutation rows.

Closes #494.
```release-notes
ci: install prometheus-operator ServiceMonitor CRD in policy-matrix kind cluster so chart-touching PRs no longer fail on the production preset's ServiceMonitor render
```

Signed-off-by: Tri Lam <tree@lumalabs.ai>
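`install_gatekeeper()` is described as polling for CRDs and constraint enforcement before the smoke step exits. A generic poll loop of that shape is sketched below in POSIX sh; it is an illustration, not the script's actual code, and `wait_until` is a hypothetical name. For CRDs specifically, `kubectl wait --for=condition=established`, as in the new workflow step, is the native option; a loop like this suits checks that have no waitable condition.

```sh
#!/bin/sh
# Hypothetical poll helper (not the real install_gatekeeper() code): re-run a
# command until it exits 0 or TIMEOUT seconds of sleeping have accumulated.
# Usage: wait_until TIMEOUT INTERVAL CMD [ARGS...]
# POSIX sh has no `local`, so the wu_* counters are global variables.
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
      echo "timed out after ${wu_timeout}s: $*" >&2
      return 1
    fi
    sleep "$wu_interval"
    wu_elapsed=$((wu_elapsed + wu_interval))
  done
  return 0
}
```

For example, `wait_until 60 2 kubectl get crd servicemonitors.monitoring.coreos.com` polls every 2 seconds for up to 60 seconds of sleep. The budget counts sleep time only, so a slow command can stretch the real wall-clock wait.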
trilamsr added a commit that referenced this pull request on Jun 2, 2026
## Summary

Removes `.github/workflows/policy-matrix.yml`. Engine-specific admission validation (PSA-restricted × Kyverno × Gatekeeper × default+production) delivered negative ROI at rc1.

## Root cause

4 PRs were blocked by or chasing this workflow's flakes (#475 introduction, #481, #498, #501). It caught zero real regressions, only its own infra bugs:

- ServiceMonitor CRD bootstrap race (#494)
- AppArmor host-capability mismatch (#481 → #493)
- kubectl wait .status.conditions nil race (#500 → #501)

## Coverage retained (without policy-matrix)

- `conftest`: offline PSS-baseline + restricted validation.
- `helm lint`: chart structural validation.
- `kubeconform`: K8s API conformance.
- `kubectl apply --dry-run=server` (chart.yml install/upgrade jobs): API-level breakage on a generic kind cluster.

## What stays in tree

- `scripts/policy-matrix-smoke.sh` + Gatekeeper/Kyverno bundle refs: cheap reactivation when GA triggers fire.
- `install/kubernetes/tracecore/policies/conftest/**`: offline policy bundle (still active).

## Re-enable triggers (tracked in #502)

- GA criterion #1 (third-party audit) requests engine-specific compat validation.
- First operator running under Kyverno/Gatekeeper reports admission rot.
- CRD-bootstrap pattern stabilises across other workflows.

## Test plan

- [x] `make doc-check` exit 0 (post comment-edit in kind-cluster-setup action.yml).
- [x] No remaining policy-matrix.yml references in repo (verified by grep).
- [x] Pre-commit hooks green (lint/vet/mod-verify/attribute-namespace).
- [x] README + install-bench stale refs scrubbed (follow-up commit).

```release-notes
ci: defer engine-specific policy-matrix workflow (PSA × Kyverno × Gatekeeper admission validation) to GA. Coverage retained via conftest + helm lint + kubeconform + kubectl apply --dry-run=server. Re-enable tracked in #502.
```

Refs #502 #475 #494 #500.

---------

Signed-off-by: Tri Lam <tree@lumalabs.ai>
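The "no remaining policy-matrix.yml references" check in the test plan can be made repeatable with a fixed-string grep. Below is a hypothetical POSIX sh helper, `assert_no_refs` (name invented here); the commit only says the check was done by grep, not how. Searching for the fixed string (`-F`) means `policy-matrix.yml` does not match the retained `scripts/policy-matrix-smoke.sh`, which is meant to stay in tree.

```sh
#!/bin/sh
# Hypothetical stale-reference check: list files under DIR that still contain
# PATTERN as a fixed string (skipping .git and binary files). Returns 1 if any.
assert_no_refs() {
  anr_dir=$1
  anr_pattern=$2
  # grep exits 1 when nothing matches; `|| true` keeps that from aborting under
  # `set -e` (it also hides grep errors, such as a missing DIR).
  anr_hits=$(grep -rIlF --exclude-dir=.git -- "$anr_pattern" "$anr_dir" || true)
  if [ -n "$anr_hits" ]; then
    echo "stale references to '$anr_pattern':" >&2
    printf '%s\n' "$anr_hits" >&2
    return 1
  fi
  echo "ok: no references to '$anr_pattern' under $anr_dir"
}
```

Run from the repository root as `assert_no_refs . policy-matrix.yml`; the offending file list goes to stderr so it shows up in CI logs.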
## Summary

PR #481 shipped `securityHardening.appArmorProfile.enabled: true` as the default in `install/kubernetes/tracecore/values.yaml`. Kubelet rejects pod-create when `pod.securityContext.appArmorProfile` references a profile the host cannot resolve, so the chart no longer installs on AppArmor-less nodes, including the ubuntu-latest GitHub Actions runner image (AppArmor dropped post-2024) and RHEL/SELinux production hosts. install-bench regressed; PRs #491, #484, #479, #431 are blocked behind this.

This PR implements option (a) from #492: flip the default to opt-in. `values-production.yaml` keeps `enabled: true`, since AppArmor-equipped Linux clusters (the production target) ship `RuntimeDefault` via containerd / CRI-O.

## Root cause
Default-on AppArmor in `values.yaml` violated the chart contract that the default render installs on a vanilla cluster. The defense-in-depth posture is correct for production-preset users; it was wrong as the unconditional default. PR #481 didn't add a CI gate to assert "default render installs on a host without AppArmor", so the regression escaped review.

## Changes
- `install/kubernetes/tracecore/values.yaml`: `securityHardening.appArmorProfile.enabled: true` -> `false`; in-line guidance reflects the opt-in posture and names the failing-host classes (CI runners, RHEL/SELinux).
- `install/kubernetes/tracecore/values-production.yaml`: unchanged; the production preset still hardens with `enabled: true`.
- `install/kubernetes/tracecore/README.md`: the defaults table and Defense-in-depth section explain the opt-in posture, point operators at `values-production.yaml` for the prior behavior, and link #492.
- `.github/workflows/chart.yml`: AppArmor mutation tests reshuffled from 6 to 8 cases. T1/T2 now assert the default render emits no AppArmor field or annotation on K8s 1.30 + 1.28 (regression-prevent for #492). T3/T4 cover the opt-in path (`--set enabled=true`) and pin pre-#492 production-preset behavior. T7/T8 explicitly pass `--set enabled=true` so the Localhost-profile contract still fires under the new default. The production-preset assertion (`appArmorProfile.type=RuntimeDefault` from `values-production.yaml`) is untouched.

## Backward compatibility
Behavior change for default-values users. Operators who installed via `helm install ... install/kubernetes/tracecore` (no production preset) and depended on the AppArmor hardening that #481 added will see it disappear on the next upgrade. Two ways to keep the prior behavior:

- Layer the production preset: `--values install/kubernetes/tracecore/values-production.yaml`.
- Set the flag explicitly: `--set securityHardening.appArmorProfile.enabled=true`.

Operators who relied on the chart's documented default (#481 was three days old; opt-in is the chart-hygiene norm for defense-in-depth knobs) get a quieter install on AppArmor-less hosts.
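Whichever route an operator takes to keep the prior behavior, the render should again carry the structured field. The T3/T4 and production-preset checks can be sketched as a POSIX sh helper that looks for `appArmorProfile.type` with an expected value. This is hypothetical: `assert_apparmor_type` is an invented name, and the sketch assumes block-style YAML as `helm template` emits it, not a real YAML parser. Passing `Localhost` instead of the default would cover the T7/T8 Localhost-profile contract.

```sh
#!/bin/sh
# Hypothetical opt-in / production-preset gate: succeed only if the rendered
# manifest on stdin has an appArmorProfile block whose `type:` equals WANT
# (default RuntimeDefault). Indentation decides where the block ends, so a
# sibling such as seccompProfile.type is not mistaken for it.
assert_apparmor_type() {
  aat_want=${1:-RuntimeDefault}
  if awk -v want="$aat_want" '
    { match($0, /^ */); indent = RLENGTH }            # leading-space count
    /^ *appArmorProfile:/ { inblock = 1; base = indent; next }
    inblock && indent <= base { inblock = 0 }         # dedent closes the block
    inblock && $1 == "type:" && $2 == want { found = 1 }
    END { exit (found ? 0 : 1) }
  '; then
    echo "ok: appArmorProfile.type=$aat_want"
  else
    echo "FAIL: no appArmorProfile.type=$aat_want in render" >&2
    return 1
  fi
}
```

Usage would follow the test plan, e.g. `helm template ... --values values-production.yaml --kube-version 1.30.0 | assert_apparmor_type RuntimeDefault`. Tracking indentation, rather than `grep -A1 appArmorProfile`, means a `seccompProfile` block with `type: RuntimeDefault` next to an AppArmor block of a different type cannot make the check pass.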
## Test plan
- `helm lint install/kubernetes/tracecore`: 0 warnings.
- `helm template ... --kube-version 1.30.0 --show-only templates/daemonset.yaml | grep -i apparmor`: empty (default render has no AppArmor).
- Same with `--kube-version 1.28.0`: empty.
- `helm template ... --values values-production.yaml --kube-version 1.30.0`: renders `appArmorProfile.type: RuntimeDefault` (production preset unchanged).
- `helm template ... --set securityHardening.appArmorProfile.enabled=true --kube-version 1.30.0`: renders the structured field (opt-in works).
- `.github/workflows/chart.yml` AppArmor steps run locally and pass.
- `chart.yml`.

## Refs
Closes #492 (refs #481).