feat(chart): version-gated AppArmor RuntimeDefault on DaemonSet (M5b) by trilamsr · Pull Request #481 · TraceCoreAI/tracecore

trilamsr · 2026-06-02T04:35:44Z

Summary

Adds version-gated appArmorProfile: { type: RuntimeDefault } on the DaemonSet pod via new securityHardening.appArmorProfile values key (default enabled: true). On Kubernetes 1.30+ emits the GA structured pod.securityContext.appArmorProfile field; on 1.28 / 1.29 (chart kubeVersion floor >=1.28.0-0) falls back to the legacy container.apparmor.security.beta.kubernetes.io/<container>: runtime/default pod annotation. Auto-selected via semverCompare against .Capabilities.KubeVersion.Version; operators do not pick the form.
Cross-linked from install/kubernetes/tracecore/README.md §"Defense-in-depth above restricted-PSS" and a new STRIDE elevation row under docs/threat-model.md §B1 (host filesystem reads).
New CI step in .github/workflows/chart.yml exercises six falsifiers: default render @ 1.30 (structured field, no annotation), default render @ 1.28 (legacy annotation, no structured field), toggle-off @ 1.30 and @ 1.28 (neither code path renders), type: Localhost without localhostProfile (fails closed with operator-visible error), and type: Localhost with profile (custom path renders through). The production-preset CI step also asserts the structured field at the embedded helm kubeVersion. Closes the appArmor item in docs/followups/M5b.md.

Root cause

Restricted PSS permits an undefined AppArmor profile, so the chart was compliant today — but stricter-local-policy clusters and adopter security checklists flag the absence. The chart didn't pin a defense-in-depth layer that costs nothing on Linux nodes. Shipped proactively (sibling to L31 production-preset hardening) rather than waiting for the M5b follow-up trigger ("kubeVersion floor moves to >=1.30, or first adopter asks").

Local verification

chart: version-gated AppArmor RuntimeDefault pinned on the DaemonSet via the new
       `securityHardening.appArmorProfile.enabled` toggle (default true).
       Kubernetes 1.30+ renders the GA structured
       `pod.securityContext.appArmorProfile` field; 1.28 / 1.29 renders the
       legacy `container.apparmor.security.beta.kubernetes.io/<container>`
       pod annotation. Auto-selected per cluster kubeVersion; opt out via
       `--set securityHardening.appArmorProfile.enabled=false`.

Ran end-to-end against helm v4.2.0 + conftest dev / OPA 1.15.2 locally:

helm lint install/kubernetes/tracecore + helm lint -f values-production.yaml — both zero WARNINGs.
helm template --kube-version 1.30.0 and --kube-version 1.28.0 for both default values and the production preset — all four exit 0.
conftest test --policy policies/conftest/tracecore.rego against all four renders — 52/52 + 52/52 + 91/91 + 91/91 tests passed.
All 12 bad-*.yaml fixtures still denied; good-baseline.yaml + good-sys-ptrace.yaml still passed.
Mutation sweep: toggle off → neither code path renders; type: Localhost without localhostProfile → render fails closed with localhostProfile to be set message; type: Localhost with profile → structured field carries custom path.
values.schema.json rejects type: BadValue with the upstream JSON-schema enum error.
pre-commit hooks (golangci-lint, go vet, go mod verify, attribute-namespace-check) and pre-push hooks (doc-check, no-autoupdate-check) all green.

Test plan

CI chart / render step passes the new M5b appArmor falsifier.
CI chart / render production-preset assertion lights up (structured field at embedded helm kubeVersion).
CI chart / install (kind) rolls clean — kind clusters in CI run K8s ≥1.30, so the structured field path is the one exercised at install time.

Cross-links

docs/followups/M5b.md appArmor item — flipped to [x] with link back to this PR's chart README + threat-model anchors.
docs/threat-model.md §B1 — gains an Elevation STRIDE row naming AppArmor RuntimeDefault as the defense-in-depth above restricted-PSS for the /dev/kmsg + journald hostPath surface.
install/kubernetes/tracecore/README.md §Defense-in-depth above restricted-PSS — operator-facing explanation of which form renders per kubeVersion.

Grade

A+: TDD-verified six-way falsifier sweep covers both code paths and the failure-closed path. Defaults-on with explicit opt-out; sibling-style with tls/networkPolicy/podDisruptionBudget toggle conventions in the existing chart. Threat-model row, README §security cross-link, schema enum guard, and production-preset duplicate-render assertion all wired. Closes the M5b appArmor follow-up without breaking the default render on any supported kubeVersion.

Restricted PSS permits an undefined AppArmor profile, so the chart was compliant today; explicit RuntimeDefault narrows the syscall surface a compromised receiver could reach against the read-only /dev/kmsg + journald hostPath mounts and removes one item from adopter security checklists. Triggered proactively (sibling to L31 production-preset hardening) rather than waiting for the M5b follow-up trigger ("kubeVersion floor moves to >=1.30, or first adopter asks"). Version-gating: the structured pod.securityContext.appArmorProfile field is GA in K8s 1.30+; on 1.28 / 1.29 (chart kubeVersion floor >=1.28.0-0) the legacy `container.apparmor.security.beta.kubernetes.io/<container>` pod annotation carries the same intent. The template auto-selects via semverCompare against .Capabilities.KubeVersion.Version; operators do not pick the form. Toggle securityHardening.appArmorProfile.enabled (default true) opts out; type=Localhost + localhostProfile wires a node-preloaded custom profile (fails closed if profile name missing). Verified locally with helm v4.2.0 + conftest dev/OPA 1.15.2: - helm lint default + production: zero WARNINGs - helm template default @ 1.30 / 1.28: renders structured / legacy - helm template production @ 1.30 / 1.28: renders structured / legacy - conftest default @ 1.30 / 1.28: 52/52 passed each - conftest production @ 1.30 / 1.28: 91/91 passed each - toggle off @ 1.30 / 1.28: neither code path renders - type=Localhost without profile: fails closed with operator-visible error - values.schema.json rejects bad type values Cross-links: docs/threat-model.md §B1 (elevation row gains AppArmor mitigation), install/kubernetes/tracecore/README.md §Defense-in-depth. Closes docs/followups/M5b.md appArmor item. Signed-off-by: Tri Lam <tree@lumalabs.ai>

Doc-check pre-push gate rejected the section-banner added in the prior commit (`# --- defense-in-depth: AppArmor RuntimeDefault ... ---`) because banner comments rot in long-lived files per STYLE.md. Existing banners in the file are grandfathered; new lines are not. Substance is unchanged — the rationale prose remains. Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr · 2026-06-02T04:42:21Z

Independent Adversarial Review: PR #481

Grade: A (→ A+ after one doc fix)

TDD verified. Six-way falsifier sweep covers both code paths (K8s 1.30+ structured field, 1.28/29 legacy annotation), both toggle states (enabled/disabled), failure-closed path (Localhost without localhostProfile), and custom profile path. semverCompare(">=1.30.0-0", version) correctly gates pre-release and GA 1.30 builds. Schema enum guard on type field. Threat-model STRIDE row added at §B1 with defense-in-depth rationale. Production preset hardened by default; CI asserts structured field at embedded helm kubeVersion (≥1.30).

Findings

CI documentation bug: .github/workflows/chart.yml line ~270 says "Five mutation checks bound the contract:" but the code implements 6 numbered test cases (items 1–6). The 6th test (Localhost with custom profile) is correct; the comment header is just stale. Fix required before merge: change "Five" → "Six".

Optional simplification: values-production.yaml repeats the feature rationale from values.yaml (both ~18 and ~13 lines respectively). Since doc-check already rejected section banners per STYLE.md, consider trimming values-production to 5 lines (just M5b reference + threat-model anchor). Not blocking.

Cross-links

✓ M5b.md appArmor checkbox flipped to [x]
✓ threat-model.md §B1 Elevation row names AppArmor RuntimeDefault as defense-in-depth above restricted-PSS
✓ README.md §Defense-in-depth explains version-gating (fails-open on legacy annotation path, fails-closed on structured field path)
✓ Production preset defaults enabled=true with sibling-style opt-out pattern (matches tls, networkPolicy, podDisruptionBudget toggles)

Logic

semverCompare(">=1.30.0-0") correctly covers 1.30.0, 1.30.0-beta, 1.30.0+kind
Annotation (K8s 1.28/29) and structured field (K8s 1.30+) paths mutually exclusive
Type field defaults: template uses | default "RuntimeDefault" + values.yaml explicit
Fail-closed: Localhost without localhostProfile → helm fails template with operator-visible error
Container name hardcoded to tracecore (matches pod spec)
Annotation value lowercase runtime/default (per spec), struct type CamelCase RuntimeDefault (per API)

After the CI comment fix: approve for auto-merge.

Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr · 2026-06-02T04:48:28Z

Nit fix: 'Five' → 'Six' mutation checks. Re-requesting review for A+.

## Summary PR #481 shipped `securityHardening.appArmorProfile.enabled: true` as the default in `install/kubernetes/tracecore/values.yaml`. Kubelet rejects pod-create when `pod.securityContext.appArmorProfile` references a profile the host cannot resolve, so the chart no longer installs on AppArmor-less nodes — including the ubuntu-latest GitHub Actions runner image (AppArmor dropped post-2024) and RHEL/SELinux production hosts. install-bench regressed; PRs #491, #484, #479, #431 are blocked behind this. This PR implements option (a) from #492: flip the default to opt-in. `values-production.yaml` keeps `enabled: true` since AppArmor-equipped Linux clusters (the production target) ship `RuntimeDefault` via containerd / CRI-O. ## Root cause Default-on AppArmor in `values.yaml` violated the chart contract that the default render installs on a vanilla cluster. The defense-in-depth posture is correct for production-preset users; it was wrong as the unconditional default. PR #481 didn't add a CI gate to assert "default render installs on a host without AppArmor", so the regression escaped review. ## Changes - `install/kubernetes/tracecore/values.yaml`: `securityHardening.appArmorProfile.enabled: true` -> `false`; in-line guidance reflects opt-in posture and names the failing-host classes (CI runners, RHEL/SELinux). - `install/kubernetes/tracecore/values-production.yaml`: unchanged — production preset still hardens with `enabled: true`. - `install/kubernetes/tracecore/README.md`: defaults table + Defense-in-depth section explain the opt-in posture, point operators at `values-production.yaml` for the prior behavior, and link #492. - `.github/workflows/chart.yml`: AppArmor mutation tests reshuffled from 6 to 8 cases. T1/T2 now assert default render emits **no** AppArmor field or annotation on K8s 1.30 + 1.28 (regression-prevent for #492). T3/T4 cover the opt-in path (`--set enabled=true`) and pin pre-#492 production-preset behavior. T7/T8 explicitly pass `--set enabled=true` so the Localhost-profile contract still fires under the new default. Production-preset assertion (`appArmorProfile.type=RuntimeDefault` from `values-production.yaml`) is untouched. ## Backward compatibility **Behavior change for default-values users.** Operators who installed via `helm install ... install/kubernetes/tracecore` (no production preset) and depended on the AppArmor hardening that #481 added will see it disappear on next upgrade. Two ways to keep the prior behavior: ```bash # Option 1 — adopt the production preset (recommended). helm upgrade demo install/kubernetes/tracecore \ --values install/kubernetes/tracecore/values-production.yaml # Option 2 — keep your current values, just flip the flag. helm upgrade demo install/kubernetes/tracecore \ --set securityHardening.appArmorProfile.enabled=true ``` Operators who relied on the chart's documented default (#481 was three days old; opt-in is the chart-hygiene norm for defense-in-depth knobs) get a quieter install on AppArmor-less hosts. ## Test plan - [x] `helm lint install/kubernetes/tracecore` — 0 warnings. - [x] `helm template ... --kube-version 1.30.0 --show-only templates/daemonset.yaml | grep -i apparmor` — empty (default render has no AppArmor). - [x] Same with `--kube-version 1.28.0` — empty. - [x] `helm template ... --values values-production.yaml --kube-version 1.30.0` — renders `appArmorProfile.type: RuntimeDefault` (production preset unchanged). - [x] `helm template ... --set securityHardening.appArmorProfile.enabled=true --kube-version 1.30.0` — renders structured field (opt-in works). - [x] All 8 mutation tests in `.github/workflows/chart.yml` AppArmor step run locally and pass. - [x] conftest: 52/52 default render, 91/91 production render. - [x] actionlint: 0 issues on `chart.yml`. - [x] Pre-commit (golangci-lint, vet, attribute-namespace-check, test-flake-audit) — all green. - [ ] CI: chart workflow turns green on this PR. - [ ] CI: install-bench turns green on this PR (and unblocks #491 / #484 / #479 / #431 once merged). ## Refs Closes #492 (refs #481). ```release-notes **Breaking (default-values users only).** `securityHardening.appArmorProfile.enabled` now defaults to `false` in `values.yaml` so the chart installs on AppArmor-less nodes (CI runners, RHEL/SELinux). The `values-production.yaml` preset still ships `enabled: true` — production Linux clusters that package the `RuntimeDefault` profile (every distro with containerd / CRI-O) keep the hardening when they layer that preset. Operators upgrading default-values installs who want the prior behavior can either adopt `values-production.yaml` or set `--set securityHardening.appArmorProfile.enabled=true`. Fixes the install-bench regression introduced in #481. ``` Signed-off-by: Tri Lam <tree@lumalabs.ai>

…) (#496) ## Summary `policy-matrix.yml` workflow has been failing on every chart-touching PR (blocked #476, #481, #493) since #475 landed. The chart's production preset (`values-production.yaml`) flips `serviceMonitor.enabled=true`, which renders a `monitoring.coreos.com/v1 ServiceMonitor` resource. Kind clusters don't ship the prometheus-operator CRDs, so `helm install --dry-run=server` exits 1 with: ``` no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1" ensure CRDs are installed first ``` ## Root cause Kind ships only the core Kubernetes API set. `monitoring.coreos.com/v1` is supplied by prometheus-operator, which the policy-matrix kind cluster never installs. The chart's `templates/servicemonitor.yaml` is gated by `.Values.serviceMonitor.enabled` — default `false` (chart stays first-install-compatible on bare clusters), but the production preset enables it (kube-prometheus-stack convention). The policy-matrix gate exercises both default and production presets across PSA / Kyverno / Gatekeeper, so the production rows hit the missing CRD on every run. ## Fix Issue #494 recommended option (a) — install the missing CRD prereq. This PR adds a single workflow step after kind cluster spin-up but before the smoke script: ```yaml - name: Install prometheus-operator ServiceMonitor CRD (issue #494) run: | kubectl apply -f \ "https://github.com/prometheus-operator/prometheus-operator/v0.91.0/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml" kubectl wait --for=condition=established crd/servicemonitors.monitoring.coreos.com --timeout=60s ``` Design choices: - **Slim CRD (ServiceMonitor only)** vs full prometheus-operator bundle. The chart's production preset references no other `monitoring.coreos.com` kinds. Slim install (~700 lines of YAML) avoids pulling Prometheus, Alertmanager, ThanosRuler, PodMonitor, Probe, PrometheusRule we don't exercise. - **Applied to every matrix row** (not just production). A future flip of the default `serviceMonitor.enabled` toggle cannot silently re-break this gate. - **Pinned to `v0.91.0`** (latest stable, published 2026-05-05). Matches the existing `KYVERNO_POLICIES_REF` / `GATEKEEPER_VERSION` pin convention in `scripts/policy-matrix-smoke.sh`. Bumping is a reviewed code change — never tracks `main`. - **`kubectl wait --for=condition=established`** before the helm dry-run so the apiserver has registered the CRD when the chart template reaches the admission chain (avoids a race where the dry-run hits before discovery refreshes). ## Gatekeeper CRD timing Re-audited — `install_gatekeeper()` in `scripts/policy-matrix-smoke.sh` already polls `kubectl get crd ...constraints.gatekeeper.sh` (line 143-149) and the constraint `byPod[*].enforced` field (line 270-276) before the smoke step exits. The `kubectl get constraints -A || true` in the failure-collection step is diagnostic only and already tolerates absent CRDs. No timing fix needed there. ## Why not install-bench / chart.yml - `install-bench.yml` uses `bench/install/tracecore-values.yaml` which doesn't enable serviceMonitor — same failure shape doesn't apply. - `chart.yml`'s `install` and `upgrade` jobs install with default values (`serviceMonitor.enabled=false`); the `render` job's production-preset check is `helm template` only (no cluster), so no API discovery runs. ## Test plan - [x] `actionlint .github/workflows/policy-matrix.yml` — exit 0 - [x] `actionlint` across all `.github/workflows/` — exit 0 - [x] Pre-push hook suite passed locally (golangci-lint, vet, mod verify, attribute-namespace-check, zizmor, doc-check, alert-check, chart-appversion-check, rfc-status-check, slo-rules-check, deprecation-check, no-autoupdate-check, test-flake-audit) - [ ] policy-matrix workflow runs green on this PR — all 6 matrix rows (psa × default, psa × production, kyverno × default, kyverno × production, gatekeeper × default, gatekeeper × production) plus all 3 mutation rows. Closes #494. ```release-notes ci: install prometheus-operator ServiceMonitor CRD in policy-matrix kind cluster so chart-touching PRs no longer fail on the production preset's ServiceMonitor render ``` Signed-off-by: Tri Lam <tree@lumalabs.ai>

## Summary Removes `.github/workflows/policy-matrix.yml`. Engine-specific admission validation (PSA-restricted × Kyverno × Gatekeeper × default+production) delivered negative ROI at rc1. ## Root cause 4 PRs blocked or chasing this workflow's flakes (#475 introduction, #481, #498, #501). Caught zero real regressions; only its own infra bugs: - ServiceMonitor CRD bootstrap race (#494) - AppArmor host-capability mismatch (#481 → #493) - kubectl wait .status.conditions nil race (#500 → #501) ## Coverage retained (without policy-matrix) - `conftest` — offline PSS-baseline + restricted validation. - `helm lint` — chart structural validation. - `kubeconform` — K8s API conformance. - `kubectl apply --dry-run=server` (chart.yml install/upgrade jobs) — API-level breakage on generic kind cluster. ## What stays in tree - `scripts/policy-matrix-smoke.sh` + Gatekeeper/Kyverno bundle refs — cheap reactivation when GA triggers fire. - `install/kubernetes/tracecore/policies/conftest/**` — offline policy bundle (still active). ## Re-enable triggers (tracked in #502) - GA criterion #1 (third-party audit) requests engine-specific compat validation. - First operator running under Kyverno/Gatekeeper reports admission rot. - CRD-bootstrap pattern stabilises across other workflows. ## Test plan - [x] `make doc-check` exit 0 (post comment-edit in kind-cluster-setup action.yml). - [x] No remaining policy-matrix.yml references in repo (verified by grep). - [x] Pre-commit hooks green (lint/vet/mod-verify/attribute-namespace). - [x] README + install-bench stale refs scrubbed (follow-up commit). ```release-notes ci: defer engine-specific policy-matrix workflow (PSA × Kyverno × Gatekeeper admission validation) to GA. Coverage retained via conftest + helm lint + kubeconform + kubectl apply --dry-run=server. Re-enable tracked in #502. ``` Refs #502 #475 #494 #500. --------- Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr added 2 commits June 1, 2026 21:23

docs(chart): correct mutation-check count 5→6

3b421d5

Signed-off-by: Tri Lam <tree@lumalabs.ai>

trilamsr enabled auto-merge (squash) June 2, 2026 04:49

trilamsr merged commit 7c23bcd into main Jun 2, 2026
17 of 25 checks passed

trilamsr deleted the feat/m5b-chart-apparmor-profile branch June 2, 2026 04:57

This was referenced Jun 2, 2026

audit(wave-2026-06-02): autonomous-wave cross-cut review #488

Closed

regression(chart): #481 AppArmor default-on breaks install-bench on AppArmor-less hosts #492

Closed

fix(chart): flip AppArmor default to opt-in (#492) (refs #481) #493

Merged

This was referenced Jun 2, 2026

ci(policy-matrix): install ServiceMonitor + Gatekeeper CRDs in kind setup (regression since #475) #494

Closed

ci(policy-matrix): install ServiceMonitor CRD before helm dry-run (#494) #496

Merged

This was referenced Jun 2, 2026

ci(policy-matrix): re-enable when GA gates request engine-specific validation #502

Closed

chore: defer engine-specific policy-matrix workflow to GA #503

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(chart): version-gated AppArmor RuntimeDefault on DaemonSet (M5b)#481

feat(chart): version-gated AppArmor RuntimeDefault on DaemonSet (M5b)#481
trilamsr merged 3 commits into
mainfrom
feat/m5b-chart-apparmor-profile

trilamsr commented Jun 2, 2026

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

trilamsr commented Jun 2, 2026

Summary

Root cause

Local verification

Test plan

Cross-links

Grade

Uh oh!

trilamsr commented Jun 2, 2026

Independent Adversarial Review: PR #481

Grade: A (→ A+ after one doc fix)

Findings

Cross-links

Logic

Uh oh!

trilamsr commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant