Merged
19 changes: 12 additions & 7 deletions docs/followups/M5b.md
@@ -24,13 +24,18 @@ budget, etc.).
Items surfaced during PR #29 review that were explicitly held out of
M5b scope. Order is roughly highest-leverage first.

- [ ] **NetworkPolicy template.** Ship a `templates/networkpolicy.yaml`
gated by `networkPolicy.enabled` (default `false`) that allows
egress to operator-configured exporter endpoints + kubelet probe
traffic. Zero-trust adopters add their own today; the chart
should ship a baseline they can opt into. *Trigger:* the first
adopter request, or when an OTLP exporter receiver lands (M10+)
and the egress shape stops being guess-work.
- [x] **NetworkPolicy template.** Shipped as
`install/kubernetes/tracecore/templates/networkpolicy.yaml`,
gated by `networkPolicy.enabled` (default `false`). The
baseline includes scrape-in (telemetry + health ports),
operator-configured OTLP egress, DNS egress, AND kubelet probe
ingress (`networkPolicy.kubeletProbes.*`; port-scoped
`ipBlock` rule so liveness/readiness probes from the node IP
survive the default-deny posture). Production preset
(`values-production.yaml`) flips `enabled: true`. Cross-linked
from `docs/threat-model.md` §6.G + chart README §security.
Initial scrape-in + OTLP-out scope shipped under #301; the
kubelet-probe ingress rule is M5b chart opportunistic #1.
- [ ] **Image scanning + SBOM gate on the chart container image.** The
`install/kubernetes/tracecore/Dockerfile` is reference-only for
the kind-install CI workflow; M3 owns the canonical release
22 changes: 20 additions & 2 deletions install/kubernetes/tracecore/README.md
@@ -195,6 +195,9 @@ automatically; PersistentVolumeClaims (if any are added via the
| `networkPolicy.allowedEgressEndpoints` | list | `[]` | `{cidr, port, protocol, except?}` entries for OTLP-out. Operator declares so the policy is auditable. |
| `networkPolicy.dnsNamespaceSelector` | map | `{kubernetes.io/metadata.name: kube-system}` | DNS resolver namespace label. Override if your DNS lives elsewhere. |
| `networkPolicy.dnsPodSelector` | map | `{k8s-app: kube-dns}` | DNS resolver pod label. Override for non-coredns/kube-dns setups. |
| `networkPolicy.kubeletProbes.enabled` | bool | `true` | Carve an `ipBlock` ingress rule on the `health` port so kubelet liveness/readiness probes survive the default-deny baseline. Probes originate from the node IP (host network), which is NOT selectable via namespaceSelector / podSelector; without this rule, every pod flips NotReady within one `failureThreshold` window (M5b chart opportunistic #1). Disable only when a CNI-specific rule already covers host-network probe traffic (Cilium `fromEntities: [host, remote-node]`, Calico host-endpoint selector). |
| `networkPolicy.kubeletProbes.cidr` | string | `0.0.0.0/0` | Source CIDR for the probe rule. Default permissive because kube-apiserver does not expose a cluster-wide node-CIDR primitive a chart can template against; the rule is L4-scoped to the health port so the surface stays narrow. Tighten to the cluster node CIDR if it is fixed and known. |
| `networkPolicy.kubeletProbes.except` | list | `[]` | CIDRs to exclude from `kubeletProbes.cidr` (NetworkPolicy `ipBlock.except` semantics). |
| `tls.enabled` | bool | `false` | Mount a `kubernetes.io/tls` Secret (typically [cert-manager](../../../docs/integrations/cert-manager-mtls.md)-issued) into the DaemonSet at `tls.mountPath`. Operators wire `tls.cert_file` / `tls.key_file` / `tls.ca_file` (or `client_ca_file`) into the free-form `config:` block referencing the projected file literals; the chart does NOT inject `tls:` clauses (#301). |
| `tls.certificateRef` | string | `""` | Name of the `kubernetes.io/tls` Secret in `.Values.namespace`. Required when `tls.enabled` is true; the helm-template render fails closed with a clear error if empty. |
| `tls.mountPath` | string | `/etc/tracecore/tls` | Absolute directory the Secret projects into. Schema-validated `^/`. Path literals across `docs/integrations/` assume the default. |
@@ -394,12 +397,27 @@ scrape_configs:
Verify with `promtool check rules install/kubernetes/tracecore/dashboards/slo-rules.yaml`
and `kubectl -n tracecore-system port-forward svc/tracecore 8888:8888 && curl localhost:8888/metrics`.

**Default-deny NetworkPolicy with allow-list for scrape + OTLP-out (issue #301):**
**Default-deny NetworkPolicy with allow-list for scrape + OTLP-out + kubelet probes (issues #301, M5b chart opportunistic #1):**

The chart ships an opt-in `NetworkPolicy` template that isolates the
collector pods at L3/L4. Off by default for CNI compatibility (Flannel
without canal ignores NetworkPolicy and rendering one would mislead).
Enable on Calico / Cilium / kube-router clusters.
Enable on Calico / Cilium / kube-router clusters. See
[`docs/threat-model.md`](../../../docs/threat-model.md) §6.G for the
audit-RFP scope this template satisfies (network-surface inventory +
default-deny verification).

The policy carves three rule families back open against the
`policyTypes: [Ingress, Egress]` baseline:
- **Scrape-in** — Prometheus / ServiceMonitor traffic to the
`telemetry` + `health` ports, restricted by
`networkPolicy.allowedScrapers` (default: same-namespace).
- **Kubelet probes** — liveness + readiness probes from the node IP
to the `health` port. Probes originate from host-network, NOT from
a selectable namespace/pod, so the rule uses `ipBlock`
(default `0.0.0.0/0`, L4-scoped to the health port).
- **Egress** — DNS to the cluster resolver + OTLP-out to
`networkPolicy.allowedEgressEndpoints`.
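
As a minimal sketch of tightening the kubelet-probe rule described above, an operator overlay might look like the following. The value keys are the ones documented in the chart's values table; the CIDRs are illustrative placeholders (assumptions), not recommended values.

```yaml
# Hypothetical overlay; the CIDR values below are placeholders.
networkPolicy:
  enabled: true
  kubeletProbes:
    enabled: true
    # Assumed node CIDR; replace with the range your nodes actually use.
    cidr: 10.0.0.0/16
    # Optional carve-outs, following NetworkPolicy `ipBlock.except` semantics.
    except:
      - 10.0.5.0/24
```

Every entry in `except` has to fall inside `cidr`, so the placeholder `/24` here sits within the placeholder `/16`.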

```yaml
# networkpolicy-overlay.yaml
64 changes: 51 additions & 13 deletions install/kubernetes/tracecore/templates/networkpolicy.yaml
@@ -1,22 +1,37 @@
{{/*
Default-deny NetworkPolicy for the tracecore DaemonSet (issue #301).
Default-deny NetworkPolicy for the tracecore DaemonSet (issues #301,
M5b chart opportunistic #1).

The policy isolates the collector pods at L3/L4 so a compromised
receiver / dependency cannot exfiltrate to arbitrary network
destinations and so a hostile in-cluster workload cannot speak OTLP
at the collector's listener. Two allow-list rules carve the minimum
surface back open:
at the collector's listener. The deny-all baseline comes from
`policyTypes: [Ingress, Egress]` with no matching rules; the three
rule families below carve the minimum surface back open:

Egress — OTLP-out to the operator-declared exporter endpoint
(host + port) and DNS to the cluster DNS resolver.
Without these the exporter cannot resolve / connect, and
the DaemonSet renders inert.
Ingress — scrape-in to the telemetry port. Source is selectable;
the default (`{}`) means any pod in the same namespace
can scrape, which matches a Prometheus running alongside
tracecore. Tighten via `networkPolicy.allowedScrapers` to
a namespaceSelector / podSelector for kube-prometheus
installs that pin the scraper namespace.
Egress — OTLP-out to the operator-declared exporter endpoint
(CIDR + port) and DNS to the cluster DNS resolver.
Without these the exporter cannot resolve / connect, and
the DaemonSet renders inert.
Ingress — scrape-in to the telemetry port. Source is selectable;
the default (`{}`) means any pod in the same namespace
can scrape, which matches a Prometheus running alongside
tracecore. Tighten via `networkPolicy.allowedScrapers` to
a namespaceSelector / podSelector for kube-prometheus
installs that pin the scraper namespace.
Probes — kubelet liveness/readiness probes originate from the
node IP (host network), NOT from any pod selectable by
namespaceSelector / podSelector. NetworkPolicy v1 matches
kubelet probes only via `ipBlock` peers. The chart
carves a port-scoped `ipBlock` rule for the `health`
port (chart default :13133) so probes survive an
otherwise-default-deny posture. Default
`networkPolicy.kubeletProbes.cidr` is `0.0.0.0/0` —
permissive on source IP, but rule-scoped to the
healthcheckextension port so the surface stays narrow.
Operators that know their node CIDR (e.g. a fixed
control-plane + worker pool) tighten it to the cluster
node CIDR.

RFC posture: NetworkPolicy is a Kubernetes-native primitive; the
chart's role is to render a known-correct policy aligned with the
@@ -26,6 +41,14 @@
set `networkPolicy.enabled: false` — rendering a policy on a
CNI that ignores it is misleading.

Cross-references:
- docs/threat-model.md §6.G — network-surface audit scope
(listener inventory + default-deny verification).
- docs/followups/M5b.md "NetworkPolicy template" — opportunistic
deferral that introduced the kubelet-probe ingress rule.
- install/kubernetes/tracecore/README.md §"NetworkPolicy" —
operator-facing values walkthrough.

Default is OFF (`networkPolicy.enabled: false`) so the chart's
first-install path stays compatible with bare-CNI clusters. Operators
on Calico / Cilium / kube-router / canal-flannel enable explicitly.
@@ -63,6 +86,21 @@ spec:
          protocol: TCP
        - port: health
          protocol: TCP
    {{- if .Values.networkPolicy.kubeletProbes.enabled }}
    # kubelet probes: node-IP source → `ipBlock` is the only
    # NetworkPolicy v1 peer that matches (see header). Port-scoped
    # to `health` (:13133) so the default `0.0.0.0/0` CIDR stays
    # narrow at L4; telemetry + OTLP listeners remain locked.
    - from:
        - ipBlock:
            cidr: {{ .Values.networkPolicy.kubeletProbes.cidr | default "0.0.0.0/0" }}
            {{- with .Values.networkPolicy.kubeletProbes.except }}
            except: {{- toYaml . | nindent 14 }}
            {{- end }}
      ports:
        - port: health
          protocol: TCP
    {{- end }}
    {{- with .Values.networkPolicy.extraIngress }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
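
With the chart defaults (`kubeletProbes.enabled: true`, `cidr: 0.0.0.0/0`, `except: []`), the probe block in this hunk should render roughly as shown below. This is a hand-rendered sketch, not captured `helm template` output. An empty `except` list is falsy for `with`, so no `except` key is emitted.

```yaml
# Hand-rendered sketch of the kubelet-probe ingress rule under chart defaults.
- from:
    - ipBlock:
        cidr: 0.0.0.0/0
  ports:
    - port: health   # named container port (chart default :13133)
      protocol: TCP
```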
15 changes: 15 additions & 0 deletions install/kubernetes/tracecore/values-production.yaml
@@ -130,6 +130,21 @@ networkPolicy:
    kubernetes.io/metadata.name: kube-system
  dnsPodSelector:
    k8s-app: kube-dns
  # kubeletProbes (M5b chart opportunistic #1): kubelet liveness +
  # readiness probes originate from the node IP, which is NOT
  # selectable via namespaceSelector. The chart carves an `ipBlock`
  # rule on the `health` port so probes survive the default-deny
  # baseline. Production posture leaves the source CIDR permissive
  # (`0.0.0.0/0`) because most clusters do not expose a fixed
  # node-CIDR allowlist primitive; the rule is L4-scoped to the
  # healthcheckextension port (chart default :13133) so the
  # surface stays narrow. Tighten in the operator's overlay if
  # the node CIDR is fixed and known (e.g. on-prem fleets with a
  # single /24 control-plane subnet).
  kubeletProbes:
    enabled: true
    cidr: 0.0.0.0/0
    except: []

# --- prometheus wiring: ServiceMonitor on, annotation-scrape off ----------
#
12 changes: 11 additions & 1 deletion install/kubernetes/tracecore/values.schema.json
@@ -242,7 +242,17 @@
"extraIngress": { "type": "array", "items": { "type": "object" } },
"extraEgress": { "type": "array", "items": { "type": "object" } },
"dnsNamespaceSelector": { "type": "object" },
"dnsPodSelector": { "type": "object" }
"dnsPodSelector": { "type": "object" },
"kubeletProbes": {
"type": "object",
"additionalProperties": false,
"required": ["enabled"],
"properties": {
"enabled": { "type": "boolean" },
"cidr": { "type": "string", "minLength": 1 },
"except": { "type": "array", "items": { "type": "string" } }
}
}
}
},
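
To illustrate what the `kubeletProbes` schema in this hunk accepts, here is a sketch of values fragments checked against the three constraints visible in the diff (`required`, `minLength`, `additionalProperties`). The CIDR is a placeholder, and the rejected cases are shown as comments so the fragment stays valid YAML.

```yaml
# Accepted: `enabled` is present and every key is declared in the schema.
networkPolicy:
  kubeletProbes:
    enabled: true
    cidr: 10.0.0.0/16   # placeholder CIDR (assumption)
    except: []

# Rejected by the schema (kept commented out):
#   kubeletProbes: { cidr: 10.0.0.0/16 }         -> `enabled` is required
#   kubeletProbes: { enabled: true, cidr: "" }   -> `cidr` has `minLength: 1`
#   kubeletProbes: { enabled: true, cidrs: [] }  -> `additionalProperties: false`
```

The schema as shown only requires `cidr` to be a non-empty string. It does not check CIDR syntax, so a malformed value would pass schema validation and surface later, when the rendered policy is applied.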

33 changes: 31 additions & 2 deletions install/kubernetes/tracecore/values.yaml
@@ -301,10 +301,14 @@ probes:
periodSeconds: 10
failureThreshold: 4

# NetworkPolicy (issue #301). Default-deny ingress/egress at L3/L4
# with two narrow allow-list rules carved back open:
# NetworkPolicy (issue #301, M5b chart opportunistic #1). Default-deny
# ingress/egress at L3/L4 with three narrow allow-list rules carved
# back open:
# - Ingress on the telemetry + health ports, restricted to the
# selected scrape sources (`allowedScrapers`).
# - Ingress on the health port from the node IP (kubelet probes;
# `kubeletProbes` — port-scoped ipBlock since probes come from
# host-network, not a selectable namespace/pod).
# - Egress to cluster DNS + the operator-declared OTLP exporter
# endpoints (`allowedEgressEndpoints`).
#
@@ -330,6 +334,27 @@ probes:
# dnsNamespaceSelector / dnsPodSelector: how the chart finds the
# cluster DNS resolver for egress. Defaults match coredns/kube-dns in
# kube-system; override on clusters where DNS lives elsewhere.
#
# kubeletProbes (M5b chart opportunistic #1): kubelet liveness +
# readiness probes originate from the node IP, which is NOT selectable
# via namespaceSelector / podSelector. The chart carves a port-scoped
# `ipBlock` rule for the `health` port so probes survive an
# otherwise-default-deny posture. Default ON because a
# `networkPolicy.enabled: true` install without this rule flips every
# pod NotReady within one failureThreshold window — the chart would
# render its own DaemonSet inoperable.
#
# enabled — toggle the kubelet-probe ingress rule. Set false only
# if a CNI-specific rule already covers host-network
# probe traffic (Cilium `fromEntities: [host, remote-node]`,
# Calico `Selector: 'has(projectcalico.org/orchestrator)'`).
# cidr — source CIDR for the probe rule. Default `0.0.0.0/0`
# because kube-apiserver does not expose a cluster-wide
# node-CIDR primitive a chart can template against, and
# the rule is L4-scoped to the health port so the surface
# stays narrow. Operators with a fixed node CIDR
# (control-plane + worker pool) tighten this to that range.
# except — list of CIDRs to exclude from `cidr`. Empty by default.
networkPolicy:
  enabled: false
  allowedScrapers: []
@@ -340,6 +365,10 @@ networkPolicy:
    kubernetes.io/metadata.name: kube-system
  dnsPodSelector:
    k8s-app: kube-dns
  kubeletProbes:
    enabled: true
    cidr: 0.0.0.0/0
    except: []
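
For the case the comments describe, where `kubeletProbes.enabled` is set to `false` because a CNI-specific rule already covers host-network probe traffic, a Cilium rule could be sketched as below. This is an illustration under stated assumptions, not something the chart ships: the policy name and the pod label in `endpointSelector` are hypothetical, the namespace is borrowed from the README's `kubectl` example, and the port is the chart default health port mentioned in this diff.

```yaml
# Hypothetical CiliumNetworkPolicy; name and labels are assumptions.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: tracecore-kubelet-probes        # hypothetical name
  namespace: tracecore-system
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: tracecore # assumed pod label; check the DaemonSet
  ingress:
    # Allow probe traffic from the local node and from other cluster nodes.
    - fromEntities:
        - host
        - remote-node
      toPorts:
        - ports:
            - port: "13133"             # chart default health port
              protocol: TCP
```

Compared with the default `ipBlock` rule, this scopes the source to Cilium's node identities instead of a CIDR, which is the trade-off the `enabled` comment points at.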

# mTLS material wiring (issue #301). When enabled, the chart mounts a
# Kubernetes Secret (typically cert-manager-issued via a `Certificate`