[rc1-prep] commission v1.0 GA security audit (RFP-ready) #336

Description

@trilamsr

Summary

Commission tracecore's first external security audit before cutting
v1.0 GA. Threat model landed in docs/threat-model.md (commit
cb5b430). This issue tracks (a) RFP-ready scope to send to a paid
auditor (Trail of Bits, NCC Group, Cure53, Doyensec, similar), and
(b) the in-repo prep work that gates handing the engagement off.

Driver: project_v1_rc1_target Tier 2 prereq + NORTHSTARS.md §O3
"Trust & supply chain". The first audit is the discipline that turns
the SLSA / cosign / SBOM / SECURITY.md stack from claims into
evidence.

Audit RFP scope (~17 person-days, 4-week engagement)

Verbatim from docs/threat-model.md §6:

  • A. Moat-module input parsing (~5 person-days) — module/pkg/nccl/fr_parser/ (pickle parser, ~1,800 LOC; opcode-allowlist bypass, declared-length games, depth-limit games, memoization-table abuse), module/receiver/ncclfrreceiver/ (path traversal, rotation race, file-size enforcement), OTTL Xid-regex set in docs/integrations/journald-kernel.md.
  • B. Credential handling (~3 person-days) — exporter auth-header leak in slog error paths (otlphttpexporter, datadogexporter, clickhouseexporter); recipe-level tls.insecure: true audit; envFrom Secret pattern verification.
  • C. Kubernetes RBAC (~2 person-days) — rendered ClusterRole verb-map per recipe under docs/integrations/; NetworkPolicy default-deny; automountServiceAccountToken: false baseline + opt-in posture.
  • D. Vendor-SDK isolation boundary (~1 person-day) — confirm no cgo to libdcgm/libcuda/libnvml/libnccl post RFC-0013 PR-F.1; recipe-side Prometheus boundary verification.
  • E. Build + supply chain (~3 person-days) — SLSA L3 attestation chain end-to-end; reproducible-build verification (diffoscope on two independent builds from the same commit); cosign signature verification on every published artifact; SBOM accuracy diff vs go.sum + base-image package list.
  • F. Dependency / CVE process (~1 person-day) — disclosure-SLA review against SECURITY.md; govulncheck CI gate verification; Dependabot config review.
  • G. Network surface (~1 person-day) — listener inventory (:8888 /metrics, :13133 / health); bind-address audit; egress audit.
  • H. Threat-model accuracy review (~1 person-day) — read docs/threat-model.md; flag missed scenarios and actor combinations.

Plus ~2-3 person-weeks of writeup on top.

In-repo prep checklist (block audit hand-off until ☑)

Verbatim from docs/threat-model.md §7 — only the audit-blocking rows:

  • Documented threat model — docs/threat-model.md (commit cb5b430)
  • SECURITY.md disclosure inbox verified live (private GHSA flow active)
  • Dependency-pinning audit — go mod tidy clean, no latest tags in builder-config.yaml, every gomod: entry is a fixed semver
  • Fuzz harness gap analysis — module/pkg/nccl/fr_parser/FuzzParseFRPickle exists (-fuzztime=30s in make ci + 10-min nightly); need OSS-Fuzz integration + OTTL Xid-regex fuzz target
  • SLSA L3 attestation chain manual reproduction drill — maintainer team reproduces end-to-end before auditor does
  • NetworkPolicy template rendered by the chart — sibling work in flight
  • Replay-corpus PII / operator-identifying-data review — diff every fixture under module/pkg/replay/testdata/

Non-blocking (ship before GA but not before audit):

  • Cosign verification documented for operators (admission-controller policy snippet in chart README §security)
  • Conftest rule covering rendered ClusterRoles (audit §C residual mitigation)
  • Exporter-credential redaction wrapper — upstream confighttp PR (audit §B residual mitigation)

Vendor shortlist

Trail of Bits, NCC Group, Cure53, Doyensec, X41, Atredis. Quote the
scope letter as §6.A-H verbatim from docs/threat-model.md.

Done

This issue closes when the audit report is committed to
docs/audits/YYYY-MM-tracecore-v1.0.md, the top-10 risks in
docs/threat-model.md §5 have their "Residual" columns updated with
audit findings, and any open follow-ups are filed as separate issues
linked back here.
