Summary
Commission tracecore's first external security audit before cutting
v1.0 GA. Threat model landed in docs/threat-model.md (commit
cb5b430). This issue tracks (a) RFP-ready scope to send to a paid
auditor (Trail of Bits, NCC Group, Cure53, Doyensec, similar), and
(b) the in-repo prep work that gates handing the engagement off.
Driver: project_v1_rc1_target Tier 2 prereq + NORTHSTARS.md §O3
"Trust & supply chain". The first audit is the discipline that turns
the SLSA / cosign / SBOM / SECURITY.md stack from claims into
evidence.
Audit RFP scope (~17 person-days, 4-week engagement)
Verbatim from docs/threat-model.md §6:
- A. Moat-module input parsing (~5 person-days): module/pkg/nccl/fr_parser/ (pickle parser, ~1,800 LOC; opcode-allowlist bypass, declared-length games, depth-limit games, memoization-table abuse), module/receiver/ncclfrreceiver/ (path traversal, rotation race, file-size enforcement), OTTL Xid-regex set in docs/integrations/journald-kernel.md.
- B. Credential handling (~3 person-days): exporter auth-header leak in slog error paths (otlphttpexporter, datadogexporter, clickhouseexporter); recipe-level tls.insecure: true audit; envFrom Secret pattern verification.
- C. Kubernetes RBAC (~2 person-days): rendered ClusterRole verb-map per recipe under docs/integrations/; NetworkPolicy default-deny; automountServiceAccountToken: false baseline + opt-in posture.
- D. Vendor-SDK isolation boundary (~1 person-day): confirm no cgo to libdcgm/libcuda/libnvml/libnccl post RFC-0013 PR-F.1; recipe-side Prometheus boundary verification.
- E. Build + supply chain (~3 person-days): SLSA L3 attestation chain end-to-end; reproducible-build verification (diffoscope on two independent builds from the same commit); cosign signature verification on every published artifact; SBOM accuracy diff vs go.sum + base-image package list.
- F. Dependency / CVE process (~1 person-day): disclosure-SLA review against SECURITY.md; govulncheck CI gate verification; Dependabot config review.
- G. Network surface (~1 person-day): listener inventory (:8888 /metrics, :13133 /health); bind-address audit; egress audit.
- H. Threat-model accuracy review (~1 person-day): read docs/threat-model.md; flag missed scenarios and actor combinations.
Plus ~2-3 person-weeks of writeup on top.
In-repo prep checklist (block audit hand-off until ☑)
Verbatim from docs/threat-model.md §7, only the audit-blocking rows:
- docs/threat-model.md (commit cb5b430)
- SECURITY.md disclosure inbox verified live (private GHSA flow active)
- go mod tidy clean, no latest tags in builder-config.yaml, every gomod: is fixed semver
- module/pkg/nccl/fr_parser/ FuzzParseFRPickle exists (-fuzztime=30s in make ci + 10-min nightly); need OSS-Fuzz integration + OTTL Xid-regex fuzz target
- module/pkg/replay/testdata/
Non-blocking (ship before GA but not before audit):
- confighttp PR (audit §B residual mitigation)
Vendor shortlist
Trail of Bits, NCC Group, Cure53, Doyensec, X41, Atredis. Quote the
scope letter as §6.A-H verbatim from docs/threat-model.md.
Done
This issue closes when the audit report lands committed to
docs/audits/YYYY-MM-tracecore-v1.0.md, the top-10 risks in
docs/threat-model.md §5 have their "Residual" columns updated with
audit findings, and any open follow-ups are filed as separate issues
linked back here.