diff --git a/docs/NORTHSTARS.md b/docs/NORTHSTARS.md index aac179dd..b2011ccc 100644 --- a/docs/NORTHSTARS.md +++ b/docs/NORTHSTARS.md @@ -277,6 +277,8 @@ Seven lines of work. Each has one accountable owner role, one hero KPI, supporti - "Unique production organizations" is unmeasurable in full. The named methodology is partial-coverage. State this in any public adoption claim. - Conference acceptance is partially out of our control; we measure *submissions accepted*, not talks given. +**Partner outreach plan:** the v1.0 rc1 → GA outreach package lives at [`docs/partner-outreach.md`](partner-outreach.md) — target-persona list (P1-P12 archetypes), outreach message template, live-conversation talking points, tracker shape, and numeric success criteria (≥5 contacted, ≥2 feedback-received, ≥1 pilot-deploy-attempted before GA tag). Tracks against the [v1.0-rc1 Tier-2 B](v1-rc1-cut-criteria.md#b-partner-outreach-checkpoint) cut criterion; the SC-3 pilot-deploy gate is a GA blocker. + --- ### O6. Velocity diff --git a/docs/RELEASE-CHECKLIST.md b/docs/RELEASE-CHECKLIST.md index ae80894a..e8d50f96 100644 --- a/docs/RELEASE-CHECKLIST.md +++ b/docs/RELEASE-CHECKLIST.md @@ -89,10 +89,17 @@ All RC gates, plus: - [ ] **Third-party security audit report published** — report at `docs/audits/-tracecore-v1.0.md`; all P0/P1 findings closed +- [ ] **Partner outreach checkpoint cleared** — outreach package at + [`docs/partner-outreach.md`](partner-outreach.md) has accumulated + ≥5 contacted operators (SC-1), ≥2 feedback responses (SC-2), and + ≥1 pilot-deploy attempt (SC-3) in its §Tracker. Status flips on the + next `make cut-criteria-render` cycle once the tracker rows land. - [ ] **Real-operator integration verified** — at least one production GPU cluster operator has installed the production-preset Helm chart end-to-end and confirmed ≥1 pattern verdict fires against real - signal (write-up at `docs/case-studies/.md`) + signal (write-up at `docs/case-studies/.md`); this is the + same operator as the SC-3 pilot-deploy row in + [`docs/partner-outreach.md`](partner-outreach.md). - [ ] **Reproducibility chain externally verified** — independent third party runs `cosign verify-blob` + `slsa-verifier verify-artifact` against a published artifact; result diff --git a/docs/cut-criteria.yaml b/docs/cut-criteria.yaml index cc2d2ded..6babc6c8 100644 --- a/docs/cut-criteria.yaml +++ b/docs/cut-criteria.yaml @@ -449,26 +449,72 @@ - id: B title: "Partner outreach checkpoint" tier: 2 - owner: "" + owner: "O5 (Distribution & Community) — hero KPI is unique production organizations; outreach is the on-ramp." rubric: | - A tracking issue lists candidate operators (Lambda, CoreWeave, fal.ai, - anyscale, hyperscaler GPU teams) and records at least three intro - conversations. Each conversation entry has a date, the operator-side - contact, and a yes/no/deferred on whether they will run the - production-preset chart (criterion 10) against real signal. At least - one operator commits to the rc1 → GA validation window before tag-cut. + The outreach-ready package lives at + [`docs/partner-outreach.md`](partner-outreach.md) — target-persona + list (≥10 archetypes), outreach message template (≤200 word pitch + with a specific ask), live-conversation talking points (the + 15-pattern coverage story, OCB-distro pivot, why-now signal, + differentiation vs Prometheus + Datadog + DCGM-exporter), tracker + table shape, and numeric success criteria (≥5 contacted, ≥2 + feedback-received, ≥1 pilot-deploy-attempted before GA tag). The + package is the falsifiable artifact; the partner conversations + themselves land as rows in the tracker. citation: - "[`docs/RELEASE-CHECKLIST.md`](RELEASE-CHECKLIST.md) GA gate \"Real-operator integration verified\"" - "[`NORTHSTARS.md` §O5 hero KPI](NORTHSTARS.md#o5-distribution--community) (≥15 unique production orgs at M12)" + - "[`docs/partner-outreach.md`](partner-outreach.md) §Success criteria" rubric_check: - # No in-repo artifact — this is tracked on the issue tracker. Status - # stays ☐ (PLANNED) by hard-coding `false`; once a partner-outreach - # doc lands at a stable path the check flips to `test -f docs/`. - # (The literal path was previously named here, but the YAML link-rot - # gate flagged it as dead — correctly — so we hard-code instead.) - artifact_exists: "false" - gate_script: "" - notes: "" + # Artifact present: the outreach-ready package exists at a stable + # path. File-presence alone flips this off ☐ planned. + artifact_exists: "test -f docs/partner-outreach.md" + # Binding gate: at least five real (non-placeholder) rows in the + # §Tracker table of docs/partner-outreach.md, where "real" means a + # tracker row whose Status column carries a real status token, not + # the placeholder enumeration string. The threshold matches SC-1 + # (≥5 operators contacted) in docs/partner-outreach.md §Success + # criteria — the executable gate must not be looser than the + # written contract. Until at least five operators have been + # contacted, this gate fails → status holds at ⧗ in progress. The + # grep walks the tracker block and counts pipe-delimited rows + # whose Status cell is exactly one of the documented tokens, + # excluding the placeholder enumeration row that lists every + # token separated by ' / '. `queued` is intentionally excluded: + # SC-1 in docs/partner-outreach.md scopes "contacted" as outreach + # actually sent, and `queued` rows are merely identified — they + # do not count toward the 5. + gate_script: | + awk -F'|' ' + /^## 4\. Tracker/ { in_tracker = 1; next } + in_tracker && /^## / { in_tracker = 0 } + in_tracker && /^\| / { + status = $6 + gsub(/^ +| +$/, "", status) + if (status == "contacted" || status == "meeting-set" || \ + status == "feedback-received" || status == "pilot-deploy-attempted" || \ + status == "declined" || status == "deferred") { + n++ + } + } + END { exit (n >= 5 ? 0 : 1) } + ' docs/partner-outreach.md + notes: | + [`docs/partner-outreach.md`](partner-outreach.md) shipped: 12 + operator-archetype persona list (P1-P12 — GPU-cloud providers, + frontier-model labs, hyperscaler ML infra teams, OTel-adjacent + maintainers, mid-cap AI startups, HPC centers), ≤200-word cold- + outreach pitch with audit-funding + OSS-maintainer variants, + seven live-conversation talking points (15-pattern coverage, + RFC-0013 OCB-distro pivot, why-now signal, three-layer + differentiation, production posture, what-we-are-NOT framing), + tracker column shape + status vocabulary + hygiene rules, three + falsifiable success criteria (SC-1 ≥5 contacted, SC-2 ≥2 + feedback-received, SC-3 ≥1 pilot-deploy-attempted). SC-3 is the + GA blocker; rc1 ships with all three open. The ⧗ in progress + marker reflects "package ready, conversations not yet started"; + the gate script flips to ☑ when the §Tracker table accumulates + five non-placeholder rows (matching SC-1). - id: C title: "Launch artifacts drafted" diff --git a/docs/partner-outreach.md b/docs/partner-outreach.md new file mode 100644 index 00000000..966263d9 --- /dev/null +++ b/docs/partner-outreach.md @@ -0,0 +1,297 @@ +# Partner outreach plan (v1.0 rc1 → GA) + +This document is the **outreach-ready package** for tracecore's first +deliberate operator-facing distribution push. It is the artifact a +maintainer hands a sponsor (or themselves, two weeks later) to +demonstrate that "we tried to talk to real operators" is falsifiable — +the persona list is explicit, the pitch is written down, the tracker +shape is committed, and the success criteria are numeric. It is **not** +a committed campaign yet — no conversations have started (see §Status) +and the tracker rows are empty. + +Read in this order: + +1. [`docs/NORTHSTARS.md` §O5](NORTHSTARS.md#o5-distribution--community) + — the Distribution & Community objective whose M12 hero KPI + ("≥15 unique production organizations") this outreach pushes + toward. +2. [`docs/v1-rc1-cut-criteria.md` §Tier 2 B](v1-rc1-cut-criteria.md#b-partner-outreach-checkpoint) + — the rc1 Tier-2 cut criterion this artifact addresses; the + numeric success criteria below (§5) are the criterion's gate. +3. [`docs/security-audit-rfp.md` §Funding plan + gap](security-audit-rfp.md#funding-plan--gap) + — the audit-funding ask that partner conversations carry as a + secondary line item; sibling Tier-2 A artifact. +4. This document — persona list, pitch, talking points, tracker, + success criteria. + +When a partner conversation lands a yes/no/deferred, the responsible +maintainer adds a row to the §Tracker below in the same PR that records +the conversation. When the [§Success criteria](#success-criteria) +below all flip to ☑, this artifact's cut-criterion (rc1 Tier-2 B) flips +to ☑ on the next `make cut-criteria-render` cycle. + +## Status + +| Item | State | +|---|---| +| Persona list drafted | landed (this doc, §1) | +| Outreach message template drafted | landed (this doc, §2) | +| Talking-points crib drafted | landed (this doc, §3) | +| Tracker shape committed | landed (this doc, §4) | +| ≥5 operators contacted | **not yet** — tracker empty | +| ≥2 feedback responses received | **not yet** | +| ≥1 pilot deploy attempted | **not yet** — GA-gate prerequisite | + +## 1. Target persona list + +Outreach targets ML platform / observability / SRE leads at shops that +run multi-node GPU training workloads at a scale where the +[15 named root-cause patterns](NORTHSTARS.md#appendix-a-the-15-named-root-cause-patterns) +are real pain rather than theoretical. "Hyperscaler-adjacent" is the +operational shorthand: the shop does not need to be a hyperscaler, +but it needs to have the failure modes a hyperscaler does. Each +archetype carries a one-line "why they would care" so a contact +researcher can sanity-check candidates against the list. + +| # | Archetype | Why they care | +|---|---|---| +| **P1** | ML platform lead at a **GPU-cloud provider** (CoreWeave, Lambda, fal.ai, RunPod, Crusoe, Together AI, Modal). | They sell GPU-hours; their customers' training jobs are their uptime; pattern-level root-cause shortens MTTR on the cluster they bill for. | +| **P2** | Observability / SRE lead at a **frontier-model lab** training >100B-parameter runs (Anthropic, OpenAI, xAI, Mistral, Cohere, AI21, Reka, Adept-style). | They run 1000+ GPU jobs for weeks; a silent NCCL hang or HBM-ECC storm is a million-dollar event. They have in-house equivalents and benchmark the OSS alternative. | +| **P3** | Platform engineer at a **hyperscaler ML infra team** (AWS Trainium, GCP TPU+H100 pods, Azure ND-series, Oracle OCI GPU). | Their internal observability is closed-source; they need an OSS narrative for external customers. Pattern verdicts are a vendor-neutral interop layer. | +| **P4** | Open-source maintainer at an **OTel-adjacent project** (OpenTelemetry Collector contrib, Prometheus, Grafana, Pixie, Beyla). | They review the OCB-distro pivot for downstream fit; they're a credibility multiplier when they say "this is the right shape." | +| **P5** | Infra lead at a **mid-cap AI startup** training 7-70B-parameter models on 8-128 H100s (Suno, Pika, Hailuo, Ideogram, Krea, Cartesia, ElevenLabs, Decart). | They feel the pain without the in-house budget to build a parallel tracecore. They are the ideal pilot partner. | +| **P6** | Reliability engineer at a **scientific-computing HPC center** running GPU jobs (NERSC, Oak Ridge OLCF, Argonne ALCF, CSCS, JSC, RIKEN). | They run NCCL on InfiniBand at scale; their job mix overlaps the moat (Xid, IB link flap, NCCL bootstrap, HBM-ECC). They have OSS-only policies. | +| **P7** | Platform lead at a **GPU-renting marketplace / aggregator** (Vast.ai, TensorDock, Foundry, Salad). | Their customers complain about "GPU bad" with no further detail; pattern verdicts make the conversation actionable. | +| **P8** | Observability vendor PM at a **commercial APM** evaluating GPU-training coverage (Datadog Watchdog, New Relic, Honeycomb, Chronosphere, Splunk Observability). | They are buyers, not adopters; a conversation here calibrates the positioning narrative (§Talking points #4) before launch. | +| **P9** | Researcher / engineer at a **kernel-engineering shop** (Together-style fork maintainers, NVIDIA cuDNN team, Mosaic-style optimization shops). | They produce the workloads tracecore explains; their feedback closes the loop on whether the moat patterns actually catch real bugs. | +| **P10** | Reliability lead at a **GPU-renting end customer** running a known-public training campaign (Stability successors, RWKV / community-pretrain orgs, Eleuther-style collectives). | Public training campaigns mean public postmortems; if tracecore's verdicts show up in one, that is the highest-leverage adoption proof point. | +| **P11** | **CNCF TAG-Observability / SIG-AI** participant. | They run the upstream conversation tracecore inherits; an early conversation here de-risks the v1.x CNCF-adjacent positioning. | +| **P12** | **MLOps community organizer** (MLOps.community Slack admin, mlops.conf chair, KubeCon AI-day co-chair). | They are pre-1.0 launch surface — a single message in their channels reaches every other persona above at once. Talk to them last; bring a real conversation count to the meeting. | + +Persona list is intentionally archetype-shaped, not a named-individual +list. The §Tracker below carries the named-individual data; this list +is the validation surface against which a contact researcher can ask +"is this person in scope?" before adding them to the tracker. + +## 2. Outreach message template + +Short pitch (~190 words) suitable for cold email / DM / a single LinkedIn +message. The ask is concrete: try the install on a real cluster and +tell us what broke. The ask is not a meeting; a meeting is the +fallback. The audience is the §1 persona list. + +> **Subject:** tracecore — pattern-level root-cause for GPU training, +> v1.0-rc1 ready for partners +> +> Hi {name}, +> +> I'm the maintainer of [tracecore](https://github.com/TraceCoreAI/tracecore), +> an OpenTelemetry Collector distribution that classifies multi-node GPU +> training failures into named root-cause patterns — Xid storms, HBM ECC, +> NCCL hangs, IB link flap, PCIe AER, the [usual twelve](https://github.com/TraceCoreAI/tracecore/blob/main/docs/NORTHSTARS.md#appendix-a-the-15-named-root-cause-patterns). +> We just landed v1.0-rc1; before we tag GA we want feedback from +> operators who run this stuff in anger. +> +> Specific ask, in priority order: +> +> 1. **Install the production-preset Helm chart** on a non-production +> cluster ([`docs/reference-architectures/01-single-cluster.md`](https://github.com/TraceCoreAI/tracecore/blob/main/docs/reference-architectures/01-single-cluster.md)), +> let it ingest one real failure, and reply with what broke. +> 2. **30 minutes on a call** to walk through your training-failure +> taxonomy and tell us which patterns we missed. +> 3. **A short email** with the one sentence you'd want a colleague +> to hear about tracecore before they evaluate it. +> +> No fluff, no sales follow-up, no auto-newsletter. We are pre-1.0 +> OSS and want our positioning calibrated against real operators +> before the v1.0.0 tag. +> +> Thanks for considering it, +> {maintainer} + +### Variants + +- **Audit-funding line.** When the recipient is at a sponsor-class + org (P1, P3, P10 in §1), append: + > P.S. We're funding the v1.0 third-party security audit through + > [OpenSSF Alpha-Omega](https://alpha-omega.dev/) / + > [OSTIF](https://ostif.org/); if your org sponsors OSS security + > audits, we have an [RFP-ready package](security-audit-rfp.md) + > we can hand you. +- **OSS-maintainer variant.** When the recipient is a P4 (OTel-adjacent + maintainer), drop the install-the-chart ask and lead with the + RFC-0013 distro pivot: + > We pivoted to an OCB-assembled distro shape (RFC-0013) — the in-house + > code is bounded to four moat scopes. We'd value 30 min of your time + > on whether the moat boundary is drawn in the right place. +- **Community-organizer variant** (P12). Bring real numbers; don't + cold-pitch. + +## 3. Talking points + +For live conversations — discovery calls, conference hallway, Slack DM +follow-up. Seven points; each anchors a question the persona is likely +to ask back. + +1. **The 15-pattern coverage story.** Tracecore ships detectors for + the [15 named root-cause patterns](NORTHSTARS.md#appendix-a-the-15-named-root-cause-patterns) + that account for the majority of multi-node GPU training-job + failures. Twelve of fifteen are end-to-end at v1.0-rc1 + ([cut criterion 1](v1-rc1-cut-criteria.md#1-pattern-coverage--12-of-15)); + the remaining three (silent data corruption, loss-spike → NaN, + FP8 numerics) are explicit v1.1/v2 scope. We classify, not just + collect. +2. **The OCB-distro pivot ([RFC-0013](rfcs/0013-distro-first-pivot.md)).** + tracecore is **not** a parallel Collector — it is an OpenTelemetry + Collector Builder (OCB) distribution that bundles upstream + contrib + components and adds an explicitly bounded "moat" module (pattern + detectors, NCCL FlightRecorder parsing, OTTL processors, install/bench + harness). Operators get the upstream contract for everything else; + in-house surface is small enough to audit (see + [`docs/threat-model.md` §1](threat-model.md#1-asset-inventory)). +3. **Why now: the structural signal.** Multi-node GPU training is now + a category — every frontier-model lab, every GPU-cloud, every + mid-cap AI startup is paying the same failure-mode tax. The + in-house equivalents (Anthropic's internal tooling, OpenAI's, + Meta's published [GenAI infra + paper](https://research.facebook.com/publications/the-llama-3-herd-of-models/)) + are private. An OSS distro with a maintained pattern library is + the obvious gap. +4. **Differentiation, three layers deep:** + - vs. **Prometheus + DCGM-exporter + Grafana** — they give you + metrics; tracecore gives you a verdict. "GPU `nvidia0` is + `Xid 79` storming" is a metric. "Pod `worker-3` was evicted + because `Xid 79 → ECC uncorrectable → PCIe AER` in 4.2s on + `node-0042`" is a pattern verdict. Both ship; the second is + a one-line page-out instead of an investigation. + - vs. **Datadog Watchdog / commercial APMs** — closed source, + vendor lock, no cross-vendor narrative. tracecore is OSS, + OTel-native, and the verdict schema is + [published + versioned](schemas/verdict-1.0.0-rc1.json) so + downstream consumers don't depend on us shipping. + - vs. **DCGM-exporter alone / vendor SDK exporters** — vendor-tied, + no cross-signal correlation. The pattern detectors fuse signals + from `dcgm-exporter`, kernel `journald`, NCCL FlightRecorder + dumps, and k8s pod events; no single vendor exporter does + cross-signal join. +5. **Production posture.** Helm chart with NetworkPolicy default-deny, + non-root pod, distroless image, cosign-signed binaries with SLSA L3 + provenance, third-party audit RFP filed + ([`docs/security-audit-rfp.md`](security-audit-rfp.md)). We do not + phone home ([`NORTHSTARS.md` §O2 operating rule + #5](NORTHSTARS.md#o2-convenience--quality)); install counts come + from opt-in only. +6. **What we are explicitly NOT.** A model-quality / hyperparameter + tracker — that's Weights & Biases. A workflow orchestrator — that's + Argo / Ray / Slurm. An RCA-as-a-service — we ship the detectors, + you ship the alert routing. See + [`NORTHSTARS.md` §"What we explicitly do not chase"](NORTHSTARS.md#what-we-explicitly-do-not-chase). +7. **What we want from this conversation.** In priority: try the + install on a real cluster; tell us which patterns we missed; tell + us what the install-friction story is on your cluster shape; if + your org sponsors OSS audits, point us at the program. + +## 4. Tracker + +The tracker shape is the artifact. Rows below are placeholders that +demonstrate the column shape — they must be replaced or deleted in +the PR that records the first real conversation. The §Success +criteria below count real rows only. + +| Name | Role | Org-archetype (§1) | First contacted | Status | Notes | +|---|---|---|---|---|---| +| _placeholder_ | _placeholder_ | _P1-P12_ | _YYYY-MM-DD_ | queued / contacted / meeting-set / feedback-received / pilot-deploy-attempted / declined / deferred | _free text — one line max; link to a private GHSA thread or a public issue if there is one_ | + +### Status vocabulary + +- **queued** — identified in §1 / a contact researcher proposed them; + no message sent yet. +- **contacted** — outreach message sent (§2). No response yet. +- **meeting-set** — a call is scheduled or has happened. Notes + field carries the date + a one-line takeaway. +- **feedback-received** — written or verbal feedback received + (install attempt, schema review, persona-list calibration, missed + pattern). Counts toward + [§Success criterion 2](#success-criteria). +- **pilot-deploy-attempted** — the operator has actually run the + production-preset Helm chart against a real cluster, even if it + didn't work end-to-end. Counts toward + [§Success criterion 3](#success-criteria). +- **declined** — explicit no; archived row but kept for honesty. +- **deferred** — interested but not on the rc1 → GA clock. Re-check + at the next minor. + +### Tracker hygiene rules + +- **One row per person, not per org.** If two engineers at the same + org engage, two rows; the org-archetype column is the join key. +- **Notes field stays one line.** Long narrative goes in a separate + follow-up doc (e.g. `docs/case-studies/.md` per the + [GA gate](RELEASE-CHECKLIST.md#ga-additional-gates)), not in this + table. +- **No private contact info in the public repo.** Names + public roles + + public org-archetype only. Email addresses, phone numbers, and + Slack DMs stay out of the tracker; if a private channel exists, + the notes column says "private GHSA thread" or similar and the + link itself stays internal. +- **Honesty disclaimer in the row.** When a conversation goes + sideways (declined / deferred / no-show), the row records that + honestly. The success criteria count real conversations, not + meetings-on-the-calendar. + +## Success criteria + +These are the falsifiable gates this artifact promises against. When +all three flip ☑, the rc1 Tier-2 B cut criterion flips to ☑ on the +next `make cut-criteria-render` cycle (the gate script reads the +§Tracker above). + +| # | Gate | Status | When it flips | +|---|---|---|---| +| **SC-1** | **≥5 operators contacted.** ≥5 distinct rows in §Tracker with `Status` ∈ {contacted, meeting-set, feedback-received, pilot-deploy-attempted, declined, deferred}. Placeholder rows do not count. `queued` rows do **not** count toward the 5 — outreach is "actually sent," not "identified" — matching the executable gate at [`docs/cut-criteria.yaml` §B `gate_script`](cut-criteria.yaml). | open | At any point before the GA tag; rc1 can ship with this open. | +| **SC-2** | **≥2 feedback responses received.** ≥2 rows with `Status` ∈ {meeting-set, feedback-received, pilot-deploy-attempted}. | open | Pre-GA; if not closed by GA tag, the GA release notes call it out per [`RELEASE-CHECKLIST.md` GA gate](RELEASE-CHECKLIST.md#ga-additional-gates). | +| **SC-3** | **≥1 pilot deploy attempted.** ≥1 row with `Status` = `pilot-deploy-attempted`. The corresponding writeup lands at `docs/case-studies/.md` per the [Real-operator integration GA gate](RELEASE-CHECKLIST.md#ga-additional-gates). | open | **GA blocker.** v1.0.0 tag does not cut until this is ☑. | + +**Numeric calibration.** The 5/2/1 ladder is deliberately conservative +for a solo-maintainer pre-1.0 effort. Hyperscaler-adjacent ML platform +leads have full inboxes; a 40% contact-to-feedback rate (2 of 5) and +a 20% contact-to-pilot rate (1 of 5) is the honest expectation, not +sandbagging. If the actual rates are worse, the v1.0 GA tag slips +honestly (per +[`NORTHSTARS.md` §O5 caveats](NORTHSTARS.md#o5-distribution--community)), +not by lowering the gate. + +## Cross-references + +This doc is the v1.0 rc1 → GA partner-outreach plan. The artifacts it +talks to: + +- [`docs/NORTHSTARS.md` §O5](NORTHSTARS.md#o5-distribution--community) + — the hero KPI (unique production organizations) this outreach + pushes toward. +- [`docs/v1-rc1-cut-criteria.md` §Tier 2 B](v1-rc1-cut-criteria.md#b-partner-outreach-checkpoint) + — the rc1 Tier-2 criterion whose status this doc gates. +- [`docs/v1-ga-cut-criteria.md`](v1-ga-cut-criteria.md) — the GA-tier + view; SC-3 above is a GA blocker. +- [`docs/RELEASE-CHECKLIST.md` GA additional gates](RELEASE-CHECKLIST.md#ga-additional-gates) + — the "Real-operator integration verified" GA gate. SC-3 above + feeds it directly. +- [`docs/security-audit-rfp.md`](security-audit-rfp.md) §Funding plan + + gap — the audit-funding line carried as a §2 variant. +- [`docs/reference-architectures/01-single-cluster.md`](reference-architectures/01-single-cluster.md) + — the install target the §2 outreach message points partners at. + +## When this plan is wrong + +This is the best understanding of "what partner outreach looks like +for tracecore" as of the doc's landing commit. The first real +conversation will reveal scope this doc missed — a persona archetype +the §1 list omitted, a question the §3 talking points didn't +anticipate, a tracker column the §4 shape doesn't carry. When this +document is wrong, update it in the same PR that records the +conversation. Don't accumulate outreach-plan debt. + +The numeric success criteria (§5) are the contract. The persona +list, the pitch, the talking points, the tracker shape — all of +those are revisable. The 5/2/1 ladder is not, until a CNCF-adjacent +or v1.x roadmap deliberately re-anchors it. diff --git a/docs/v1-rc1-cut-criteria.md b/docs/v1-rc1-cut-criteria.md index 076944ab..21804b68 100644 --- a/docs/v1-rc1-cut-criteria.md +++ b/docs/v1-rc1-cut-criteria.md @@ -296,14 +296,33 @@ repo. ### B. Partner outreach checkpoint -- **Falsifiable rubric:** A tracking issue lists candidate operators (Lambda, CoreWeave, fal.ai, - anyscale, hyperscaler GPU teams) and records at least three intro - conversations. Each conversation entry has a date, the operator-side - contact, and a yes/no/deferred on whether they will run the - production-preset chart (criterion 10) against real signal. At least - one operator commits to the rc1 → GA validation window before tag-cut. -- **Citation:** [`docs/RELEASE-CHECKLIST.md`](RELEASE-CHECKLIST.md) GA gate "Real-operator integration verified" and [`NORTHSTARS.md` §O5 hero KPI](NORTHSTARS.md#o5-distribution--community) (≥15 unique production orgs at M12). -- **Status:** ☐ planned +- **Owner objective:** O5 (Distribution & Community) — hero KPI is unique production organizations; outreach is the on-ramp. +- **Falsifiable rubric:** The outreach-ready package lives at + [`docs/partner-outreach.md`](partner-outreach.md) — target-persona + list (≥10 archetypes), outreach message template (≤200 word pitch + with a specific ask), live-conversation talking points (the + 15-pattern coverage story, OCB-distro pivot, why-now signal, + differentiation vs Prometheus + Datadog + DCGM-exporter), tracker + table shape, and numeric success criteria (≥5 contacted, ≥2 + feedback-received, ≥1 pilot-deploy-attempted before GA tag). The + package is the falsifiable artifact; the partner conversations + themselves land as rows in the tracker. +- **Citation:** [`docs/RELEASE-CHECKLIST.md`](RELEASE-CHECKLIST.md) GA gate "Real-operator integration verified" and [`NORTHSTARS.md` §O5 hero KPI](NORTHSTARS.md#o5-distribution--community) (≥15 unique production orgs at M12) and [`docs/partner-outreach.md`](partner-outreach.md) §Success criteria. +- **Status:** ⧗ in progress — [`docs/partner-outreach.md`](partner-outreach.md) shipped: 12 + operator-archetype persona list (P1-P12 — GPU-cloud providers, + frontier-model labs, hyperscaler ML infra teams, OTel-adjacent + maintainers, mid-cap AI startups, HPC centers), ≤200-word cold- + outreach pitch with audit-funding + OSS-maintainer variants, + seven live-conversation talking points (15-pattern coverage, + RFC-0013 OCB-distro pivot, why-now signal, three-layer + differentiation, production posture, what-we-are-NOT framing), + tracker column shape + status vocabulary + hygiene rules, three + falsifiable success criteria (SC-1 ≥5 contacted, SC-2 ≥2 + feedback-received, SC-3 ≥1 pilot-deploy-attempted). SC-3 is the + GA blocker; rc1 ships with all three open. The ⧗ in progress + marker reflects "package ready, conversations not yet started"; + the gate script flips to ☑ when the §Tracker table accumulates + five non-placeholder rows (matching SC-1). ### C. Launch artifacts drafted