Primer is the harness intelligence layer for agentic engineering. Research and industry evidence converge on a single insight: outcome quality is determined more by the agent harness — tool design, context management, caching, orchestration, and permission boundaries — than by model capability alone. Primer captures session telemetry across agents, decomposes it into harness dimensions, and measures which configurations actually improve outcomes. It turns that data into harness attribution, coaching, enablement, and operational decisions.
This roadmap is organized in two layers:
- Strategy and priorities at the top.
- Detailed shipped and planned capabilities underneath.
Items marked with [x] are shipped. Items marked with [ ] are planned. Planned items include rough priority tags:
- P0: foundational and near-term
- P1: important follow-on work
- P2: valuable expansion work
- Harness effectiveness: Which harness configurations (tool designs, caching strategies, context management, orchestration patterns, permission boundaries) correlate with better outcomes?
- Harness attribution: When a session succeeds or fails, which harness components contributed? What's the per-step compound reliability?
- Harness evolution: How have harness configurations changed over time, and did those changes improve outcomes?
- Harnessability: Are codebases and teams structurally ready for effective agent harnesses (documentation quality, typing, module boundaries, data governance)?
- Dead weight: Which harness configurations are outdated compensations for older model limitations that now bottleneck performance?
- Individual effectiveness: What should each engineer change about their harness setup to improve outcomes?
- Measure harness effectiveness, not just usage — decompose outcomes to the harness component level.
- Build the "code coverage for harnesses" that the industry is asking for (per-component reliability, compound failure math).
- Track longitudinal harness evolution so teams can see how configuration changes correlate with outcome changes over time.
- Make harnessability scoring a first-class product surface (documentation quality, context freshness, guide/sensor coverage).
- Close the loop from harness insight to intervention — including subtractive coaching ("what can you stop doing?").
- Bring harness intelligence into the engineer workflow via MCP sidecar, not only after the fact.
Primer's moat is decomposing outcomes to the harness component level: per-tool success rates, compound reliability math (10 steps at 99% each yields only 90.4% end-to-end), and harness configuration fingerprinting from session telemetry. This is the "code coverage for harnesses" that the industry is asking for.
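The compound reliability figure is straight multiplication of per-step success probabilities. A minimal sketch of the computation (the helper name and step data are illustrative, not a Primer API):

```python
def compound_reliability(step_success_rates: list[float]) -> float:
    """End-to-end success probability when every step must succeed."""
    result = 1.0
    for rate in step_success_rates:
        result *= rate
    return result

# Ten tool calls at 99% per-step reliability compound to ~90.4% end-to-end.
print(round(compound_reliability([0.99] * 10), 3))  # 0.904
```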
Trustworthy semantics remain foundational. Clean taxonomy for outcomes, goals, friction, and success, plus reprocessing and coverage tooling so every downstream metric is credible.
Longitudinal correlation of harness configuration changes with outcome changes over time. LangChain rewrote their harness 4x in one year; Vercel removed 80% of tools and improved. No tool tracks this — Primer's team-level time-series data is uniquely positioned.
Measure whether codebases and teams have the structural properties (documentation quality, context freshness, module boundaries, guide/sensor coverage) that make agent harnesses effective. Extends existing project readiness into a full harnessability assessment.
Recommendations should become assignable, measurable interventions — including subtractive coaching ("what can you stop doing?"). Dead weight detection identifies harness configurations that are outdated compensations for older model limitations.
The most valuable insights should show up during the session via MCP sidecar: harness health scores, context quality warnings, dead weight alerts, and configuration recommendations.
Derived data pipelines, performance optimization, durable background jobs, enterprise identity, and observability.
- [P0] Per-tool success rate tracking with compound reliability computation: decompose session outcomes to the tool/step level.
- [P0] Harness configuration fingerprinting: extract and catalog the actual harness configuration (tools, context files, permissions, customizations) from session telemetry (see the sketch after this list).
- [P0] Context quality scoring: measure AGENTS.md freshness, token efficiency, and guide/sensor coverage per project.
- [P1] Harness evolution timeline: before/after correlation of configuration changes with outcome changes.
- [P1] Harnessability scoring per project: documentation quality, typing strength, module boundaries, data governance readiness.
- [P1] Paragon's 4-dimension evaluation: tool correctness, tool usage accuracy, task completion, task efficiency.
- [P1] Semantic search over sessions via pgvector: exemplar discovery and cross-engineer pattern matching.
- [P2] Automated harness optimization suggestions: evolve coaching to recommend specific harness configuration changes.
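As a rough illustration of harness configuration fingerprinting, here is a minimal sketch of what a per-session fingerprint record could capture and how identical configurations could be grouped; the field names and hashing scheme are assumptions for illustration, not Primer's schema:

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class HarnessFingerprint:
    """Hypothetical snapshot of the harness configuration behind one session."""
    agent_type: str                                           # e.g. "claude-code", "cursor"
    tools_enabled: list[str] = field(default_factory=list)
    mcp_servers: list[str] = field(default_factory=list)
    context_files: list[str] = field(default_factory=list)    # AGENTS.md, CLAUDE.md, ...
    permission_mode: str = "default"
    customizations: list[str] = field(default_factory=list)   # skills, commands, templates

    def fingerprint(self) -> str:
        """Stable hash so identical configurations can be grouped and compared."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()[:12]
```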
- Facet taxonomy alignment across extraction, schemas, analytics, and UI
- Outcome normalization and historical backfill for previously ingested sessions
- Coverage dashboard for facet extraction, transcript completeness, GitHub sync, and repository metadata
- Confidence scoring for extracted facets and downstream recommendations
- Cross-agent schema parity matrix so Primer knows which session fields are required, optional, or unavailable per source
- Partial-telemetry handling for IDE-native agents like Cursor so missing transcript, tool, or model fields do not distort org-wide metrics
- [P1] Execution evidence capture: lint, test, build, and verification signals per session
- [P1] Change-shape capture: files touched, diff size, churn, and rewrite/revert indicators
- [P1] Recovery-path tracking: detect whether engineers recover after friction or abandon the attempt
- [P1] Derived analytics tables and materialized rollups for heavy longitudinal queries
- Source-quality dashboard by agent type, including capture coverage and telemetry completeness for Cursor
- [P2] Data-quality anomaly detection for broken ingestion, sparse transcripts, or stale integrations
- Session search with full-text, outcome, type, model, branch filters
- Transcript viewer with message-level detail
- Session health scoring (outcome + friction + duration + satisfaction composite)
- LLM-powered facet extraction (goals, friction types, satisfaction signals)
- End reason breakdown with success rate per reason
- Goal analytics (session type and goal category breakdown)
- Permission mode analysis (success rate by permission level)
- Satisfaction trend tracking (satisfied / neutral / dissatisfied over time)
- Similar sessions panel with 3-tier relevance matching
- [P0] Cursor session ingestion and discovery pipeline
- [P0] Cursor transcript and tool-call extraction mapped onto the normalized session model
- [P1] Cursor native telemetry enrichment for approvals, change shape, and context-usage signals
- [P1] Cursor reliable token and model-usage extraction once source telemetry is trustworthy
- [P1] Workflow fingerprinting: infer common sequences like search -> read -> edit -> test -> fix
- [P1] Cursor-specific workflow fingerprinting and session archetype mapping
- [P1] Session archetype detection: debugging, feature delivery, refactor, migration, docs, investigation
- [P1] Delegation graph capture for multi-agent and subagent workflows
- [P2] Exemplar session library for high-value workflows and onboarding examples
- [P2] Skill, command, and template reuse analytics by workflow and outcome
- [P2] Prompt reuse analytics by workflow and outcome
- Friction type classification (permission denied, timeout, context limit, edit conflict, tool error, exec error)
- Friction impact scoring (occurrence count x success rate penalty; see the sketch after this list)
- Friction trend chart (count + rate over time)
- Project-level friction breakdown
- Friction cluster analysis with sample details
- Anomaly detection for friction spikes
- [P0] Root-cause clustering from transcripts, tool traces, and repeated failure motifs
- [P1] Time-lost estimation per friction type, engineer, and project
- [P1] Toolchain reliability analytics for MCP servers, built-in tools, and external services
- [P1] Friction recovery analysis: what engineers tried after failure and which recoveries worked
- [P2] Real-time friction detection for in-session intervention
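As referenced in the impact scoring item above, a minimal sketch of friction impact scoring under an assumed definition of the success-rate penalty (the gap between baseline success and success in sessions that hit the friction); the weighting is illustrative, not Primer's exact formula:

```python
def friction_impact(
    occurrences: int,
    success_rate_with_friction: float,
    baseline_success_rate: float,
) -> float:
    """Assumed scoring: sessions affected, weighted by how much success drops.

    The penalty term is the gap between the baseline success rate and the
    success rate observed in sessions that hit this friction type.
    """
    penalty = max(0.0, baseline_success_rate - success_rate_with_friction)
    return occurrences * penalty

# Example: 40 "context limit" hits, 55% success vs an 80% baseline -> impact 10.0
print(friction_impact(40, 0.55, 0.80))
```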
- Engineer leaderboard with multi-dimensional ranking
- Personal trajectory dashboard with weekly sparklines
- Strengths and friction breakdown per engineer
- Peer benchmarking (percentile ranking, vs-team-average deltas)
- AI-generated narrative insights per engineer
- Personalized tips based on friction patterns and tool gaps
- Config optimization suggestions from team benchmark comparison
- Skill inventory with proficiency levels per tool
- Learning paths generated from high-performer patterns
- [P0] Effectiveness score: success rate, cost efficiency, quality outcomes, and follow-through
- [P0] Workflow playbooks derived from high-performing peer patterns
- [P1] Plugin and tool recommendation engine based on task type, project context, and similar successful sessions
- [P1] Model selection coach for cost-appropriate model choice by task
- [P1] Personal impact review that combines trajectory, quality, cost, and workflow maturity
- [P2] Longitudinal growth view across quarters, role changes, and team moves
- Cohort comparison (new hire / ramping / experienced)
- Time-to-team-average tracking for new hires
- Onboarding velocity scoring
- Onboarding recommendations
- Shared behavior pattern discovery with approach comparison
- [P1] Bright spot detection: explicitly surface high performers and cross-pollinate their patterns
- [P1] Exemplar-session-to-learning-path pipeline
- [P1] Team skill gap mapping by workflow, tool category, and project context
- [P2] Coaching program measurement: which onboarding or training changes improved outcomes
- Dedicated project workspace with readiness, friction, quality, cost, and enablement views
- Project AI-readiness scoring (CLAUDE.md, AGENTS.md, .claude/ detection)
- Project scorecard that combines adoption, effectiveness, quality, and cost efficiency
- [P0] Project-level workflow fingerprints and friction hotspots
- [P1] Project-level agent mix comparison, including Cursor sessions alongside CLI agents
- [P1] Repository context model: language mix, test maturity, repo size, and AI-enablement signals
- [P1] Project enablement recommendations tied to observed bottlenecks
- [P1] Cross-project comparison: which repos are easiest or hardest to use AI effectively in
- [P2] Project playbook templates for greenfield, legacy, high-compliance, and test-poor repos
- Tool leverage scoring (0-100 composite per engineer)
- Tool category classification (core, search, orchestration, skill, MCP)
- Orchestration adoption rate tracking
- Agent and skill usage analytics (invocation patterns, delegation depth)
- Tool adoption rates and trend charts
- Engineer tool proficiency table
- Daily leverage trend tracking
- [P0] 5-factor harness maturity score: tool design, orchestration, caching, context hygiene, boundary design
- [P0] Dead weight detection: flag zero-invocation and no-outcome-lift customizations (see the sketch after this list)
- [P0] Subtractive coaching: "what you can stop doing" section in coaching briefs
- [P0] `GET /api/v1/harness/deadweight` endpoint with auth-scoped access
- [P1] Model diversity factor in leverage scoring
- [P1] Agent team detection for coordinated multi-agent orchestration
- [P1] Session customization snapshot: capture enabled MCP servers, subagents, skills, commands, and templates alongside what was actually invoked
- [P1] Tool source classification: built-in vs marketplace vs custom
- [P1] Skill provenance + baseline filtering so recommendations and reuse analytics suppress built-in/default skills and focus on explicit user or repo-configured choices
- [P1] Cross-agent customization normalization so Claude, Cursor, Codex, and Gemini plugin surfaces map into one shared model
- [P1] Customization state model: available vs enabled vs invoked for MCPs, subagents, skills, commands, and templates
- [P1] Outcome attribution for customizations: which MCPs, skills, commands, and subagents improve workflow, quality, cost, and friction outcomes
- [P1] Cross-team tooling landscape: overlap, reuse, and local best-of-breed tools
- [P1] High-performer agent stack analysis: which combinations of MCPs, skills, commands, and subagents differentiate top performers
- [P0] Per-tool success rate tracking with compound reliability computation (10 steps at 99% = 90.4% end-to-end)
- [P0] Harness configuration fingerprinting from session telemetry (tools, context files, permissions, customizations)
- [P1] Context quality scoring: AGENTS.md freshness, token efficiency, guide/sensor coverage
- [P1] Harness evolution timeline: before/after correlation of configuration changes with outcome changes
- [P1] Harnessability scoring per project: documentation quality, typing strength, module boundaries
- [P1] Paragon's 4-dimension evaluation: tool correctness, tool usage accuracy, task completion, task efficiency
- [P2] Prompt, skill, and template maturity scoring
- [P2] Automated harness optimization suggestions
- [P2] Dead weight dashboard tab with per-customization detail and removal actions
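As referenced in the dead weight detection item above, a minimal sketch of how flagging could work over assumed per-customization aggregates (invocation counts and outcome lift); the thresholds and field names are illustrative, not Primer's implementation:

```python
from dataclasses import dataclass

@dataclass
class CustomizationStats:
    """Assumed per-customization aggregate over a reporting window."""
    name: str
    invocations: int
    outcome_lift: float   # success-rate delta vs sessions without this customization

def find_dead_weight(stats: list[CustomizationStats],
                     min_lift: float = 0.0) -> list[str]:
    """Flag customizations that are never invoked or show no outcome lift."""
    return [
        s.name for s in stats
        if s.invocations == 0 or s.outcome_lift <= min_lift
    ]

# Example: an unused MCP server and a no-lift skill get flagged; the rest stay.
stats = [
    CustomizationStats("mcp:legacy-search", invocations=0, outcome_lift=0.0),
    CustomizationStats("skill:boilerplate-fixer", invocations=34, outcome_lift=-0.02),
    CustomizationStats("subagent:test-runner", invocations=120, outcome_lift=0.07),
]
print(find_dead_weight(stats))  # ['mcp:legacy-search', 'skill:boilerplate-fixer']
```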
- GitHub OAuth SSO
- Pull request sync via GitHub App
- Commit correlation with sessions
- Claude-assisted vs non-Claude PR comparison (merge rate, review comments, time to merge)
- Quality by session type (debugging, feature, refactoring)
- Code volume tracking (daily lines added/deleted)
- Engineer quality ranking table
- Repository AI-readiness scoring
- Automated review findings tracker (BugBot parser, severity breakdown, fix rate)
- Review findings overview in quality dashboard and engineer profile
- `GET /api/v1/analytics/review-findings` endpoint with source/severity/status filters
- Quality attribution layer linking session behavior to PR outcomes and review findings
- [P1] Additional review bot parsers: CodeRabbit, SonarQube, and other automated review tools
- [P1] Post-merge outcome tracking: reverts, hotfixes, and follow-up bug volume
- [P1] Change-quality analysis by workflow fingerprint and session archetype
- [P2] Review remediation tracking from finding creation to fix completion
- Per-model spend tracking with daily cost chart
- Cost breakdown by model
- Cache efficiency analytics (hit rates, savings, per-engineer potential)
- Billing mode detection (API vs subscription)
- Subscription vs API cost modeling with optimal plan recommendations
- 30-day cost forecasting (linear regression with confidence bands; see the sketch after this list)
- Budget tracking with burn-rate alerts and projected overrun warnings
- Cost per successful outcome metric
- [P1] Break-even analysis for API vs seat-based pricing with per-engineer recommendations
- [P1] Cost per workflow archetype and cost per engineering outcome
- [P1] Workflow compare mode for archetype and fingerprint performance
- [P1] Model-choice opportunity scoring for overspend reduction
- [P2] Budget policy simulation by team, project, and billing model
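As referenced in the forecasting item above, a minimal sketch of a 30-day linear-regression forecast over daily spend, with bands derived from residual spread; the band construction is an assumption, since the roadmap does not pin down the method:

```python
import numpy as np

def forecast_cost(daily_spend: list[float], horizon: int = 30):
    """Fit a linear trend to daily spend and project it forward with +/- bands."""
    y = np.asarray(daily_spend, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, deg=1)           # least-squares line
    residual_std = np.std(y - (slope * x + intercept))   # spread around the fit
    future_x = np.arange(len(y), len(y) + horizon)
    forecast = slope * future_x + intercept
    return forecast, forecast - residual_std, forecast + residual_std

# Example: fit a week of daily spend and project 30 days ahead.
forecast, low, high = forecast_cost([8.0, 9.5, 9.0, 10.5, 11.0, 10.0, 12.0])
print(round(forecast[-1], 2), round(low[-1], 2), round(high[-1], 2))  # ~27.68 27.09 28.27
```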
- AI-generated narrative reports (engineer, team, org scope)
- Narrative caching with TTL-based expiry
- Auto-refresh via lifespan task
- Conversational data explorer (SSE-streamed tool-use chat)
- AI-powered recommendations panel
- [P1] Saved explorer prompts and reusable report cards
- [P1] Compare mode for engineer, team, project, and time-period analysis
- [P2] Weekly manager review packs that combine quality, friction, growth, and cost
- [P2] Recommendation narratives that explain why a workflow is likely to help
- [P1] Reposition the website around harness intelligence for agentic engineering
- [P1] Showcase harness effectiveness, cost attribution, quality, and exemplar sessions as the core proof points
- [P0] Recommendation-to-intervention workflow with owner, status, due date, and linked evidence
- [P0] Before-and-after measurement for coaching, tooling, or repo changes
- [P1] Experimentation layer for training rollouts, tool changes, and enablement playbooks
- [P1] Intervention effectiveness reporting by team, project, and engineer cohort
- [P2] Auto-generated next-step plans from alerts, narratives, and project findings
- MCP sidecar with on-demand stats, friction reports, and recommendations
- [P0] Proactive coaching skill that activates at session start with contextual suggestions
- [P0] Live session signals that stream friction, satisfaction, and risk as work happens
- [P1] In-session workflow nudges based on project playbooks and prior failures
- [P1] Daily and weekly personal recaps inside the sidecar
- [P2] Lightweight session planning prompts before complex work begins
- Hub-and-spoke dashboard with KPI strip, activity section, attention alerts, deep-dive cards
- Custom date range picker (7d / 30d / 90d / 1y presets + custom)
- Team management with member stats
- Role-based access control (engineer, team lead, admin)
- Admin panel (engineer/team management, audit log, system stats)
- Alert system with configurable thresholds, acknowledge/dismiss workflow
- Slack notification integration
- CSV and PDF export
- API rate limiting
- Dark mode with system preference detection
- [P1] Activation and setup hub for GitHub, budgets, alerts, narrative readiness, and data freshness
- [P1] Performance measurement views for leadership across productivity, quality, cost, and adoption
- [P1] Threshold resolution and policy management that matches actual alerting behavior
- [P1] Device-scoped ingest tokens for hooks and sidecar, backed by authenticated engineer identity instead of long-lived engineer API keys
- [P1] One-time setup codes that exchange browser-authenticated engineers into local device tokens
- [P2] Multi-tenant workspace isolation for multiple organizations on a shared Primer instance
- [P2] Enterprise IdP support with SAML and OIDC for provisioning and SSO
- Multi-agent support (Claude Code, Codex CLI, Gemini CLI, Cursor)
- SessionEnd hook system with agent-specific installers
- `primer sync --watch` for agents without hook systems
- Docker Compose and Kubernetes Helm deployment
- PostgreSQL and SQLite support
- Alembic migration bundling in pip package
- [P0] Cursor `agent_type` support across capture, sync, ingest, and analytics filters
- [P0] Durable background job system for sync, facet extraction, narratives, and alerts
- [P0] Scalable API key lookup and verification strategy
- [P1] Source-capability registry so Primer can safely gate analytics by what each agent source actually provides
- [P1] OpenTelemetry integration for metrics, traces, and logs
- [P1] Redis-backed caching for analytics query results and high-read metadata
- [P1] Analytics performance work for large orgs and concurrent dashboard usage
- [P2] Pluggable warehouse export for long-horizon analysis in external BI tools