diff --git a/.claude/agents b/.claude/agents deleted file mode 120000 index fd65c790e..000000000 --- a/.claude/agents +++ /dev/null @@ -1 +0,0 @@ -../agents \ No newline at end of file diff --git a/.claude/agents/backend-review.md b/.claude/agents/backend-review.md new file mode 100644 index 000000000..e52475c91 --- /dev/null +++ b/.claude/agents/backend-review.md @@ -0,0 +1,123 @@ +--- +name: backend-review +description: > + Review Go backend code for convention violations. Use after modifying files + under components/backend/. Checks for panic usage, service account misuse, + type assertion safety, error handling, token security, and file size. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Backend Review Agent + +Review backend Go code against documented conventions. + +## Context + +Load these files before running checks: + +1. `components/backend/DEVELOPMENT.md` +2. `components/backend/ERROR_PATTERNS.md` +3. `components/backend/K8S_CLIENT_PATTERNS.md` + +## Checks + +### B1: No panic() in production (Blocker) + +```bash +grep -rn "panic(" components/backend/ --include="*.go" | grep -v "_test.go" +``` + +Any match is a Blocker. Production code must return `fmt.Errorf` with context. + +### B2: User-scoped clients for user operations (Blocker) + +In `components/backend/handlers/`: +- `DynamicClient.Resource` or `K8sClient` used for List/Get operations should use `GetK8sClientsForRequest(c)` instead +- Acceptable uses: after RBAC validation for writes, token minting, cleanup + +```bash +grep -rnE "DynamicClient\.|K8sClient\." components/backend/handlers/ --include="*.go" | grep -v "_test.go" +``` + +Cross-reference each match against the decision tree in `K8S_CLIENT_PATTERNS.md`. + +### B3: No direct type assertions on unstructured (Critical) + +```bash +grep -rnE 'Object\["[^"]+"\]\.\(' components/backend/ --include="*.go" | grep -v "_test.go" +``` + +Must use `unstructured.NestedMap`, `unstructured.NestedString`, etc. 
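The safe-access pattern B3 requires can be sketched without cluster dependencies. `nestedString` below is a hypothetical stand-in that mirrors the semantics of `unstructured.NestedString` from apimachinery; real handler code should call the apimachinery helpers, not re-implement them.

```go
package main

import "fmt"

// nestedString is a hypothetical stand-in mirroring the semantics of
// unstructured.NestedString from k8s.io/apimachinery: it walks nested maps
// and reports whether the value exists and is a string, instead of
// panicking on a failed type assertion.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		cur, ok = m[f]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

func main() {
	obj := map[string]interface{}{
		"spec": map[string]interface{}{"displayName": "demo"},
	}

	// A direct assertion like obj["status"].(map[string]interface{})
	// would panic here, because "status" is absent. The safe accessor
	// just reports failure.
	if name, ok := nestedString(obj, "spec", "displayName"); ok {
		fmt.Println(name) // demo
	}
	if _, ok := nestedString(obj, "status", "phase"); !ok {
		fmt.Println("status.phase missing, no panic")
	}
}
```

In production code, prefer `unstructured.NestedString`, `unstructured.NestedMap`, and friends, which additionally return an error describing the type mismatch.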
+ +### B4: No silent error handling (Critical) + +Look for empty error handling blocks: +```bash +rg -nUP 'if err != nil \{\s*\n\s*\}' --type go --glob '!*_test.go' components/backend/ +``` + +Also manually inspect `if err != nil` blocks for cases where the body only contains a comment (no actual handling). + +### B5: No internal error exposure in API responses (Major) + +```bash +grep -rn 'gin.H{"error":.*fmt\.Sprintf\|gin.H{"error":.*err\.' components/backend/handlers/ --include="*.go" | grep -v "_test.go" +``` + +API responses should use generic messages. Detailed errors go to logs. + +### B6: No tokens in logs (Blocker) + +```bash +grep -rn 'log.*[Tt]oken\b\|log.*[Ss]ecret\b' components/backend/ --include="*.go" | grep -v "len(token)\|_test.go" +``` + +Use `len(token)` for logging, never the token value itself. + +### B7: Error wrapping with %w (Major) + +```bash +grep -rnP 'fmt.Errorf.*%v.*\berr\b' components/backend/ --include="*.go" | grep -v "_test.go" +``` + +Should use `%w` for error wrapping to preserve the error chain. + +### B8: Files under 400 lines (Minor) + +```bash +find components/backend/handlers/ -name "*.go" -not -name "*_test.go" -print0 | xargs -0 wc -l | sort -rn +``` + +Flag files exceeding 400 lines. Note: `sessions.go` is a known exception. + +## Output Format + +```markdown +# Backend Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/8 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. 
diff --git a/.claude/agents/convention-eval.md b/.claude/agents/convention-eval.md new file mode 100644 index 000000000..77c1af78d --- /dev/null +++ b/.claude/agents/convention-eval.md @@ -0,0 +1,130 @@ +--- +name: convention-eval +description: > + Runs all convention checks across the full codebase and produces a scored + alignment report. Dispatched by the /align skill. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Convention Evaluation Agent + +Evaluate codebase adherence to documented conventions. Produce a scored report. + +## Context Files + +Load these before running checks: + +1. `components/backend/DEVELOPMENT.md` +2. `components/backend/ERROR_PATTERNS.md` +3. `components/backend/K8S_CLIENT_PATTERNS.md` +4. `components/frontend/DEVELOPMENT.md` +5. `components/frontend/REACT_QUERY_PATTERNS.md` +6. `components/operator/DEVELOPMENT.md` +7. `docs/security-standards.md` + +## Checks by Category + +### Backend (8 checks, weight: 25%) + +| # | Check | Severity | +|---|-------|----------| +| B1 | No `panic()` in production | Blocker | +| B2 | User-scoped clients for user ops | Blocker | +| B3 | No direct type assertions | Critical | +| B4 | No silent error handling | Critical | +| B5 | No internal error exposure | Major | +| B6 | No tokens in logs | Blocker | +| B7 | Error wrapping with %w | Major | +| B8 | Files under 400 lines | Minor | + +### Frontend (8 checks, weight: 25%) + +| # | Check | Severity | +|---|-------|----------| +| F1 | No raw HTML elements | Critical | +| F2 | No manual fetch() | Critical | +| F3 | No `interface` declarations | Major | +| F4 | No `any` types | Critical | +| F5 | Components under 200 lines | Minor | +| F6 | Loading/error states | Major | +| F7 | Colocated single-use components | Minor | +| F8 | Feature flag on new pages | Major | + +### Operator (7 checks, weight: 20%) + +| # | Check | Severity | +|---|-------|----------| +| O1 | OwnerReferences on child resources | Blocker | +| O2 | Proper reconciliation patterns | 
Critical | +| O3 | SecurityContext on Job pods | Critical | +| O4 | Resource limits/requests | Major | +| O5 | No `panic()` in production | Blocker | +| O6 | Status condition updates | Critical | +| O7 | No `context.TODO()` | Minor | + +### Runner (4 checks, weight: 10%) + +| # | Check | Severity | +|---|-------|----------| +| R1 | Proper async patterns | Major | +| R2 | Credential handling | Blocker | +| R3 | Error propagation | Critical | +| R4 | No hardcoded secrets | Blocker | + +### Security (7 checks, weight: 20%) + +| # | Check | Severity | +|---|-------|----------| +| S1 | User token for user ops | Blocker | +| S2 | RBAC before resource access | Critical | +| S3 | Token redaction | Blocker | +| S4 | Input validation | Major | +| S5 | SecurityContext on pods | Critical | +| S6 | OwnerReferences on Secrets | Critical | +| S7 | No hardcoded credentials | Blocker | + +## Scoring + +- Each check: Pass (1) or Fail (0) +- Category score: passes / total +- Overall score: + - Full scope: weighted average across all categories + - Scoped runs: renormalize weights to selected categories (e.g., backend-only uses 100% backend weight) + +## Output Format + +```markdown +# Convention Alignment Report + +**Scope:** [full | backend | frontend | ...] 
+**Date:** [ISO date]
+**Overall Score:** [X%]
+
+## Category Scores
+
+| Category | Score | Pass | Fail | Blockers |
+|----------|-------|------|------|----------|
+| Backend | X/8 | X | X | X |
+| Frontend | X/8 | X | X | X |
+| Operator | X/7 | X | X | X |
+| Runner | X/4 | X | X | X |
+| Security | X/7 | X | X | X |
+
+## Failures
+
+### Blockers
+[List with file:line references]
+
+### Critical
+[List with file:line references]
+
+### Major / Minor
+[List]
+
+## Recommendations
+[Top 3 priorities to improve alignment]
+``` diff --git a/.claude/agents/frontend-review.md b/.claude/agents/frontend-review.md new file mode 100644 index 000000000..4edadb1f7 --- /dev/null +++ b/.claude/agents/frontend-review.md @@ -0,0 +1,116 @@ +--- +name: frontend-review +description: >
+  Review frontend TypeScript/React code for convention violations. Use after
+  modifying files under components/frontend/src/. Checks for raw HTML elements,
+  manual fetch, any types, interface usage, component size, and missing states.
+tools:
+  - Read
+  - Grep
+  - Glob
+  - Bash
+---
+
+# Frontend Review Agent
+
+Review frontend code against documented conventions.
+
+## Context
+
+Load these files before running checks:
+
+1. `components/frontend/DEVELOPMENT.md`
+2. `components/frontend/REACT_QUERY_PATTERNS.md`
+3. `components/frontend/DESIGN_GUIDELINES.md` (if it exists)
+
+## Checks
+
+### F1: No raw HTML elements (Critical)
+
+```bash
+grep -rnE '<(button|input|select|table|textarea)\b' components/frontend/src/ --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Use the shared design-system components instead of raw HTML elements. Adjust the element pattern to match the list in `DEVELOPMENT.md`.
+
+### F2: No manual fetch() (Critical)
+
+```bash
+grep -rn "fetch(" components/frontend/src/ --include="*.ts" --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Data access goes through React Query hooks per `REACT_QUERY_PATTERNS.md`, not hand-rolled `fetch()` calls.
+
+### F3: No `interface` declarations (Major)
+
+```bash
+grep -rnE "^(export )?interface " components/frontend/src/ --include="*.ts" --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Use `type` aliases instead of `interface` declarations.
+
+### F4: No `any` types (Critical)
+
+```bash
+grep -rnE ": any\b|as any\b" components/frontend/src/ --include="*.ts" --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Use proper types, `unknown`, or generic constraints.
+
+### F5: Components under 200 lines (Minor)
+
+```bash
+find components/frontend/src/ -name "*.tsx" -print0 | xargs -0 wc -l | sort -rn | head -20
+```
+
+Flag components exceeding 200 lines. Consider splitting.
+ +### F6: Loading/error/empty states (Major) + +For components using `useQuery`: +- Must reference `isLoading` or `isPending` +- Must reference `error` +- Should handle empty data + +```bash +grep -rl "useQuery\|useSessions\|useSession" \ + components/frontend/src/app/ components/frontend/src/components/ --include="*.tsx" +``` + +Then check each file for `isLoading\|isPending` and `error` references. + +### F7: Single-use components in shared directories (Minor) + +Check `components/frontend/src/components/` for components imported only once. These should be co-located with their page in `_components/`. + +### F8: Feature flag on new pages (Major) + +New `page.tsx` files should reference `useWorkspaceFlag` or `useFlag` for feature gating. + +## Output Format + +```markdown +# Frontend Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/8 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. diff --git a/.claude/agents/operator-review.md b/.claude/agents/operator-review.md new file mode 100644 index 000000000..001dde89e --- /dev/null +++ b/.claude/agents/operator-review.md @@ -0,0 +1,102 @@ +--- +name: operator-review +description: > + Review Kubernetes operator code for convention violations. Use after modifying + files under components/operator/. Checks for OwnerReferences, SecurityContext, + reconciliation patterns, resource limits, and panic usage. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Operator Review Agent + +Review operator Go code against documented conventions. + +## Context + +Load these files before running checks: + +1. `components/operator/DEVELOPMENT.md` +2. `components/backend/K8S_CLIENT_PATTERNS.md` +3. 
`components/backend/ERROR_PATTERNS.md` + +## Checks + +### O1: OwnerReferences on child resources (Blocker) + +```bash +grep -rn "Job\|Secret\|PersistentVolumeClaim" components/operator/ --include="*.go" | grep -i "create" +``` + +Cross-reference each create call with `OwnerReferences` in the same function. See `DEVELOPMENT.md` for the required pattern. + +### O2: Proper reconciliation patterns (Critical) + +- `errors.IsNotFound` → return nil (resource deleted, don't retry) +- Transient errors → return error (triggers requeue with backoff) +- Terminal errors → update CR status to "Failed", return nil + +### O3: SecurityContext on Job pod specs (Critical) + +```bash +grep -rn "SecurityContext" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +Required: `AllowPrivilegeEscalation: false`, `Capabilities.Drop: ["ALL"]` + +### O4: Resource limits/requests on containers (Major) + +```bash +grep -rn "Resources\|Limits\|Requests" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +Job containers should have resource requirements set. + +### O5: No panic() in production (Blocker) + +```bash +grep -rn "panic(" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +### O6: Status condition updates (Critical) + +Error paths must update the CR status to reflect the error. + +### O7: No context.TODO() (Minor) + +```bash +grep -rn "context.TODO()" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +Use proper context propagation from the reconciliation request. + +## Output Format + +```markdown +# Operator Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/7 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. 
diff --git a/.claude/agents/runner-review.md b/.claude/agents/runner-review.md new file mode 100644 index 000000000..a2752739f --- /dev/null +++ b/.claude/agents/runner-review.md @@ -0,0 +1,68 @@ +--- +name: runner-review +description: > + Review Python runner code for convention violations. Use after modifying files + under components/runners/ambient-runner/. Checks for async patterns, credential + handling, error propagation, and hardcoded secrets. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Runner Review Agent + +Review runner Python code against documented conventions. + +## Context + +No runner-specific DEVELOPMENT.md exists yet. Review against general Python best practices and the patterns visible in `components/runners/ambient-runner/src/`. + +## Checks + +### R1: Proper async patterns (Major) + +No blocking calls (`open()`, `requests.`, `time.sleep()`) inside async functions. Use `aiofiles`, `httpx`, `asyncio.sleep()`. + +### R2: Credential handling (Blocker) + +No hardcoded credential values. Credentials loaded from environment or K8s secrets. No credentials in log statements. + +### R3: Error propagation from subprocess (Critical) + +Subprocess calls must propagate errors, not swallow them. Return codes checked, errors raised or logged with context. + +### R4: No hardcoded secrets or API keys (Blocker) + +```bash +grep -rn "sk-\|api_key=\|password=" components/runners/ambient-runner/ --include="*.py" | grep -v "_test\|test_\|example\|mock" +``` + +## Output Format + +```markdown +# Runner Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/4 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. 
diff --git a/.claude/agents/security-review.md b/.claude/agents/security-review.md new file mode 100644 index 000000000..44262eef1 --- /dev/null +++ b/.claude/agents/security-review.md @@ -0,0 +1,84 @@ +--- +name: security-review +description: > + Cross-cutting security review for code touching auth, RBAC, tokens, or + container specs. Use before committing any code that handles authentication, + authorization, credentials, or security contexts. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Security Review Agent + +Cross-cutting security review against documented security standards. + +## Context + +Load these files before running checks: + +1. `docs/security-standards.md` + +## Checks + +### S1: User token for user operations (Blocker) + +Handlers must use `GetK8sClientsForRequest(c)` for user-initiated operations. Service account only for privileged operations after RBAC validation. + +### S2: RBAC before resource access (Critical) + +`SelfSubjectAccessReview` (or equivalent authz check) should precede user-scoped resource access. + +### S3: Token redaction in all outputs (Blocker) + +No tokens in logs, errors, or API responses. Use `len(token)` for logging. + +### S4: Input validation (Major) + +DNS labels validated, URLs parsed, no raw newlines for log injection. + +### S5: SecurityContext on pods (Critical) + +`AllowPrivilegeEscalation: false`, `Capabilities.Drop: ["ALL"]`. + +### S6: OwnerReferences on Secrets (Critical) + +Secrets created by the platform must have OwnerReferences for cleanup. 
+ +### S7: No hardcoded credentials (Blocker) + +```bash +grep -rn 'password.*=.*"\|api.key.*=.*"\|secret.*=.*"\|token.*=.*"' components/ --include="*.go" --include="*.py" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.yaml" --include="*.yml" | grep -v "_test\|test_\|mock\|example\|fixture\|\.d\.ts" +``` + +## Output Format + +```markdown +# Security Review + +## Summary +[1-2 sentence overview with overall risk assessment] + +## Findings + +### Blocker +[Must fix — security vulnerabilities] + +### Critical +[Should fix — security weaknesses] + +### Major +[Important — defense-in-depth gaps] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/7 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. + +**Security reviews should err on the side of flagging potential issues.** False positives are acceptable; false negatives are not. diff --git a/.claude/commands/acp-compile.md b/.claude/commands/acp-compile.md deleted file mode 100644 index 779a4b5c9..000000000 --- a/.claude/commands/acp-compile.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -description: Submit a plan file to ACP for execution as an AgenticSession on the cluster. ---- - -## User Input - -```text -$ARGUMENTS -``` - -## Steps - -1. **Locate the plan file**: - - If `$ARGUMENTS` is a non-empty file path, use that file - - If `$ARGUMENTS` is empty, find the most recently modified `.md` file in `.claude/plans/` - - Read the plan file contents — this becomes the `initial_prompt` - - If no plan file is found, stop and ask the user to provide a path - -2. **Get repository info**: - - Run `git remote get-url origin` to get the repo URL - - Run `git branch --show-current` to get the current branch - -3. **Build the prompt**: - - Prepend a context header to the plan contents: - ``` - You are executing a plan that was compiled and submitted to ACP. - Repository: {repo_url} - Branch: {branch} - - --- - - {plan_file_contents} - ``` - -4. 
**Create the session**: - - Call the `acp_create_session` MCP tool with: - - `initial_prompt`: the assembled prompt from step 3 - - `repos`: `["{repo_url}"]` - - `display_name`: `"Compiled: {plan_file_basename}"` - - `interactive`: `false` - - `timeout`: `1800` - - If the tool returns `created: false`, print the error message and stop - -5. **Report results**: - - Print the session name and project from the response - - Print follow-up commands: - ``` - Check status: acp_list_sessions(project="...") - View logs: acp_get_session_logs(project="...", session="...") - ``` - - Do NOT wait for the session to complete — return immediately diff --git a/.claude/commands/cypress-demo.md b/.claude/commands/cypress-demo.md deleted file mode 100644 index 14a6edca1..000000000 --- a/.claude/commands/cypress-demo.md +++ /dev/null @@ -1,285 +0,0 @@ ---- -description: Create a Cypress-based video demo for a feature branch with cursor, click effects, and captions. ---- - -# /cypress-demo Command - -Create a polished Cypress demo test that records a human-paced video walkthrough of UI features on the current branch. - -## Usage - -``` -/cypress-demo # Auto-detect features from branch diff -/cypress-demo chat input refactoring # Describe what to demo -``` - -## User Input - -```text -$ARGUMENTS -``` - -## Behavior - -When invoked, Claude will create a Cypress test file in `e2e/cypress/e2e/` that records a demo video with: - -- **Synthetic cursor** (white dot) that glides smoothly to each interaction target -- **Click ripple** (blue expanding ring) on every click action -- **Caption bar** (compact dark bar at top of viewport) describing each step -- **Human-paced timing** so every action is clearly visible -- **`--no-runner-ui`** flag to exclude the Cypress sidebar from the recording - -### 1. 
Determine what to demo - -- If `$ARGUMENTS` is provided, use it as the demo description -- If empty, run `git diff main..HEAD --stat` to identify changed files and infer features -- Read the changed/new component files to understand what UI to showcase -- Ask the user if clarification is needed on which features to highlight - -### 2. Check prerequisites - -- Verify `e2e/.env.test` or `e2e/.env` exists with `TEST_TOKEN` -- Check if `ANTHROPIC_API_KEY` is available (needed if the demo requires Running state for workflows, agents, or commands) -- Verify the kind cluster is up: `kubectl get pods -n ambient-code` -- Verify the frontend is accessible: `curl -s -o /dev/null -w "%{http_code}" http://localhost` -- If the frontend was rebuilt from this branch, verify imagePullPolicy is `Never` or `IfNotPresent` - -### 3. Create the demo test file - -Create `e2e/cypress/e2e/-demo.cy.ts` using the template structure below. - -#### Required helpers (copy into every demo file) - -```typescript -// Timing constants — adjust per demo, aim for ~2 min total video -const LONG = 3200 // hold on important visuals -const PAUSE = 2400 // standard pause between actions -const SHORT = 1600 // brief pause after small actions -const TYPE_DELAY = 80 // ms per keystroke - -// Target first element (session page renders desktop + mobile layout) -const chatInput = () => cy.get('textarea[placeholder*="message"]').first() - -// Caption: compact bar at TOP of viewport -function caption(text: string) { - cy.document().then((doc) => { - let el = doc.getElementById('demo-caption') - if (!el) { - el = doc.createElement('div') - el.id = 'demo-caption' - el.style.cssText = [ - 'position:fixed', 'top:0', 'left:0', 'right:0', 'z-index:99998', - 'background:rgba(0,0,0,0.80)', 'color:#fff', 'font-size:14px', - 'font-weight:500', 'font-family:system-ui,-apple-system,sans-serif', - 'padding:6px 20px', 'text-align:center', 'letter-spacing:0.2px', - 'pointer-events:none', 'transition:opacity 0.4s ease', - 
].join(';') - doc.body.appendChild(el) - } - el.textContent = text - el.style.opacity = '1' - }) -} - -function clearCaption() { - cy.document().then((doc) => { - const el = doc.getElementById('demo-caption') - if (el) el.style.opacity = '0' - }) -} - -// Synthetic cursor + click ripple -function initCursor() { - cy.document().then((doc) => { - if (doc.getElementById('demo-cursor')) return - const cursor = doc.createElement('div') - cursor.id = 'demo-cursor' - cursor.style.cssText = [ - 'position:fixed', 'z-index:99999', 'pointer-events:none', - 'width:20px', 'height:20px', 'border-radius:50%', - 'background:rgba(255,255,255,0.9)', 'border:2px solid #333', - 'box-shadow:0 0 6px rgba(0,0,0,0.4)', - 'transform:translate(-50%,-50%)', - 'transition:left 0.5s cubic-bezier(0.25,0.1,0.25,1), top 0.5s cubic-bezier(0.25,0.1,0.25,1)', - 'left:-40px', 'top:-40px', - ].join(';') - doc.body.appendChild(cursor) - const ripple = doc.createElement('div') - ripple.id = 'demo-ripple' - ripple.style.cssText = [ - 'position:fixed', 'z-index:99999', 'pointer-events:none', - 'width:40px', 'height:40px', 'border-radius:50%', - 'border:3px solid rgba(59,130,246,0.8)', - 'transform:translate(-50%,-50%) scale(0)', - 'opacity:0', 'left:-40px', 'top:-40px', - ].join(';') - doc.body.appendChild(ripple) - const style = doc.createElement('style') - style.textContent = ` - @keyframes demo-ripple-anim { - 0% { transform: translate(-50%,-50%) scale(0); opacity: 1; } - 100% { transform: translate(-50%,-50%) scale(2.5); opacity: 0; } - } - ` - doc.head.appendChild(style) - }) -} - -// Move cursor smoothly to element center -function moveTo(selector: string, options?: { first?: boolean }) { - const chain = options?.first ? 
cy.get(selector).first() : cy.get(selector) - chain.then(($el) => { - const rect = $el[0].getBoundingClientRect() - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - if (cursor) { - cursor.style.left = `${rect.left + rect.width / 2}px` - cursor.style.top = `${rect.top + rect.height / 2}px` - } - }) - cy.wait(600) - }) -} - -function moveToText(text: string, tag?: string) { - const chain = tag ? cy.contains(tag, text) : cy.contains(text) - chain.then(($el) => { - const rect = $el[0].getBoundingClientRect() - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - if (cursor) { - cursor.style.left = `${rect.left + rect.width / 2}px` - cursor.style.top = `${rect.top + rect.height / 2}px` - } - }) - cy.wait(600) - }) -} - -function moveToEl($el: JQuery) { - const rect = $el[0].getBoundingClientRect() - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - if (cursor) { - cursor.style.left = `${rect.left + rect.width / 2}px` - cursor.style.top = `${rect.top + rect.height / 2}px` - } - }) - cy.wait(600) -} - -function clickEffect() { - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - const ripple = doc.getElementById('demo-ripple') - if (cursor && ripple) { - ripple.style.left = cursor.style.left - ripple.style.top = cursor.style.top - ripple.style.animation = 'none' - void ripple.offsetHeight - ripple.style.animation = 'demo-ripple-anim 0.5s ease-out forwards' - } - }) -} - -// Compound: move → ripple → click -function cursorClickText(text: string, tag?: string, options?: { force?: boolean }) { - moveToText(text, tag) - clickEffect() - const chain = tag ? cy.contains(tag, text) : cy.contains(text) - chain.click({ force: options?.force }) -} -``` - -#### Test structure - -```typescript -describe(' Demo', () => { - const workspaceName = `demo-${Date.now()}` - - // ... helpers above ... 
- - Cypress.on('uncaught:exception', (err) => { - if (err.message.includes('Minified React error') || err.message.includes('Hydration')) { - return false - } - return true - }) - - after(() => { - if (!Cypress.env('KEEP_WORKSPACES')) { - const token = Cypress.env('TEST_TOKEN') - cy.request({ - method: 'DELETE', - url: `/api/projects/${workspaceName}`, - headers: { Authorization: `Bearer ${token}` }, - failOnStatusCode: false, - }) - } - }) - - it('demonstrates ', () => { - // ... single continuous test for one video file ... - }) -}) -``` - -### 4. Key patterns to follow - -| Pattern | Rule | -|---------|------| -| **Dual layout** | Session page renders desktop + mobile. Always use `.first()` on element queries that match both | -| **Caption scoping** | When asserting page content with `cy.contains`, scope to a tag (e.g., `cy.contains('p', 'text')`) to avoid matching the caption overlay | -| **Workspace setup** | Create workspace → poll `/api/projects/:name` until 200 → configure runner-secrets if API key needed | -| **Running state** | If demo needs agents/commands, configure `ANTHROPIC_API_KEY` via runner-secrets, select a workflow, and wait for `textarea[placeholder*="attach"]` (Running placeholder) with 180s timeout | -| **Operator pull policy** | For kind clusters, set `IMAGE_PULL_POLICY=IfNotPresent` on the operator to avoid re-pulling the 879MB runner image every session | -| **File attachment** | Use `cy.get('input[type="file"]').first().selectFile({...}, { force: true })` with a `Cypress.Buffer` — no real file needed | -| **Caption position** | Always `top:0` — bottom position obscures the chat toolbar | -| **Timing** | Aim for ~2 min total. LONG=3.2s, PAUSE=2.4s, SHORT=1.6s, TYPE_DELAY=80ms. Adjust if video feels too fast or slow | -| **Video output** | `e2e/cypress/videos/.cy.ts.mp4` at 2560x1440 (Retina) | - -### 5. 
Run the demo - -```bash -cd e2e -npx cypress run --no-runner-ui --spec "cypress/e2e/-demo.cy.ts" -``` - -- Verify the video plays at human-readable speed -- Check that captions don't overlap important UI elements -- Re-run and iterate if needed — adjust timing or add/remove steps - -### 6. Commit and push - -- Commit the demo test file and any config changes (`cypress.config.ts`) -- Push to the current branch -- If a PR exists, note the demo in the PR description - -## Reference implementation - -See `e2e/cypress/e2e/chatbox-demo.cy.ts` for a complete working example that demonstrates: -- Workspace creation, session creation -- WelcomeExperience (streaming text, workflow cards) -- Workflow selection ("Fix a bug") with Running state wait -- File attachments (AttachmentPreview) -- Autocomplete popovers (@agents, /commands) with real workflow data -- Message queueing (QueuedMessageBubble) -- Message history and queued message editing -- Settings dropdown -- Breadcrumb navigation - -## Config requirements - -`e2e/cypress.config.ts` must load `.env.test` and wire `TEST_TOKEN`: - -```typescript -// Load env files: .env.local > .env > .env.test -const envFiles = ['.env.local', '.env', '.env.test'].map(f => path.resolve(__dirname, f)) -for (const envFile of envFiles) { - if (fs.existsSync(envFile)) { dotenv.config({ path: envFile }) } -} - -// In setupNodeEvents: -config.env.TEST_TOKEN = process.env.CYPRESS_TEST_TOKEN || process.env.TEST_TOKEN || config.env.TEST_TOKEN || '' -config.env.ANTHROPIC_API_KEY = process.env.CYPRESS_ANTHROPIC_API_KEY || process.env.ANTHROPIC_API_KEY || '' -``` diff --git a/.claude/commands/speckit.analyze.md b/.claude/commands/speckit.analyze.md deleted file mode 100644 index 98b04b0c8..000000000 --- a/.claude/commands/speckit.analyze.md +++ /dev/null @@ -1,184 +0,0 @@ ---- -description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation. 
---- - -## User Input - -```text -$ARGUMENTS -``` - -You **MUST** consider the user input before proceeding (if not empty). - -## Goal - -Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`. - -## Operating Constraints - -**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually). - -**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`. - -## Execution Steps - -### 1. Initialize Analysis Context - -Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths: - -- SPEC = FEATURE_DIR/spec.md -- PLAN = FEATURE_DIR/plan.md -- TASKS = FEATURE_DIR/tasks.md - -Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command). -For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). - -### 2. 
Load Artifacts (Progressive Disclosure) - -Load only the minimal necessary context from each artifact: - -**From spec.md:** - -- Overview/Context -- Functional Requirements -- Non-Functional Requirements -- User Stories -- Edge Cases (if present) - -**From plan.md:** - -- Architecture/stack choices -- Data Model references -- Phases -- Technical constraints - -**From tasks.md:** - -- Task IDs -- Descriptions -- Phase grouping -- Parallel markers [P] -- Referenced file paths - -**From constitution:** - -- Load `.specify/memory/constitution.md` for principle validation - -### 3. Build Semantic Models - -Create internal representations (do not include raw artifacts in output): - -- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" → `user-can-upload-file`) -- **User story/action inventory**: Discrete user actions with acceptance criteria -- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases) -- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements - -### 4. Detection Passes (Token-Efficient Analysis) - -Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary. - -#### A. Duplication Detection - -- Identify near-duplicate requirements -- Mark lower-quality phrasing for consolidation - -#### B. Ambiguity Detection - -- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria -- Flag unresolved placeholders (TODO, TKTK, ???, ``, etc.) - -#### C. Underspecification - -- Requirements with verbs but missing object or measurable outcome -- User stories missing acceptance criteria alignment -- Tasks referencing files or components not defined in spec/plan - -#### D. 
Constitution Alignment - -- Any requirement or plan element conflicting with a MUST principle -- Missing mandated sections or quality gates from constitution - -#### E. Coverage Gaps - -- Requirements with zero associated tasks -- Tasks with no mapped requirement/story -- Non-functional requirements not reflected in tasks (e.g., performance, security) - -#### F. Inconsistency - -- Terminology drift (same concept named differently across files) -- Data entities referenced in plan but absent in spec (or vice versa) -- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note) -- Conflicting requirements (e.g., one requires Next.js while the other specifies Vue) - -### 5. Severity Assignment - -Use this heuristic to prioritize findings: - -- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality -- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion -- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case -- **LOW**: Style/wording improvements, minor redundancy not affecting execution order - -### 6. Produce Compact Analysis Report - -Output a Markdown report (no file writes) with the following structure: - -## Specification Analysis Report - -| ID | Category | Severity | Location(s) | Summary | Recommendation | -|----|----------|----------|-------------|---------|----------------| -| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version | - -(Add one row per finding; generate stable IDs prefixed by category initial.) - -**Coverage Summary Table:** - -| Requirement Key | Has Task?
| Task IDs | Notes | -|-----------------|-----------|----------|-------| - -**Constitution Alignment Issues:** (if any) - -**Unmapped Tasks:** (if any) - -**Metrics:** - -- Total Requirements -- Total Tasks -- Coverage % (requirements with >=1 task) -- Ambiguity Count -- Duplication Count -- Critical Issues Count - -### 7. Provide Next Actions - -At end of report, output a concise Next Actions block: - -- If CRITICAL issues exist: Recommend resolving before `/speckit.implement` -- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions -- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'" - -### 8. Offer Remediation - -Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.) - -## Operating Principles - -### Context Efficiency - -- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation -- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis -- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow -- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts - -### Analysis Guidelines - -- **NEVER modify files** (this is read-only analysis) -- **NEVER hallucinate missing sections** (if absent, report them accurately) -- **Prioritize constitution violations** (these are always CRITICAL) -- **Use examples over exhaustive rules** (cite specific instances, not generic patterns) -- **Report zero issues gracefully** (emit success report with coverage statistics) - -## Context - -$ARGUMENTS diff --git a/.claude/commands/speckit.checklist.md b/.claude/commands/speckit.checklist.md deleted file mode 100644 index 970e6c9ed..000000000 --- a/.claude/commands/speckit.checklist.md +++ /dev/null @@ -1,294 +0,0 @@ 
---- -description: Generate a custom checklist for the current feature based on user requirements. ---- - -## Checklist Purpose: "Unit Tests for English" - -**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain. - -**NOT for verification/testing**: - -- ❌ NOT "Verify the button clicks correctly" -- ❌ NOT "Test error handling works" -- ❌ NOT "Confirm the API returns 200" -- ❌ NOT checking if code/implementation matches the spec - -**FOR requirements quality validation**: - -- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness) -- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity) -- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency) -- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage) -- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases) - -**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works. - -## User Input - -```text -$ARGUMENTS -``` - -You **MUST** consider the user input before proceeding (if not empty). - -## Execution Steps - -1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list. - - All file paths must be absolute. - - For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). - -2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog).
They MUST: - - Be generated from the user's phrasing + extracted signals from spec/plan/tasks - - Only ask about information that materially changes checklist content - - Be skipped individually if already unambiguous in `$ARGUMENTS` - - Prefer precision over breadth - - Generation algorithm: - 1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts"). - 2. Cluster signals into candidate focus areas (max 4) ranked by relevance. - 3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit. - 4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria. - 5. Formulate questions chosen from these archetypes: - - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?") - - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?") - - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?") - - Audience framing (e.g., "Will this be used by the author only or peers during PR review?") - - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?") - - Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?") - - Question formatting rules: - - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters - - Limit to A–E options maximum; omit table if a free-form answer is clearer - - Never ask the user to restate what they already said - - Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope." 
- - Defaults when interaction impossible: - - Depth: Standard - - Audience: Reviewer (PR) if code-related; Author otherwise - - Focus: Top 2 relevance clusters - - Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more. - -3. **Understand user request**: Combine `$ARGUMENTS` + clarifying answers: - - Derive checklist theme (e.g., security, review, deploy, ux) - - Consolidate explicit must-have items mentioned by user - - Map focus selections to category scaffolding - - Infer any missing context from spec/plan/tasks (do NOT hallucinate) - -4. **Load feature context**: Read from FEATURE_DIR: - - spec.md: Feature requirements and scope - - plan.md (if exists): Technical details, dependencies - - tasks.md (if exists): Implementation tasks - - **Context Loading Strategy**: - - Load only necessary portions relevant to active focus areas (avoid full-file dumping) - - Prefer summarizing long sections into concise scenario/requirement bullets - - Use progressive disclosure: add follow-on retrieval only if gaps detected - - If source docs are large, generate interim summary items instead of embedding raw text - -5. 
**Generate checklist** - Create "Unit Tests for Requirements": - - Create `FEATURE_DIR/checklists/` directory if it doesn't exist - - Generate unique checklist filename: - - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`) - - Format: `[domain].md` - - If file exists, append to existing file - - Number items sequentially starting from CHK001 - - Each `/speckit.checklist` run creates a NEW file for a new domain (an existing domain file is appended to, never overwritten) - - **CORE PRINCIPLE - Test the Requirements, Not the Implementation**: - Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for: - - **Completeness**: Are all necessary requirements present? - - **Clarity**: Are requirements unambiguous and specific? - - **Consistency**: Do requirements align with each other? - - **Measurability**: Can requirements be objectively verified? - - **Coverage**: Are all scenarios/edge cases addressed? - - **Category Structure** - Group items by requirement quality dimensions: - - **Requirement Completeness** (Are all necessary requirements documented?) - - **Requirement Clarity** (Are requirements specific and unambiguous?) - - **Requirement Consistency** (Do requirements align without conflicts?) - - **Acceptance Criteria Quality** (Are success criteria measurable?) - - **Scenario Coverage** (Are all flows/cases addressed?) - - **Edge Case Coverage** (Are boundary conditions defined?) - - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?) - - **Dependencies & Assumptions** (Are they documented and validated?) - - **Ambiguities & Conflicts** (What needs clarification?)
- - **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**: - - ❌ **WRONG** (Testing implementation): - - "Verify landing page displays 3 episode cards" - - "Test hover states work on desktop" - - "Confirm logo click navigates home" - - ✅ **CORRECT** (Testing requirements quality): - - "Are the exact number and layout of featured episodes specified?" [Completeness] - - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity] - - "Are hover state requirements consistent across all interactive elements?" [Consistency] - - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage] - - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases] - - "Are loading states defined for asynchronous episode data?" [Completeness] - - "Does the spec define visual hierarchy for competing UI elements?" [Clarity] - - **ITEM STRUCTURE**: - Each item should follow this pattern: - - Question format asking about requirement quality - - Focus on what's WRITTEN (or not written) in the spec/plan - - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.] - - Reference spec section `[Spec §X.Y]` when checking existing requirements - - Use `[Gap]` marker when checking for missing requirements - - **EXAMPLES BY QUALITY DIMENSION**: - - Completeness: - - "Are error handling requirements defined for all API failure modes? [Gap]" - - "Are accessibility requirements specified for all interactive elements? [Completeness]" - - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]" - - Clarity: - - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]" - - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]" - - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]" - - Consistency: - - "Do navigation requirements align across all pages? 
[Consistency, Spec §FR-10]" - - "Are card component requirements consistent between landing and detail pages? [Consistency]" - - Coverage: - - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]" - - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]" - - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]" - - Measurability: - - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]" - - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]" - - **Scenario Classification & Coverage** (Requirements Quality Focus): - - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios - - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?" - - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]" - - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]" - - **Traceability Requirements**: - - MINIMUM: ≥80% of items MUST include at least one traceability reference - - Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]` - - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]" - - **Surface & Resolve Issues** (Requirements Quality Problems): - Ask questions about the requirements themselves: - - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]" - - Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]" - - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" - - Dependencies: "Are external podcast API requirements documented? 
[Dependency, Gap]" - - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" - - **Content Consolidation**: - - Soft cap: If raw candidate items > 40, prioritize by risk/impact - - Merge near-duplicates checking the same requirement aspect - - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]" - - **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test: - - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior - - ❌ References to code execution, user actions, system behavior - - ❌ "Displays correctly", "works properly", "functions as expected" - - ❌ "Click", "navigate", "render", "load", "execute" - - ❌ Test cases, test plans, QA procedures - - ❌ Implementation details (frameworks, APIs, algorithms) - - **✅ REQUIRED PATTERNS** - These test requirements quality: - - ✅ "Are [requirement type] defined/specified/documented for [scenario]?" - - ✅ "Is [vague term] quantified/clarified with specific criteria?" - - ✅ "Are requirements consistent between [section A] and [section B]?" - - ✅ "Can [requirement] be objectively measured/verified?" - - ✅ "Are [edge cases/scenarios] addressed in requirements?" - - ✅ "Does the spec define [missing aspect]?" - -6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001. - -7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. 
Summarize: - - Focus areas selected - - Depth level - - Actor/timing - - Any explicit user-specified must-have items incorporated - -**Important**: Each `/speckit.checklist` command invocation creates a checklist file with a short, descriptive name (appending if that file already exists). This allows: - -- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`) -- Simple, memorable filenames that indicate checklist purpose -- Easy identification and navigation in the `checklists/` folder - -To avoid clutter, use descriptive types and clean up obsolete checklists when done. - -## Example Checklist Types & Sample Items - -**UX Requirements Quality:** `ux.md` - -Sample items (testing the requirements, NOT the implementation): - -- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]" -- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]" -- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]" -- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]" -- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]" -- "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]" - -**API Requirements Quality:** `api.md` - -Sample items: - -- "Are error response formats specified for all failure scenarios? [Completeness]" -- "Are rate limiting requirements quantified with specific thresholds? [Clarity]" -- "Are authentication requirements consistent across all endpoints? [Consistency]" -- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]" -- "Is versioning strategy documented in requirements? [Gap]" - -**Performance Requirements Quality:** `performance.md` - -Sample items: - -- "Are performance requirements quantified with specific metrics? [Clarity]" -- "Are performance targets defined for all critical user journeys?
[Coverage]" -- "Are performance requirements under different load conditions specified? [Completeness]" -- "Can performance requirements be objectively measured? [Measurability]" -- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]" - -**Security Requirements Quality:** `security.md` - -Sample items: - -- "Are authentication requirements specified for all protected resources? [Coverage]" -- "Are data protection requirements defined for sensitive information? [Completeness]" -- "Is the threat model documented and requirements aligned to it? [Traceability]" -- "Are security requirements consistent with compliance obligations? [Consistency]" -- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" - -## Anti-Examples: What NOT To Do - -**❌ WRONG - These test implementation, not requirements:** - -```markdown -- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001] -- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003] -- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010] -- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005] -``` - -**✅ CORRECT - These test requirements quality:** - -```markdown -- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001] -- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003] -- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010] -- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005] -- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap] -- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? 
[Measurability, Spec §FR-001] -``` - -**Key Differences:** - -- Wrong: Tests if the system works correctly -- Correct: Tests if the requirements are written correctly -- Wrong: Verification of behavior -- Correct: Validation of requirement quality -- Wrong: "Does it do X?" -- Correct: "Is X clearly specified?" diff --git a/.claude/commands/speckit.clarify.md b/.claude/commands/speckit.clarify.md deleted file mode 100644 index 8ff62c348..000000000 --- a/.claude/commands/speckit.clarify.md +++ /dev/null @@ -1,177 +0,0 @@ ---- -description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec. ---- - -## User Input - -```text -$ARGUMENTS -``` - -You **MUST** consider the user input before proceeding (if not empty). - -## Outline - -Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file. - -Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases. - -Execution steps: - -1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields: - - `FEATURE_DIR` - - `FEATURE_SPEC` - - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.) - - If JSON parsing fails, abort and instruct the user to re-run `/speckit.specify` or verify feature branch environment. - - For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). - -2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing.
Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked). - - Functional Scope & Behavior: - - Core user goals & success criteria - - Explicit out-of-scope declarations - - User roles / personas differentiation - - Domain & Data Model: - - Entities, attributes, relationships - - Identity & uniqueness rules - - Lifecycle/state transitions - - Data volume / scale assumptions - - Interaction & UX Flow: - - Critical user journeys / sequences - - Error/empty/loading states - - Accessibility or localization notes - - Non-Functional Quality Attributes: - - Performance (latency, throughput targets) - - Scalability (horizontal/vertical, limits) - - Reliability & availability (uptime, recovery expectations) - - Observability (logging, metrics, tracing signals) - - Security & privacy (authN/Z, data protection, threat assumptions) - - Compliance / regulatory constraints (if any) - - Integration & External Dependencies: - - External services/APIs and failure modes - - Data import/export formats - - Protocol/versioning assumptions - - Edge Cases & Failure Handling: - - Negative scenarios - - Rate limiting / throttling - - Conflict resolution (e.g., concurrent edits) - - Constraints & Tradeoffs: - - Technical constraints (language, storage, hosting) - - Explicit tradeoffs or rejected alternatives - - Terminology & Consistency: - - Canonical glossary terms - - Avoided synonyms / deprecated terms - - Completion Signals: - - Acceptance criteria testability - - Measurable Definition of Done style indicators - - Misc / Placeholders: - - TODO markers / unresolved decisions - - Ambiguous adjectives ("robust", "intuitive") lacking quantification - - For each category with Partial or Missing status, add a candidate question opportunity unless: - - Clarification would not materially change implementation or validation strategy - - Information is better deferred to planning phase (note internally) - -3. 
Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints: - - Maximum of 5 total questions across the whole session. - - Each question must be answerable with EITHER: - - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR - - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words"). - - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation. - - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved. - - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness). - - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests. - - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic. - -4. Sequential questioning loop (interactive): - - Present EXACTLY ONE question at a time. - - For multiple‑choice questions: - - **Analyze all options** and determine the **most suitable option** based on: - - Best practices for the project type - - Common patterns in similar implementations - - Risk reduction (security, performance, maintainability) - - Alignment with any explicit project goals or constraints visible in the spec - - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice). - - Format as: `**Recommended:** Option [X] - ` - - Then render all options as a Markdown table: - - | Option | Description | - |--------|-------------| - | A |