diff --git a/.claude/agents b/.claude/agents deleted file mode 120000 index fd65c790e..000000000 --- a/.claude/agents +++ /dev/null @@ -1 +0,0 @@ -../agents \ No newline at end of file diff --git a/.claude/agents/backend-review.md b/.claude/agents/backend-review.md new file mode 100644 index 000000000..e52475c91 --- /dev/null +++ b/.claude/agents/backend-review.md @@ -0,0 +1,123 @@ +--- +name: backend-review +description: > + Review Go backend code for convention violations. Use after modifying files + under components/backend/. Checks for panic usage, service account misuse, + type assertion safety, error handling, token security, and file size. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Backend Review Agent + +Review backend Go code against documented conventions. + +## Context + +Load these files before running checks: + +1. `components/backend/DEVELOPMENT.md` +2. `components/backend/ERROR_PATTERNS.md` +3. `components/backend/K8S_CLIENT_PATTERNS.md` + +## Checks + +### B1: No panic() in production (Blocker) + +```bash +grep -rn "panic(" components/backend/ --include="*.go" | grep -v "_test.go" +``` + +Any match is a Blocker. Production code must return `fmt.Errorf` with context. + +### B2: User-scoped clients for user operations (Blocker) + +In `components/backend/handlers/`: +- `DynamicClient.Resource` or `K8sClient` used for List/Get operations should use `GetK8sClientsForRequest(c)` instead +- Acceptable uses: after RBAC validation for writes, token minting, cleanup + +```bash +grep -rnE "DynamicClient\.|K8sClient\." components/backend/handlers/ --include="*.go" | grep -v "_test.go" +``` + +Cross-reference each match against the decision tree in `K8S_CLIENT_PATTERNS.md`. + +### B3: No direct type assertions on unstructured (Critical) + +```bash +grep -rnE 'Object\["[^"]+"\]\.\(' components/backend/ --include="*.go" | grep -v "_test.go" +``` + +Must use `unstructured.NestedMap`, `unstructured.NestedString`, etc. 
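The safe-access pattern B3 requires can be sketched without cluster dependencies. `nestedString` below is a hypothetical stand-in that mirrors the semantics of `unstructured.NestedString` from apimachinery; real handler code should call the apimachinery helpers, not re-implement them.

```go
package main

import "fmt"

// nestedString is a hypothetical stand-in mirroring the semantics of
// unstructured.NestedString from k8s.io/apimachinery: it walks nested maps
// and reports whether the value exists and is a string, instead of
// panicking on a failed type assertion.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool) {
	var cur interface{} = obj
	for _, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		cur, ok = m[f]
		if !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

func main() {
	obj := map[string]interface{}{
		"spec": map[string]interface{}{"displayName": "demo"},
	}

	// A direct assertion like obj["status"].(map[string]interface{})
	// would panic here, because "status" is absent. The safe accessor
	// just reports failure.
	if name, ok := nestedString(obj, "spec", "displayName"); ok {
		fmt.Println(name) // demo
	}
	if _, ok := nestedString(obj, "status", "phase"); !ok {
		fmt.Println("status.phase missing, no panic")
	}
}
```

In production code, prefer `unstructured.NestedString`, `unstructured.NestedMap`, and friends, which additionally return an error describing the type mismatch.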
+ +### B4: No silent error handling (Critical) + +Look for empty error handling blocks: +```bash +rg -nUP 'if err != nil \{\s*\n\s*\}' --type go --glob '!*_test.go' components/backend/ +``` + +Also manually inspect `if err != nil` blocks for cases where the body only contains a comment (no actual handling). + +### B5: No internal error exposure in API responses (Major) + +```bash +grep -rn 'gin.H{"error":.*fmt\.Sprintf\|gin.H{"error":.*err\.' components/backend/handlers/ --include="*.go" | grep -v "_test.go" +``` + +API responses should use generic messages. Detailed errors go to logs. + +### B6: No tokens in logs (Blocker) + +```bash +grep -rn 'log.*[Tt]oken\b\|log.*[Ss]ecret\b' components/backend/ --include="*.go" | grep -v "len(token)\|_test.go" +``` + +Use `len(token)` for logging, never the token value itself. + +### B7: Error wrapping with %w (Major) + +```bash +grep -rnP 'fmt.Errorf.*%v.*\berr\b' components/backend/ --include="*.go" | grep -v "_test.go" +``` + +Should use `%w` for error wrapping to preserve the error chain. + +### B8: Files under 400 lines (Minor) + +```bash +find components/backend/handlers/ -name "*.go" -not -name "*_test.go" -print0 | xargs -0 wc -l | sort -rn +``` + +Flag files exceeding 400 lines. Note: `sessions.go` is a known exception. + +## Output Format + +```markdown +# Backend Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/8 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. 
diff --git a/.claude/agents/convention-eval.md b/.claude/agents/convention-eval.md new file mode 100644 index 000000000..77c1af78d --- /dev/null +++ b/.claude/agents/convention-eval.md @@ -0,0 +1,130 @@ +--- +name: convention-eval +description: > + Runs all convention checks across the full codebase and produces a scored + alignment report. Dispatched by the /align skill. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Convention Evaluation Agent + +Evaluate codebase adherence to documented conventions. Produce a scored report. + +## Context Files + +Load these before running checks: + +1. `components/backend/DEVELOPMENT.md` +2. `components/backend/ERROR_PATTERNS.md` +3. `components/backend/K8S_CLIENT_PATTERNS.md` +4. `components/frontend/DEVELOPMENT.md` +5. `components/frontend/REACT_QUERY_PATTERNS.md` +6. `components/operator/DEVELOPMENT.md` +7. `docs/security-standards.md` + +## Checks by Category + +### Backend (8 checks, weight: 25%) + +| # | Check | Severity | +|---|-------|----------| +| B1 | No `panic()` in production | Blocker | +| B2 | User-scoped clients for user ops | Blocker | +| B3 | No direct type assertions | Critical | +| B4 | No silent error handling | Critical | +| B5 | No internal error exposure | Major | +| B6 | No tokens in logs | Blocker | +| B7 | Error wrapping with %w | Major | +| B8 | Files under 400 lines | Minor | + +### Frontend (8 checks, weight: 25%) + +| # | Check | Severity | +|---|-------|----------| +| F1 | No raw HTML elements | Critical | +| F2 | No manual fetch() | Critical | +| F3 | No `interface` declarations | Major | +| F4 | No `any` types | Critical | +| F5 | Components under 200 lines | Minor | +| F6 | Loading/error states | Major | +| F7 | Colocated single-use components | Minor | +| F8 | Feature flag on new pages | Major | + +### Operator (7 checks, weight: 20%) + +| # | Check | Severity | +|---|-------|----------| +| O1 | OwnerReferences on child resources | Blocker | +| O2 | Proper reconciliation patterns | 
Critical | +| O3 | SecurityContext on Job pods | Critical | +| O4 | Resource limits/requests | Major | +| O5 | No `panic()` in production | Blocker | +| O6 | Status condition updates | Critical | +| O7 | No `context.TODO()` | Minor | + +### Runner (4 checks, weight: 10%) + +| # | Check | Severity | +|---|-------|----------| +| R1 | Proper async patterns | Major | +| R2 | Credential handling | Blocker | +| R3 | Error propagation | Critical | +| R4 | No hardcoded secrets | Blocker | + +### Security (7 checks, weight: 20%) + +| # | Check | Severity | +|---|-------|----------| +| S1 | User token for user ops | Blocker | +| S2 | RBAC before resource access | Critical | +| S3 | Token redaction | Blocker | +| S4 | Input validation | Major | +| S5 | SecurityContext on pods | Critical | +| S6 | OwnerReferences on Secrets | Critical | +| S7 | No hardcoded credentials | Blocker | + +## Scoring + +- Each check: Pass (1) or Fail (0) +- Category score: passes / total +- Overall score: + - Full scope: weighted average across all categories + - Scoped runs: renormalize weights to selected categories (e.g., backend-only uses 100% backend weight) + +## Output Format + +```markdown +# Convention Alignment Report + +**Scope:** [full | backend | frontend | ...] 
+**Date:** [ISO date]
+**Overall Score:** [X%]
+
+## Category Scores
+
+| Category | Score | Pass | Fail | Blockers |
+|----------|-------|------|------|----------|
+| Backend | X/8 | X | X | X |
+| Frontend | X/8 | X | X | X |
+| Operator | X/7 | X | X | X |
+| Runner | X/4 | X | X | X |
+| Security | X/7 | X | X | X |
+
+## Failures
+
+### Blockers
+[List with file:line references]
+
+### Critical
+[List with file:line references]
+
+### Major / Minor
+[List]
+
+## Recommendations
+[Top 3 priorities to improve alignment]
+``` diff --git a/.claude/agents/frontend-review.md b/.claude/agents/frontend-review.md new file mode 100644 index 000000000..4edadb1f7 --- /dev/null +++ b/.claude/agents/frontend-review.md @@ -0,0 +1,116 @@ +--- +name: frontend-review +description: >
+  Review frontend TypeScript/React code for convention violations. Use after
+  modifying files under components/frontend/src/. Checks for raw HTML elements,
+  manual fetch, any types, interface usage, component size, and missing states.
+tools:
+  - Read
+  - Grep
+  - Glob
+  - Bash
+---
+
+# Frontend Review Agent
+
+Review frontend code against documented conventions.
+
+## Context
+
+Load these files before running checks:
+
+1. `components/frontend/DEVELOPMENT.md`
+2. `components/frontend/REACT_QUERY_PATTERNS.md`
+3. `components/frontend/DESIGN_GUIDELINES.md` (if it exists)
+
+## Checks
+
+### F1: No raw HTML elements (Critical)
+
+```bash
+grep -rnE '<(button|input|select|table|textarea)\b' components/frontend/src/ --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Use the shared design-system components instead of raw HTML elements. Adjust the element pattern to match the list in `DEVELOPMENT.md`.
+
+### F2: No manual fetch() (Critical)
+
+```bash
+grep -rn "fetch(" components/frontend/src/ --include="*.ts" --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Data access goes through React Query hooks per `REACT_QUERY_PATTERNS.md`, not hand-rolled `fetch()` calls.
+
+### F3: No `interface` declarations (Major)
+
+```bash
+grep -rnE "^(export )?interface " components/frontend/src/ --include="*.ts" --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Use `type` aliases instead of `interface` declarations.
+
+### F4: No `any` types (Critical)
+
+```bash
+grep -rnE ": any\b|as any\b" components/frontend/src/ --include="*.ts" --include="*.tsx" | grep -v "node_modules\|\.d\.ts"
+```
+
+Use proper types, `unknown`, or generic constraints.
+
+### F5: Components under 200 lines (Minor)
+
+```bash
+find components/frontend/src/ -name "*.tsx" -print0 | xargs -0 wc -l | sort -rn | head -20
+```
+
+Flag components exceeding 200 lines. Consider splitting.
+ +### F6: Loading/error/empty states (Major) + +For components using `useQuery`: +- Must reference `isLoading` or `isPending` +- Must reference `error` +- Should handle empty data + +```bash +grep -rl "useQuery\|useSessions\|useSession" \ + components/frontend/src/app/ components/frontend/src/components/ --include="*.tsx" +``` + +Then check each file for `isLoading\|isPending` and `error` references. + +### F7: Single-use components in shared directories (Minor) + +Check `components/frontend/src/components/` for components imported only once. These should be co-located with their page in `_components/`. + +### F8: Feature flag on new pages (Major) + +New `page.tsx` files should reference `useWorkspaceFlag` or `useFlag` for feature gating. + +## Output Format + +```markdown +# Frontend Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/8 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. diff --git a/.claude/agents/operator-review.md b/.claude/agents/operator-review.md new file mode 100644 index 000000000..001dde89e --- /dev/null +++ b/.claude/agents/operator-review.md @@ -0,0 +1,102 @@ +--- +name: operator-review +description: > + Review Kubernetes operator code for convention violations. Use after modifying + files under components/operator/. Checks for OwnerReferences, SecurityContext, + reconciliation patterns, resource limits, and panic usage. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Operator Review Agent + +Review operator Go code against documented conventions. + +## Context + +Load these files before running checks: + +1. `components/operator/DEVELOPMENT.md` +2. `components/backend/K8S_CLIENT_PATTERNS.md` +3. 
`components/backend/ERROR_PATTERNS.md` + +## Checks + +### O1: OwnerReferences on child resources (Blocker) + +```bash +grep -rn "Job\|Secret\|PersistentVolumeClaim" components/operator/ --include="*.go" | grep -i "create" +``` + +Cross-reference each create call with `OwnerReferences` in the same function. See `DEVELOPMENT.md` for the required pattern. + +### O2: Proper reconciliation patterns (Critical) + +- `errors.IsNotFound` → return nil (resource deleted, don't retry) +- Transient errors → return error (triggers requeue with backoff) +- Terminal errors → update CR status to "Failed", return nil + +### O3: SecurityContext on Job pod specs (Critical) + +```bash +grep -rn "SecurityContext" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +Required: `AllowPrivilegeEscalation: false`, `Capabilities.Drop: ["ALL"]` + +### O4: Resource limits/requests on containers (Major) + +```bash +grep -rn "Resources\|Limits\|Requests" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +Job containers should have resource requirements set. + +### O5: No panic() in production (Blocker) + +```bash +grep -rn "panic(" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +### O6: Status condition updates (Critical) + +Error paths must update the CR status to reflect the error. + +### O7: No context.TODO() (Minor) + +```bash +grep -rn "context.TODO()" components/operator/ --include="*.go" | grep -v "_test.go" +``` + +Use proper context propagation from the reconciliation request. + +## Output Format + +```markdown +# Operator Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/7 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. 
diff --git a/.claude/agents/runner-review.md b/.claude/agents/runner-review.md new file mode 100644 index 000000000..a2752739f --- /dev/null +++ b/.claude/agents/runner-review.md @@ -0,0 +1,68 @@ +--- +name: runner-review +description: > + Review Python runner code for convention violations. Use after modifying files + under components/runners/ambient-runner/. Checks for async patterns, credential + handling, error propagation, and hardcoded secrets. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Runner Review Agent + +Review runner Python code against documented conventions. + +## Context + +No runner-specific DEVELOPMENT.md exists yet. Review against general Python best practices and the patterns visible in `components/runners/ambient-runner/src/`. + +## Checks + +### R1: Proper async patterns (Major) + +No blocking calls (`open()`, `requests.`, `time.sleep()`) inside async functions. Use `aiofiles`, `httpx`, `asyncio.sleep()`. + +### R2: Credential handling (Blocker) + +No hardcoded credential values. Credentials loaded from environment or K8s secrets. No credentials in log statements. + +### R3: Error propagation from subprocess (Critical) + +Subprocess calls must propagate errors, not swallow them. Return codes checked, errors raised or logged with context. + +### R4: No hardcoded secrets or API keys (Blocker) + +```bash +grep -rn "sk-\|api_key=\|password=" components/runners/ambient-runner/ --include="*.py" | grep -v "_test\|test_\|example\|mock" +``` + +## Output Format + +```markdown +# Runner Review + +## Summary +[1-2 sentence overview] + +## Findings + +### Blocker +[Must fix — or "None"] + +### Critical +[Should fix — or "None"] + +### Major +[Important — or "None"] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/4 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. 
diff --git a/.claude/agents/security-review.md b/.claude/agents/security-review.md new file mode 100644 index 000000000..44262eef1 --- /dev/null +++ b/.claude/agents/security-review.md @@ -0,0 +1,84 @@ +--- +name: security-review +description: > + Cross-cutting security review for code touching auth, RBAC, tokens, or + container specs. Use before committing any code that handles authentication, + authorization, credentials, or security contexts. +tools: + - Read + - Grep + - Glob + - Bash +--- + +# Security Review Agent + +Cross-cutting security review against documented security standards. + +## Context + +Load these files before running checks: + +1. `docs/security-standards.md` + +## Checks + +### S1: User token for user operations (Blocker) + +Handlers must use `GetK8sClientsForRequest(c)` for user-initiated operations. Service account only for privileged operations after RBAC validation. + +### S2: RBAC before resource access (Critical) + +`SelfSubjectAccessReview` (or equivalent authz check) should precede user-scoped resource access. + +### S3: Token redaction in all outputs (Blocker) + +No tokens in logs, errors, or API responses. Use `len(token)` for logging. + +### S4: Input validation (Major) + +DNS labels validated, URLs parsed, no raw newlines for log injection. + +### S5: SecurityContext on pods (Critical) + +`AllowPrivilegeEscalation: false`, `Capabilities.Drop: ["ALL"]`. + +### S6: OwnerReferences on Secrets (Critical) + +Secrets created by the platform must have OwnerReferences for cleanup. 
+ +### S7: No hardcoded credentials (Blocker) + +```bash +grep -rn 'password.*=.*"\|api.key.*=.*"\|secret.*=.*"\|token.*=.*"' components/ --include="*.go" --include="*.py" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.yaml" --include="*.yml" | grep -v "_test\|test_\|mock\|example\|fixture\|\.d\.ts" +``` + +## Output Format + +```markdown +# Security Review + +## Summary +[1-2 sentence overview with overall risk assessment] + +## Findings + +### Blocker +[Must fix — security vulnerabilities] + +### Critical +[Should fix — security weaknesses] + +### Major +[Important — defense-in-depth gaps] + +### Minor +[Nice-to-have — or "None"] + +## Score +[X/7 checks passed] +``` + +Each finding includes: file:line, problem description, convention violated, suggested fix. + +**Security reviews should err on the side of flagging potential issues.** False positives are acceptable; false negatives are not. diff --git a/.claude/commands/acp-compile.md b/.claude/commands/acp-compile.md deleted file mode 100644 index 779a4b5c9..000000000 --- a/.claude/commands/acp-compile.md +++ /dev/null @@ -1,51 +0,0 @@ ---- -description: Submit a plan file to ACP for execution as an AgenticSession on the cluster. ---- - -## User Input - -```text -$ARGUMENTS -``` - -## Steps - -1. **Locate the plan file**: - - If `$ARGUMENTS` is a non-empty file path, use that file - - If `$ARGUMENTS` is empty, find the most recently modified `.md` file in `.claude/plans/` - - Read the plan file contents — this becomes the `initial_prompt` - - If no plan file is found, stop and ask the user to provide a path - -2. **Get repository info**: - - Run `git remote get-url origin` to get the repo URL - - Run `git branch --show-current` to get the current branch - -3. **Build the prompt**: - - Prepend a context header to the plan contents: - ``` - You are executing a plan that was compiled and submitted to ACP. - Repository: {repo_url} - Branch: {branch} - - --- - - {plan_file_contents} - ``` - -4. 
**Create the session**: - - Call the `acp_create_session` MCP tool with: - - `initial_prompt`: the assembled prompt from step 3 - - `repos`: `["{repo_url}"]` - - `display_name`: `"Compiled: {plan_file_basename}"` - - `interactive`: `false` - - `timeout`: `1800` - - If the tool returns `created: false`, print the error message and stop - -5. **Report results**: - - Print the session name and project from the response - - Print follow-up commands: - ``` - Check status: acp_list_sessions(project="...") - View logs: acp_get_session_logs(project="...", session="...") - ``` - - Do NOT wait for the session to complete — return immediately diff --git a/.claude/commands/cypress-demo.md b/.claude/commands/cypress-demo.md deleted file mode 100644 index 14a6edca1..000000000 --- a/.claude/commands/cypress-demo.md +++ /dev/null @@ -1,285 +0,0 @@ ---- -description: Create a Cypress-based video demo for a feature branch with cursor, click effects, and captions. ---- - -# /cypress-demo Command - -Create a polished Cypress demo test that records a human-paced video walkthrough of UI features on the current branch. - -## Usage - -``` -/cypress-demo # Auto-detect features from branch diff -/cypress-demo chat input refactoring # Describe what to demo -``` - -## User Input - -```text -$ARGUMENTS -``` - -## Behavior - -When invoked, Claude will create a Cypress test file in `e2e/cypress/e2e/` that records a demo video with: - -- **Synthetic cursor** (white dot) that glides smoothly to each interaction target -- **Click ripple** (blue expanding ring) on every click action -- **Caption bar** (compact dark bar at top of viewport) describing each step -- **Human-paced timing** so every action is clearly visible -- **`--no-runner-ui`** flag to exclude the Cypress sidebar from the recording - -### 1. 
Determine what to demo - -- If `$ARGUMENTS` is provided, use it as the demo description -- If empty, run `git diff main..HEAD --stat` to identify changed files and infer features -- Read the changed/new component files to understand what UI to showcase -- Ask the user if clarification is needed on which features to highlight - -### 2. Check prerequisites - -- Verify `e2e/.env.test` or `e2e/.env` exists with `TEST_TOKEN` -- Check if `ANTHROPIC_API_KEY` is available (needed if the demo requires Running state for workflows, agents, or commands) -- Verify the kind cluster is up: `kubectl get pods -n ambient-code` -- Verify the frontend is accessible: `curl -s -o /dev/null -w "%{http_code}" http://localhost` -- If the frontend was rebuilt from this branch, verify imagePullPolicy is `Never` or `IfNotPresent` - -### 3. Create the demo test file - -Create `e2e/cypress/e2e/-demo.cy.ts` using the template structure below. - -#### Required helpers (copy into every demo file) - -```typescript -// Timing constants — adjust per demo, aim for ~2 min total video -const LONG = 3200 // hold on important visuals -const PAUSE = 2400 // standard pause between actions -const SHORT = 1600 // brief pause after small actions -const TYPE_DELAY = 80 // ms per keystroke - -// Target first element (session page renders desktop + mobile layout) -const chatInput = () => cy.get('textarea[placeholder*="message"]').first() - -// Caption: compact bar at TOP of viewport -function caption(text: string) { - cy.document().then((doc) => { - let el = doc.getElementById('demo-caption') - if (!el) { - el = doc.createElement('div') - el.id = 'demo-caption' - el.style.cssText = [ - 'position:fixed', 'top:0', 'left:0', 'right:0', 'z-index:99998', - 'background:rgba(0,0,0,0.80)', 'color:#fff', 'font-size:14px', - 'font-weight:500', 'font-family:system-ui,-apple-system,sans-serif', - 'padding:6px 20px', 'text-align:center', 'letter-spacing:0.2px', - 'pointer-events:none', 'transition:opacity 0.4s ease', - 
].join(';') - doc.body.appendChild(el) - } - el.textContent = text - el.style.opacity = '1' - }) -} - -function clearCaption() { - cy.document().then((doc) => { - const el = doc.getElementById('demo-caption') - if (el) el.style.opacity = '0' - }) -} - -// Synthetic cursor + click ripple -function initCursor() { - cy.document().then((doc) => { - if (doc.getElementById('demo-cursor')) return - const cursor = doc.createElement('div') - cursor.id = 'demo-cursor' - cursor.style.cssText = [ - 'position:fixed', 'z-index:99999', 'pointer-events:none', - 'width:20px', 'height:20px', 'border-radius:50%', - 'background:rgba(255,255,255,0.9)', 'border:2px solid #333', - 'box-shadow:0 0 6px rgba(0,0,0,0.4)', - 'transform:translate(-50%,-50%)', - 'transition:left 0.5s cubic-bezier(0.25,0.1,0.25,1), top 0.5s cubic-bezier(0.25,0.1,0.25,1)', - 'left:-40px', 'top:-40px', - ].join(';') - doc.body.appendChild(cursor) - const ripple = doc.createElement('div') - ripple.id = 'demo-ripple' - ripple.style.cssText = [ - 'position:fixed', 'z-index:99999', 'pointer-events:none', - 'width:40px', 'height:40px', 'border-radius:50%', - 'border:3px solid rgba(59,130,246,0.8)', - 'transform:translate(-50%,-50%) scale(0)', - 'opacity:0', 'left:-40px', 'top:-40px', - ].join(';') - doc.body.appendChild(ripple) - const style = doc.createElement('style') - style.textContent = ` - @keyframes demo-ripple-anim { - 0% { transform: translate(-50%,-50%) scale(0); opacity: 1; } - 100% { transform: translate(-50%,-50%) scale(2.5); opacity: 0; } - } - ` - doc.head.appendChild(style) - }) -} - -// Move cursor smoothly to element center -function moveTo(selector: string, options?: { first?: boolean }) { - const chain = options?.first ? 
cy.get(selector).first() : cy.get(selector) - chain.then(($el) => { - const rect = $el[0].getBoundingClientRect() - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - if (cursor) { - cursor.style.left = `${rect.left + rect.width / 2}px` - cursor.style.top = `${rect.top + rect.height / 2}px` - } - }) - cy.wait(600) - }) -} - -function moveToText(text: string, tag?: string) { - const chain = tag ? cy.contains(tag, text) : cy.contains(text) - chain.then(($el) => { - const rect = $el[0].getBoundingClientRect() - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - if (cursor) { - cursor.style.left = `${rect.left + rect.width / 2}px` - cursor.style.top = `${rect.top + rect.height / 2}px` - } - }) - cy.wait(600) - }) -} - -function moveToEl($el: JQuery) { - const rect = $el[0].getBoundingClientRect() - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - if (cursor) { - cursor.style.left = `${rect.left + rect.width / 2}px` - cursor.style.top = `${rect.top + rect.height / 2}px` - } - }) - cy.wait(600) -} - -function clickEffect() { - cy.document().then((doc) => { - const cursor = doc.getElementById('demo-cursor') - const ripple = doc.getElementById('demo-ripple') - if (cursor && ripple) { - ripple.style.left = cursor.style.left - ripple.style.top = cursor.style.top - ripple.style.animation = 'none' - void ripple.offsetHeight - ripple.style.animation = 'demo-ripple-anim 0.5s ease-out forwards' - } - }) -} - -// Compound: move → ripple → click -function cursorClickText(text: string, tag?: string, options?: { force?: boolean }) { - moveToText(text, tag) - clickEffect() - const chain = tag ? cy.contains(tag, text) : cy.contains(text) - chain.click({ force: options?.force }) -} -``` - -#### Test structure - -```typescript -describe(' Demo', () => { - const workspaceName = `demo-${Date.now()}` - - // ... helpers above ... 
- - Cypress.on('uncaught:exception', (err) => { - if (err.message.includes('Minified React error') || err.message.includes('Hydration')) { - return false - } - return true - }) - - after(() => { - if (!Cypress.env('KEEP_WORKSPACES')) { - const token = Cypress.env('TEST_TOKEN') - cy.request({ - method: 'DELETE', - url: `/api/projects/${workspaceName}`, - headers: { Authorization: `Bearer ${token}` }, - failOnStatusCode: false, - }) - } - }) - - it('demonstrates ', () => { - // ... single continuous test for one video file ... - }) -}) -``` - -### 4. Key patterns to follow - -| Pattern | Rule | -|---------|------| -| **Dual layout** | Session page renders desktop + mobile. Always use `.first()` on element queries that match both | -| **Caption scoping** | When asserting page content with `cy.contains`, scope to a tag (e.g., `cy.contains('p', 'text')`) to avoid matching the caption overlay | -| **Workspace setup** | Create workspace → poll `/api/projects/:name` until 200 → configure runner-secrets if API key needed | -| **Running state** | If demo needs agents/commands, configure `ANTHROPIC_API_KEY` via runner-secrets, select a workflow, and wait for `textarea[placeholder*="attach"]` (Running placeholder) with 180s timeout | -| **Operator pull policy** | For kind clusters, set `IMAGE_PULL_POLICY=IfNotPresent` on the operator to avoid re-pulling the 879MB runner image every session | -| **File attachment** | Use `cy.get('input[type="file"]').first().selectFile({...}, { force: true })` with a `Cypress.Buffer` — no real file needed | -| **Caption position** | Always `top:0` — bottom position obscures the chat toolbar | -| **Timing** | Aim for ~2 min total. LONG=3.2s, PAUSE=2.4s, SHORT=1.6s, TYPE_DELAY=80ms. Adjust if video feels too fast or slow | -| **Video output** | `e2e/cypress/videos/.cy.ts.mp4` at 2560x1440 (Retina) | - -### 5. 
Run the demo - -```bash -cd e2e -npx cypress run --no-runner-ui --spec "cypress/e2e/-demo.cy.ts" -``` - -- Verify the video plays at human-readable speed -- Check that captions don't overlap important UI elements -- Re-run and iterate if needed — adjust timing or add/remove steps - -### 6. Commit and push - -- Commit the demo test file and any config changes (`cypress.config.ts`) -- Push to the current branch -- If a PR exists, note the demo in the PR description - -## Reference implementation - -See `e2e/cypress/e2e/chatbox-demo.cy.ts` for a complete working example that demonstrates: -- Workspace creation, session creation -- WelcomeExperience (streaming text, workflow cards) -- Workflow selection ("Fix a bug") with Running state wait -- File attachments (AttachmentPreview) -- Autocomplete popovers (@agents, /commands) with real workflow data -- Message queueing (QueuedMessageBubble) -- Message history and queued message editing -- Settings dropdown -- Breadcrumb navigation - -## Config requirements - -`e2e/cypress.config.ts` must load `.env.test` and wire `TEST_TOKEN`: - -```typescript -// Load env files: .env.local > .env > .env.test -const envFiles = ['.env.local', '.env', '.env.test'].map(f => path.resolve(__dirname, f)) -for (const envFile of envFiles) { - if (fs.existsSync(envFile)) { dotenv.config({ path: envFile }) } -} - -// In setupNodeEvents: -config.env.TEST_TOKEN = process.env.CYPRESS_TEST_TOKEN || process.env.TEST_TOKEN || config.env.TEST_TOKEN || '' -config.env.ANTHROPIC_API_KEY = process.env.CYPRESS_ANTHROPIC_API_KEY || process.env.ANTHROPIC_API_KEY || '' -``` diff --git a/.claude/commands/speckit.analyze.md b/.claude/commands/speckit.analyze.md deleted file mode 100644 index 98b04b0c8..000000000 --- a/.claude/commands/speckit.analyze.md +++ /dev/null @@ -1,184 +0,0 @@ ---- -description: Perform a non-destructive cross-artifact consistency and quality analysis across spec.md, plan.md, and tasks.md after task generation. 
---- - -## User Input - -```text -$ARGUMENTS -``` - -You **MUST** consider the user input before proceeding (if not empty). - -## Goal - -Identify inconsistencies, duplications, ambiguities, and underspecified items across the three core artifacts (`spec.md`, `plan.md`, `tasks.md`) before implementation. This command MUST run only after `/speckit.tasks` has successfully produced a complete `tasks.md`. - -## Operating Constraints - -**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually). - -**Constitution Authority**: The project constitution (`.specify/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `/speckit.analyze`. - -## Execution Steps - -### 1. Initialize Analysis Context - -Run `.specify/scripts/bash/check-prerequisites.sh --json --require-tasks --include-tasks` once from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS. Derive absolute paths: - -- SPEC = FEATURE_DIR/spec.md -- PLAN = FEATURE_DIR/plan.md -- TASKS = FEATURE_DIR/tasks.md - -Abort with an error message if any required file is missing (instruct the user to run missing prerequisite command). -For single quotes in args like "I'm Groot", use escape syntax: e.g 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). - -### 2. 
Load Artifacts (Progressive Disclosure) - -Load only the minimal necessary context from each artifact: - -**From spec.md:** - -- Overview/Context -- Functional Requirements -- Non-Functional Requirements -- User Stories -- Edge Cases (if present) - -**From plan.md:** - -- Architecture/stack choices -- Data Model references -- Phases -- Technical constraints - -**From tasks.md:** - -- Task IDs -- Descriptions -- Phase grouping -- Parallel markers [P] -- Referenced file paths - -**From constitution:** - -- Load `.specify/memory/constitution.md` for principle validation - -### 3. Build Semantic Models - -Create internal representations (do not include raw artifacts in output): - -- **Requirements inventory**: Each functional + non-functional requirement with a stable key (derive slug based on imperative phrase; e.g., "User can upload file" → `user-can-upload-file`) -- **User story/action inventory**: Discrete user actions with acceptance criteria -- **Task coverage mapping**: Map each task to one or more requirements or stories (inference by keyword / explicit reference patterns like IDs or key phrases) -- **Constitution rule set**: Extract principle names and MUST/SHOULD normative statements - -### 4. Detection Passes (Token-Efficient Analysis) - -Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary. - -#### A. Duplication Detection - -- Identify near-duplicate requirements -- Mark lower-quality phrasing for consolidation - -#### B. Ambiguity Detection - -- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria -- Flag unresolved placeholders (TODO, TKTK, ???, ``, etc.) - -#### C. Underspecification - -- Requirements with verbs but missing object or measurable outcome -- User stories missing acceptance criteria alignment -- Tasks referencing files or components not defined in spec/plan - -#### D. 
Constitution Alignment - -- Any requirement or plan element conflicting with a MUST principle -- Missing mandated sections or quality gates from constitution - -#### E. Coverage Gaps - -- Requirements with zero associated tasks -- Tasks with no mapped requirement/story -- Non-functional requirements not reflected in tasks (e.g., performance, security) - -#### F. Inconsistency - -- Terminology drift (same concept named differently across files) -- Data entities referenced in plan but absent in spec (or vice versa) -- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note) -- Conflicting requirements (e.g., one requires Next.js while the other specifies Vue) - -### 5. Severity Assignment - -Use this heuristic to prioritize findings: - -- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality -- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion -- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case -- **LOW**: Style/wording improvements, minor redundancy not affecting execution order - -### 6. Produce Compact Analysis Report - -Output a Markdown report (no file writes) with the following structure: - -## Specification Analysis Report - -| ID | Category | Severity | Location(s) | Summary | Recommendation | -|----|----------|----------|-------------|---------|----------------| -| A1 | Duplication | HIGH | spec.md:L120-134 | Two similar requirements ... | Merge phrasing; keep clearer version | - -(Add one row per finding; generate stable IDs prefixed by category initial.) - -**Coverage Summary Table:** - -| Requirement Key | Has Task?
| Task IDs | Notes | -|-----------------|-----------|----------|-------| - -**Constitution Alignment Issues:** (if any) - -**Unmapped Tasks:** (if any) - -**Metrics:** - -- Total Requirements -- Total Tasks -- Coverage % (requirements with >=1 task) -- Ambiguity Count -- Duplication Count -- Critical Issues Count - -### 7. Provide Next Actions - -At end of report, output a concise Next Actions block: - -- If CRITICAL issues exist: Recommend resolving before `/speckit.implement` -- If only LOW/MEDIUM: User may proceed, but provide improvement suggestions -- Provide explicit command suggestions: e.g., "Run /speckit.specify with refinement", "Run /speckit.plan to adjust architecture", "Manually edit tasks.md to add coverage for 'performance-metrics'" - -### 8. Offer Remediation - -Ask the user: "Would you like me to suggest concrete remediation edits for the top N issues?" (Do NOT apply them automatically.) - -## Operating Principles - -### Context Efficiency - -- **Minimal high-signal tokens**: Focus on actionable findings, not exhaustive documentation -- **Progressive disclosure**: Load artifacts incrementally; don't dump all content into analysis -- **Token-efficient output**: Limit findings table to 50 rows; summarize overflow -- **Deterministic results**: Rerunning without changes should produce consistent IDs and counts - -### Analysis Guidelines - -- **NEVER modify files** (this is read-only analysis) -- **NEVER hallucinate missing sections** (if absent, report them accurately) -- **Prioritize constitution violations** (these are always CRITICAL) -- **Use examples over exhaustive rules** (cite specific instances, not generic patterns) -- **Report zero issues gracefully** (emit success report with coverage statistics) - -## Context - -$ARGUMENTS diff --git a/.claude/commands/speckit.checklist.md b/.claude/commands/speckit.checklist.md deleted file mode 100644 index 970e6c9ed..000000000 --- a/.claude/commands/speckit.checklist.md +++ /dev/null @@ -1,294 +0,0 @@ 
---- -description: Generate a custom checklist for the current feature based on user requirements. ---- - -## Checklist Purpose: "Unit Tests for English" - -**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain. - -**NOT for verification/testing**: - -- ❌ NOT "Verify the button clicks correctly" -- ❌ NOT "Test error handling works" -- ❌ NOT "Confirm the API returns 200" -- ❌ NOT checking if code/implementation matches the spec - -**FOR requirements quality validation**: - -- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness) -- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity) -- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency) -- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage) -- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases) - -**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works. - -## User Input - -```text -$ARGUMENTS -``` - -You **MUST** consider the user input before proceeding (if not empty). - -## Execution Steps - -1. **Setup**: Run `.specify/scripts/bash/check-prerequisites.sh --json` from repo root and parse JSON for FEATURE_DIR and AVAILABLE_DOCS list. - - All file paths must be absolute. - - For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). - -2. **Clarify intent (dynamic)**: Derive up to THREE initial contextual clarifying questions (no pre-baked catalog).
They MUST: - - Be generated from the user's phrasing + extracted signals from spec/plan/tasks - - Only ask about information that materially changes checklist content - - Be skipped individually if already unambiguous in `$ARGUMENTS` - - Prefer precision over breadth - - Generation algorithm: - 1. Extract signals: feature domain keywords (e.g., auth, latency, UX, API), risk indicators ("critical", "must", "compliance"), stakeholder hints ("QA", "review", "security team"), and explicit deliverables ("a11y", "rollback", "contracts"). - 2. Cluster signals into candidate focus areas (max 4) ranked by relevance. - 3. Identify probable audience & timing (author, reviewer, QA, release) if not explicit. - 4. Detect missing dimensions: scope breadth, depth/rigor, risk emphasis, exclusion boundaries, measurable acceptance criteria. - 5. Formulate questions chosen from these archetypes: - - Scope refinement (e.g., "Should this include integration touchpoints with X and Y or stay limited to local module correctness?") - - Risk prioritization (e.g., "Which of these potential risk areas should receive mandatory gating checks?") - - Depth calibration (e.g., "Is this a lightweight pre-commit sanity list or a formal release gate?") - - Audience framing (e.g., "Will this be used by the author only or peers during PR review?") - - Boundary exclusion (e.g., "Should we explicitly exclude performance tuning items this round?") - - Scenario class gap (e.g., "No recovery flows detected—are rollback / partial failure paths in scope?") - - Question formatting rules: - - If presenting options, generate a compact table with columns: Option | Candidate | Why It Matters - - Limit to A–E options maximum; omit table if a free-form answer is clearer - - Never ask the user to restate what they already said - - Avoid speculative categories (no hallucination). If uncertain, ask explicitly: "Confirm whether X belongs in scope." 
- - Defaults when interaction impossible: - - Depth: Standard - - Audience: Reviewer (PR) if code-related; Author otherwise - - Focus: Top 2 relevance clusters - - Output the questions (label Q1/Q2/Q3). After answers: if ≥2 scenario classes (Alternate / Exception / Recovery / Non-Functional domain) remain unclear, you MAY ask up to TWO more targeted follow‑ups (Q4/Q5) with a one-line justification each (e.g., "Unresolved recovery path risk"). Do not exceed five total questions. Skip escalation if user explicitly declines more. - -3. **Understand user request**: Combine `$ARGUMENTS` + clarifying answers: - - Derive checklist theme (e.g., security, review, deploy, ux) - - Consolidate explicit must-have items mentioned by user - - Map focus selections to category scaffolding - - Infer any missing context from spec/plan/tasks (do NOT hallucinate) - -4. **Load feature context**: Read from FEATURE_DIR: - - spec.md: Feature requirements and scope - - plan.md (if exists): Technical details, dependencies - - tasks.md (if exists): Implementation tasks - - **Context Loading Strategy**: - - Load only necessary portions relevant to active focus areas (avoid full-file dumping) - - Prefer summarizing long sections into concise scenario/requirement bullets - - Use progressive disclosure: add follow-on retrieval only if gaps detected - - If source docs are large, generate interim summary items instead of embedding raw text - -5. 
**Generate checklist** - Create "Unit Tests for Requirements": - - Create `FEATURE_DIR/checklists/` directory if it doesn't exist - - Generate unique checklist filename: - - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`) - - Format: `[domain].md` - - If file exists, append to existing file - - Number items sequentially starting from CHK001 - - Each `/speckit.checklist` run creates a NEW file for a new domain (an existing domain file is appended to, never overwritten) - - **CORE PRINCIPLE - Test the Requirements, Not the Implementation**: - Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for: - - **Completeness**: Are all necessary requirements present? - - **Clarity**: Are requirements unambiguous and specific? - - **Consistency**: Do requirements align with each other? - - **Measurability**: Can requirements be objectively verified? - - **Coverage**: Are all scenarios/edge cases addressed? - - **Category Structure** - Group items by requirement quality dimensions: - - **Requirement Completeness** (Are all necessary requirements documented?) - - **Requirement Clarity** (Are requirements specific and unambiguous?) - - **Requirement Consistency** (Do requirements align without conflicts?) - - **Acceptance Criteria Quality** (Are success criteria measurable?) - - **Scenario Coverage** (Are all flows/cases addressed?) - - **Edge Case Coverage** (Are boundary conditions defined?) - - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?) - - **Dependencies & Assumptions** (Are they documented and validated?) - - **Ambiguities & Conflicts** (What needs clarification?)
- - **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**: - - ❌ **WRONG** (Testing implementation): - - "Verify landing page displays 3 episode cards" - - "Test hover states work on desktop" - - "Confirm logo click navigates home" - - ✅ **CORRECT** (Testing requirements quality): - - "Are the exact number and layout of featured episodes specified?" [Completeness] - - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity] - - "Are hover state requirements consistent across all interactive elements?" [Consistency] - - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage] - - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases] - - "Are loading states defined for asynchronous episode data?" [Completeness] - - "Does the spec define visual hierarchy for competing UI elements?" [Clarity] - - **ITEM STRUCTURE**: - Each item should follow this pattern: - - Question format asking about requirement quality - - Focus on what's WRITTEN (or not written) in the spec/plan - - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.] - - Reference spec section `[Spec §X.Y]` when checking existing requirements - - Use `[Gap]` marker when checking for missing requirements - - **EXAMPLES BY QUALITY DIMENSION**: - - Completeness: - - "Are error handling requirements defined for all API failure modes? [Gap]" - - "Are accessibility requirements specified for all interactive elements? [Completeness]" - - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]" - - Clarity: - - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]" - - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]" - - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]" - - Consistency: - - "Do navigation requirements align across all pages? 
[Consistency, Spec §FR-10]" - - "Are card component requirements consistent between landing and detail pages? [Consistency]" - - Coverage: - - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]" - - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]" - - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]" - - Measurability: - - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]" - - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]" - - **Scenario Classification & Coverage** (Requirements Quality Focus): - - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios - - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?" - - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]" - - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]" - - **Traceability Requirements**: - - MINIMUM: ≥80% of items MUST include at least one traceability reference - - Each item should reference: spec section `[Spec §X.Y]`, or use markers: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]` - - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]" - - **Surface & Resolve Issues** (Requirements Quality Problems): - Ask questions about the requirements themselves: - - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]" - - Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]" - - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]" - - Dependencies: "Are external podcast API requirements documented? 
[Dependency, Gap]" - - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]" - - **Content Consolidation**: - - Soft cap: If raw candidate items > 40, prioritize by risk/impact - - Merge near-duplicates checking the same requirement aspect - - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]" - - **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test: - - ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior - - ❌ References to code execution, user actions, system behavior - - ❌ "Displays correctly", "works properly", "functions as expected" - - ❌ "Click", "navigate", "render", "load", "execute" - - ❌ Test cases, test plans, QA procedures - - ❌ Implementation details (frameworks, APIs, algorithms) - - **✅ REQUIRED PATTERNS** - These test requirements quality: - - ✅ "Are [requirement type] defined/specified/documented for [scenario]?" - - ✅ "Is [vague term] quantified/clarified with specific criteria?" - - ✅ "Are requirements consistent between [section A] and [section B]?" - - ✅ "Can [requirement] be objectively measured/verified?" - - ✅ "Are [edge cases/scenarios] addressed in requirements?" - - ✅ "Does the spec define [missing aspect]?" - -6. **Structure Reference**: Generate the checklist following the canonical template in `.specify/templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### ` lines with globally incrementing IDs starting at CHK001. - -7. **Report**: Output full path to created checklist, item count, and remind user that each run creates a new file. 
Summarize: - - Focus areas selected - - Depth level - - Actor/timing - - Any explicit user-specified must-have items incorporated - -**Important**: Each `/speckit.checklist` command invocation creates a checklist file with a short, descriptive name (appending if that file already exists). This allows: - -- Multiple checklists of different types (e.g., `ux.md`, `test.md`, `security.md`) -- Simple, memorable filenames that indicate checklist purpose -- Easy identification and navigation in the `checklists/` folder - -To avoid clutter, use descriptive types and clean up obsolete checklists when done. - -## Example Checklist Types & Sample Items - -**UX Requirements Quality:** `ux.md` - -Sample items (testing the requirements, NOT the implementation): - -- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]" -- "Is the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]" -- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]" -- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]" -- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]" -- "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]" - -**API Requirements Quality:** `api.md` - -Sample items: - -- "Are error response formats specified for all failure scenarios? [Completeness]" -- "Are rate limiting requirements quantified with specific thresholds? [Clarity]" -- "Are authentication requirements consistent across all endpoints? [Consistency]" -- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]" -- "Is versioning strategy documented in requirements? [Gap]" - -**Performance Requirements Quality:** `performance.md` - -Sample items: - -- "Are performance requirements quantified with specific metrics? [Clarity]" -- "Are performance targets defined for all critical user journeys?
[Coverage]" -- "Are performance requirements under different load conditions specified? [Completeness]" -- "Can performance requirements be objectively measured? [Measurability]" -- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]" - -**Security Requirements Quality:** `security.md` - -Sample items: - -- "Are authentication requirements specified for all protected resources? [Coverage]" -- "Are data protection requirements defined for sensitive information? [Completeness]" -- "Is the threat model documented and requirements aligned to it? [Traceability]" -- "Are security requirements consistent with compliance obligations? [Consistency]" -- "Are security failure/breach response requirements defined? [Gap, Exception Flow]" - -## Anti-Examples: What NOT To Do - -**❌ WRONG - These test implementation, not requirements:** - -```markdown -- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001] -- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003] -- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010] -- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005] -``` - -**✅ CORRECT - These test requirements quality:** - -```markdown -- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001] -- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003] -- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010] -- [ ] CHK004 - Is the selection criteria for related episodes documented? [Gap, Spec §FR-005] -- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap] -- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? 
[Measurability, Spec §FR-001] -``` - -**Key Differences:** - -- Wrong: Tests if the system works correctly -- Correct: Tests if the requirements are written correctly -- Wrong: Verification of behavior -- Correct: Validation of requirement quality -- Wrong: "Does it do X?" -- Correct: "Is X clearly specified?" diff --git a/.claude/commands/speckit.clarify.md b/.claude/commands/speckit.clarify.md deleted file mode 100644 index 8ff62c348..000000000 --- a/.claude/commands/speckit.clarify.md +++ /dev/null @@ -1,177 +0,0 @@ ---- -description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec. ---- - -## User Input - -```text -$ARGUMENTS -``` - -You **MUST** consider the user input before proceeding (if not empty). - -## Outline - -Goal: Detect and reduce ambiguity or missing decision points in the active feature specification and record the clarifications directly in the spec file. - -Note: This clarification workflow is expected to run (and be completed) BEFORE invoking `/speckit.plan`. If the user explicitly states they are skipping clarification (e.g., exploratory spike), you may proceed, but must warn that downstream rework risk increases. - -Execution steps: - -1. Run `.specify/scripts/bash/check-prerequisites.sh --json --paths-only` from repo root **once** (combined `--json --paths-only` mode / `-Json -PathsOnly`). Parse minimal JSON payload fields: - - `FEATURE_DIR` - - `FEATURE_SPEC` - - (Optionally capture `IMPL_PLAN`, `TASKS` for future chained flows.) - - If JSON parsing fails, abort and instruct the user to re-run `/speckit.specify` or verify feature branch environment. - - For single quotes in args like "I'm Groot", use escape syntax: e.g. 'I'\''m Groot' (or double-quote if possible: "I'm Groot"). - -2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing.
Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked). - - Functional Scope & Behavior: - - Core user goals & success criteria - - Explicit out-of-scope declarations - - User roles / personas differentiation - - Domain & Data Model: - - Entities, attributes, relationships - - Identity & uniqueness rules - - Lifecycle/state transitions - - Data volume / scale assumptions - - Interaction & UX Flow: - - Critical user journeys / sequences - - Error/empty/loading states - - Accessibility or localization notes - - Non-Functional Quality Attributes: - - Performance (latency, throughput targets) - - Scalability (horizontal/vertical, limits) - - Reliability & availability (uptime, recovery expectations) - - Observability (logging, metrics, tracing signals) - - Security & privacy (authN/Z, data protection, threat assumptions) - - Compliance / regulatory constraints (if any) - - Integration & External Dependencies: - - External services/APIs and failure modes - - Data import/export formats - - Protocol/versioning assumptions - - Edge Cases & Failure Handling: - - Negative scenarios - - Rate limiting / throttling - - Conflict resolution (e.g., concurrent edits) - - Constraints & Tradeoffs: - - Technical constraints (language, storage, hosting) - - Explicit tradeoffs or rejected alternatives - - Terminology & Consistency: - - Canonical glossary terms - - Avoided synonyms / deprecated terms - - Completion Signals: - - Acceptance criteria testability - - Measurable Definition of Done style indicators - - Misc / Placeholders: - - TODO markers / unresolved decisions - - Ambiguous adjectives ("robust", "intuitive") lacking quantification - - For each category with Partial or Missing status, add a candidate question opportunity unless: - - Clarification would not materially change implementation or validation strategy - - Information is better deferred to planning phase (note internally) - -3. 
Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints: - - Maximum of 5 total questions across the whole session. - - Each question must be answerable with EITHER: - - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR - - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words"). - - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation. - - Ensure category coverage balance: attempt to cover the highest impact unresolved categories first; avoid asking two low-impact questions when a single high-impact area (e.g., security posture) is unresolved. - - Exclude questions already answered, trivial stylistic preferences, or plan-level execution details (unless blocking correctness). - - Favor clarifications that reduce downstream rework risk or prevent misaligned acceptance tests. - - If more than 5 categories remain unresolved, select the top 5 by (Impact * Uncertainty) heuristic. - -4. Sequential questioning loop (interactive): - - Present EXACTLY ONE question at a time. - - For multiple‑choice questions: - - **Analyze all options** and determine the **most suitable option** based on: - - Best practices for the project type - - Common patterns in similar implementations - - Risk reduction (security, performance, maintainability) - - Alignment with any explicit project goals or constraints visible in the spec - - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice). - - Format as: `**Recommended:** Option [X] - ` - - Then render all options as a Markdown table: - - | Option | Description | - |--------|-------------| - | A |