This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
The Ambient Code Platform is a Kubernetes-native AI automation platform that orchestrates intelligent agentic sessions through containerized microservices. The platform enables AI-powered automation for analysis, research, development, and content creation tasks via a modern web interface.
Note: This project was formerly known as "vTeam". Technical artifacts (image names, namespaces, API groups, routes) still use "vteam" for backward compatibility. The docs use ACP naming.
The platform includes Amber, a background agent that automates common development tasks via GitHub Issues. Team members can trigger automated fixes, refactoring, and test additions without requiring direct access to Claude Code.
Quick Links:
- Amber Quickstart - Get started in 5 minutes
- Full Documentation - Complete automation guide
- Amber Config - Automation policies
Common Workflows:
- 🤖 Auto-Fix (label: `amber:auto-fix`): Formatting, linting, trivial fixes
- 🔧 Refactoring (label: `amber:refactor`): Break large files, extract patterns
- 🧪 Test Coverage (label: `amber:test-coverage`): Add missing tests
The system follows a Kubernetes-native pattern with Custom Resources, Operators, and Job execution:
- Frontend (NextJS + Shadcn): Web UI for session management and monitoring
- Backend API (Go + Gin): REST API managing Kubernetes Custom Resources with multi-tenant project isolation
- Agentic Operator (Go): Kubernetes controller watching CRs and creating Jobs
- Claude Code Runner (Python): Job pods executing Claude Code CLI with multi-agent collaboration
User Creates Session → Backend Creates CR → Operator Spawns Job →
Pod Runs Claude CLI → Results Stored in CR → UI Displays Progress
This repository uses a structured memory system to provide targeted, loadable context instead of relying solely on this comprehensive CLAUDE.md file.
Load these files when working in specific areas:
| Task Type | Context File | Architecture View | Pattern File |
|---|---|---|---|
| Backend API work | .claude/context/backend-development.md | repomix-analysis/03-architecture-only.xml | .claude/patterns/k8s-client-usage.md |
| Frontend UI work | .claude/context/frontend-development.md | repomix-analysis/03-architecture-only.xml | .claude/patterns/react-query-usage.md |
| Security review | .claude/context/security-standards.md | repomix-analysis/03-architecture-only.xml | .claude/patterns/error-handling.md |
| Architecture questions | - | repomix-analysis/03-architecture-only.xml | See ADRs below |
Note: We use a single repomix architecture view (grade 8.8/10, 187K tokens) for all tasks. See .claude/repomix-guide.md for details.
1. Context Files (.claude/context/)
- `backend-development.md` - Go backend, K8s integration, handler patterns
- `frontend-development.md` - NextJS, Shadcn UI, React Query patterns
- `security-standards.md` - Auth, RBAC, token handling, security patterns
2. Architectural Decision Records (docs/adr/)
- Documents WHY decisions were made, not just WHAT
- `0001-kubernetes-native-architecture.md`
- `0002-user-token-authentication.md`
- `0003-multi-repo-support.md`
- `0004-go-backend-python-runner.md`
- `0005-nextjs-shadcn-react-query.md`
3. Code Pattern Catalog (.claude/patterns/)
- `error-handling.md` - Consistent error patterns (backend, operator, runner)
- `k8s-client-usage.md` - When to use user token vs. service account
- `react-query-usage.md` - Data fetching patterns (queries, mutations, caching)
4. Repomix Usage Guide (.claude/repomix-guide.md)
- Guide for using the architecture view effectively
- Why we use a single view approach (vs. 7 views)
5. Decision Log (docs/decisions.md)
- Lightweight chronological record of major decisions
- Links to ADRs, code, and context files
Example prompts:
"Claude, load the architecture view (repomix-analysis/03-architecture-only.xml) and the
backend-development context file, then help me add a new endpoint for listing RFE workflows."
"Claude, load the architecture view and security-standards context file,
then review this PR for token handling issues."
"Claude, check ADR-0002 (User Token Authentication) and explain why we use user tokens
instead of service accounts for API operations."
Single command setup with OpenShift Local (CRC):
# Prerequisites: brew install crc
# Get free Red Hat pull secret from console.redhat.com/openshift/create/local
make dev-start
# Access at https://vteam-frontend-vteam-dev.apps-crc.testing
Hot-reloading development:
# Terminal 1
DEV_MODE=true make dev-start
# Terminal 2 (separate terminal)
make dev-sync
Building container images:
# Build all container images (default: docker, linux/amd64)
make build-all
# Build with podman
make build-all CONTAINER_ENGINE=podman
# Build for ARM64
make build-all PLATFORM=linux/arm64
# Build individual components
make build-frontend
make build-backend
make build-operator
make build-runner
# Push to registry
make push-all REGISTRY=quay.io/your-username
Deploying:
# Deploy with default images from quay.io/ambient_code
make deploy
# Deploy to custom namespace
make deploy NAMESPACE=my-namespace
# Deploy with custom images
cd components/manifests
cp env.example .env
# Edit .env with ANTHROPIC_API_KEY and CONTAINER_REGISTRY
./deploy.sh
# Clean up deployment
make clean
See component-specific documentation for detailed development commands:
- Backend (`components/backend/README.md`): Go API development, testing, linting
- Frontend (`components/frontend/README.md`): NextJS development, see also `DESIGN_GUIDELINES.md`
- Operator (`components/operator/README.md`): Operator development, watch patterns
- Claude Code Runner (`components/runners/claude-code-runner/README.md`): Python runner development
Common commands:
make build-all # Build all components
make deploy # Deploy to cluster
make test # Run tests
make lint # Lint code
Documentation site:
# Install documentation dependencies
pip install -r requirements-docs.txt
# Serve locally at http://127.0.0.1:8000
mkdocs serve
# Build static site
mkdocs build
# Deploy to GitHub Pages
mkdocs gh-deploy
# Markdown linting
markdownlint docs/**/*.md
Local development helpers:
# View logs
make dev-logs # Both backend and frontend
make dev-logs-backend # Backend only
make dev-logs-frontend # Frontend only
make dev-logs-operator # Operator only
# Operator management
make dev-restart-operator # Restart operator deployment
make dev-operator-status # Show operator status and events
# Cleanup
make dev-stop # Stop processes, keep CRC running
make dev-stop-cluster # Stop processes and shutdown CRC
make dev-clean # Stop and delete OpenShift project
# Testing
make dev-test # Run smoke tests
make dev-test-operator # Test operator only
The platform defines three primary CRDs:
- AgenticSession (`agenticsessions.vteam.ambient-code`): Represents an AI execution session
  - Spec: prompt, repos (multi-repo support), interactive mode, timeout, model selection
  - Status: phase, startTime, completionTime, results, error messages, per-repo push status
- ProjectSettings (`projectsettings.vteam.ambient-code`): Project-scoped configuration
  - Manages API keys, default models, timeout settings
  - Namespace-isolated for multi-tenancy
- RFEWorkflow (`rfeworkflows.vteam.ambient-code`): RFE (Request For Enhancement) workflows
  - 7-step agent council process for engineering refinement
  - Agent roles: PM, Architect, Staff Engineer, PO, Team Lead, Team Member, Delivery Owner
AgenticSessions support operating on multiple repositories simultaneously:
- Each repo has a required `input` (URL, branch) and an optional `output` (fork/target) configuration
- `mainRepoIndex` specifies which repo is the Claude working directory (default: 0)
- Per-repo status tracking: `pushed` or `abandoned`
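The multi-repo rules above can be sketched as plain Go types. This is a hypothetical illustration: the type and field names (`SessionSpec`, `SessionRepo`, `MainRepoIndex`) are assumptions and are not the actual CRD schema or backend types. It shows only the documented default of `mainRepoIndex` to 0, plus a bounds check.

```go
package main

import "fmt"

// RepoLocation is one side of a repo entry (hypothetical field names).
type RepoLocation struct {
	URL    string
	Branch string
}

// SessionRepo pairs a required input with an optional output (fork/target).
type SessionRepo struct {
	Input  RepoLocation
	Output *RepoLocation // nil when no fork/target is configured
}

// SessionSpec holds the repos and the index of Claude's working directory.
type SessionSpec struct {
	Repos         []SessionRepo
	MainRepoIndex *int // nil means "use the default", i.e. 0
}

// RepoPushStatus mirrors the per-repo status values named above.
type RepoPushStatus string

const (
	RepoPushed    RepoPushStatus = "pushed"
	RepoAbandoned RepoPushStatus = "abandoned"
)

// mainRepo returns the repo used as Claude's working directory,
// defaulting to index 0 and rejecting out-of-range indexes.
func mainRepo(spec SessionSpec) (SessionRepo, error) {
	if len(spec.Repos) == 0 {
		return SessionRepo{}, fmt.Errorf("spec.repos is empty")
	}
	idx := 0
	if spec.MainRepoIndex != nil {
		idx = *spec.MainRepoIndex
	}
	if idx < 0 || idx >= len(spec.Repos) {
		return SessionRepo{}, fmt.Errorf("mainRepoIndex %d out of range (have %d repos)", idx, len(spec.Repos))
	}
	return spec.Repos[idx], nil
}
```

Using a pointer for `MainRepoIndex` lets the sketch tell "unset" apart from an explicit 0.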
- Batch Mode (default): Single prompt execution with timeout
- Interactive Mode (`interactive: true`): Long-running chat sessions using inbox/outbox files
The Go backend (components/backend/) implements:
- Project-scoped endpoints: `/api/projects/:project/*` for namespaced resources
- Multi-tenant isolation: Each project maps to a Kubernetes namespace
- WebSocket support: Real-time session updates via `websocket_messaging.go`
- Git operations: Repository cloning, forking, PR creation via `git.go`
- RBAC integration: OpenShift OAuth for authentication
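Each project maps to a namespace, so a project name must also be a valid Kubernetes namespace name. That means an RFC 1123 label: at most 63 lowercase alphanumerics or `-`, starting and ending with an alphanumeric. The helper below is a hypothetical pre-check and is not part of the backend. It complements the RBAC check on the user's token but never replaces it.

```go
package main

import (
	"fmt"
	"regexp"
)

// Namespace names must be RFC 1123 labels: lowercase alphanumerics and '-',
// starting and ending with an alphanumeric.
var dns1123Label = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateProjectName is a hypothetical guard for the :projectName route
// parameter, run before the value is used as a namespace.
func validateProjectName(name string) error {
	if len(name) == 0 || len(name) > 63 {
		return fmt.Errorf("project name must be 1-63 characters, got %d", len(name))
	}
	if !dns1123Label.MatchString(name) {
		return fmt.Errorf("project name %q is not a valid RFC 1123 label", name)
	}
	return nil
}
```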
Main handler logic in handlers.go (3906 lines) manages:
- Project CRUD operations
- AgenticSession lifecycle
- ProjectSettings management
- RFE workflow orchestration
The Kubernetes operator (components/operator/) watches for:
- AgenticSession creation/updates → spawns Jobs with runner pods
- Job completion → updates CR status with results
- Timeout handling and cleanup
The Claude Code runner (components/runners/claude-code-runner/) provides:
- Claude Code SDK integration (`claude-code-sdk>=0.0.23`)
- Workspace synchronization via PVC proxy
- Multi-agent collaboration capabilities
- Anthropic API streaming (`anthropic>=0.68.0`)
Python:
- Virtual environments: Always use `python -m venv venv` or `uv venv`
- Package manager: Prefer `uv` over `pip`
- Formatting: black (double quotes)
- Import sorting: isort with black profile
- Linting: flake8 (ignore E203, W503)
Go:
- Formatting: `go fmt ./...` (enforced)
- Linting: golangci-lint (install via `make install-tools`)
- Testing: Table-driven tests with subtests
- Error handling: Explicit error returns, no panic in production code
Container images:
- Default registry: `quay.io/ambient_code`
- Image tags: Component-specific (vteam_frontend, vteam_backend, vteam_operator, vteam_claude_runner)
- Platform: Default `linux/amd64`, ARM64 supported via `PLATFORM=linux/arm64`
- Build tool: Docker or Podman (`CONTAINER_ENGINE=podman`)
Git workflow:
- Default branch: `main`
- Feature branches: Required for development
- Commit style: Conventional commits (squashed on merge)
- Branch verification: Always check current branch before file modifications
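A rough local check of the Conventional Commits `type(scope)!: description` subject shape could look like the sketch below. The allowed type list is an assumption (the common Angular set), and the repository may accept other types.

```go
package main

import "regexp"

// Assumed type list; scope in parentheses and the breaking-change "!" are optional.
var conventionalSubject = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([\w./-]+\))?!?: \S.*$`)

// isConventional reports whether a commit subject line matches the pattern.
func isConventional(subject string) bool {
	return conventionalSubject.MatchString(subject)
}
```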
Kubernetes:
- Default namespace: `ambient-code` (production), `vteam-dev` (local dev)
- CRD group: `vteam.ambient-code`
- API version: `v1alpha1` (current)
- RBAC: Namespace-scoped service accounts with minimal permissions
The platform includes optional Langfuse integration for LLM observability, tracking usage metrics while protecting user privacy.
- Default behavior: User messages and assistant responses are REDACTED in traces
- Preserved data: Usage metrics (tokens, costs), metadata (model, turn count, timestamps)
- Rationale: Track costs and usage patterns without exposing potentially sensitive user data
Enable Langfuse (disabled by default):
# In ambient-admin-langfuse-secret
LANGFUSE_ENABLED=true
LANGFUSE_PUBLIC_KEY=<your-key>
LANGFUSE_SECRET_KEY=<your-secret>
LANGFUSE_HOST=http://langfuse-web.langfuse.svc.cluster.local:3000
Privacy Controls (optional - masking enabled by default):
# Masking is ENABLED BY DEFAULT (no environment variable needed)
# The runner defaults to LANGFUSE_MASK_MESSAGES=true if not set
# To explicitly set (optional):
LANGFUSE_MASK_MESSAGES=true
# To disable masking (dev/testing ONLY - exposes full message content):
LANGFUSE_MASK_MESSAGES=false
Deploy Langfuse to your cluster:
# Deploy with default privacy-preserving settings
./e2e/scripts/deploy-langfuse.sh
# For OpenShift
./e2e/scripts/deploy-langfuse.sh --openshift
# For Kubernetes
./e2e/scripts/deploy-langfuse.sh --kubernetes
Implementation details:
- Location: `components/runners/claude-code-runner/observability.py`
- Masking function: `_privacy_masking_function()` - redacts content while preserving metrics
- Test coverage: `tests/test_privacy_masking.py` - validates masking behavior
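The masking defaults described above come down to two decisions: whether masking is on, and which fields survive it. The runner implements this in Python. The Go sketch below is only a language-neutral illustration. The `TraceEvent` shape, the `[REDACTED]` marker and the case-insensitive handling of `false` are all assumptions.

```go
package main

import "strings"

// maskingEnabled follows the documented default: masking stays on unless
// LANGFUSE_MASK_MESSAGES is set to "false" (case-insensitive matching is an
// assumption). getenv is injected for testing; pass os.Getenv in real code.
func maskingEnabled(getenv func(string) string) bool {
	v := strings.TrimSpace(getenv("LANGFUSE_MASK_MESSAGES"))
	return !strings.EqualFold(v, "false")
}

// TraceEvent is a hypothetical, simplified trace payload.
type TraceEvent struct {
	Model        string
	InputTokens  int
	OutputTokens int
	Input        string // user prompt
	Output       string // assistant response
}

// redacted is an assumed placeholder; the runner's actual marker may differ.
const redacted = "[REDACTED]"

// maskEvent keeps model metadata and token counts but drops message content.
func maskEvent(e TraceEvent, mask bool) TraceEvent {
	if mask {
		e.Input, e.Output = redacted, redacted
	}
	return e
}
```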
With Masking Enabled (Default):
- ✅ Token counts (input, output, cache read, cache creation)
- ✅ Cost calculations (USD per session)
- ✅ Model names and versions
- ✅ Turn counts and session durations
- ✅ Tool usage (names, execution status)
- ✅ Error states and completion status
- ❌ User prompts (redacted)
- ❌ Assistant responses (redacted)
- ❌ Tool outputs with long content (redacted)
With Masking Disabled (dev/testing only):
- ✅ All of the above
- ⚠️ Full user message content (potentially sensitive!)
- ⚠️ Full assistant response content
- ⚠️ Complete tool outputs
Langfuse supports OpenTelemetry as of 2025:
- Current implementation: Langfuse Python SDK (v3, OTel-based)
- Alternative: Pure OpenTelemetry SDK → Langfuse OTLP endpoint (`/api/public/otel`)
- Migration: Not recommended unless vendor neutrality is required
- Benefit: Current SDK already uses OTel underneath
IMPORTANT: When working on backend (components/backend/) or operator (components/operator/) code, you MUST follow these strict guidelines based on established patterns in the codebase.
- User Token Authentication Required
  - FORBIDDEN: Using backend service account for user-initiated API operations
  - REQUIRED: Always use `GetK8sClientsForRequest(c)` to get user-scoped K8s clients
  - REQUIRED: Return `401 Unauthorized` if user token is missing or invalid
  - Exception: Backend service account ONLY for CR writes and token minting (handlers/sessions.go:227, handlers/sessions.go:449)
- Never Panic in Production Code
  - FORBIDDEN: `panic()` in handlers, reconcilers, or any production path
  - REQUIRED: Return explicit errors with context: `return fmt.Errorf("failed to X: %w", err)`
  - REQUIRED: Log errors before returning: `log.Printf("Operation failed: %v", err)`
- Token Security and Redaction
  - FORBIDDEN: Logging tokens, API keys, or sensitive headers
  - REQUIRED: Redact tokens in logs using custom formatters (server/server.go:22-34)
  - REQUIRED: Use `log.Printf("tokenLen=%d", len(token))` instead of logging token content
  - Example: `path = strings.Split(path, "?")[0] + "?token=[REDACTED]"`
- Type-Safe Unstructured Access
  - FORBIDDEN: Direct type assertions without checking: `obj.Object["spec"].(map[string]interface{})`
  - REQUIRED: Use `unstructured.Nested*` helpers with three-value returns
  - Example: `spec, found, err := unstructured.NestedMap(obj.Object, "spec")`
  - REQUIRED: Check `found` before using values; handle type mismatches gracefully
- OwnerReferences for Resource Lifecycle
  - REQUIRED: Set OwnerReferences on all child resources (Jobs, Secrets, PVCs, Services)
  - REQUIRED: Use `Controller: boolPtr(true)` for primary owner
  - FORBIDDEN: `BlockOwnerDeletion` (causes permission issues in multi-tenant environments)
  - Pattern: see operator/internal/handlers/sessions.go:125-134 and handlers/sessions.go:470-476
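The following hypothetical stdlib helper shows the three-value contract from the Type-Safe Unstructured Access rule without pulling in Kubernetes dependencies. It mimics `unstructured.NestedString`: `found` is false for a missing key and `err` is set on a type mismatch, so no unchecked assertion can panic. In real code, use the `unstructured.Nested*` helpers themselves.

```go
package main

import "fmt"

// nestedString walks obj along fields and returns (value, found, err).
// found is false when any key is missing; err is set when an intermediate
// value is not a map or the final value is not a string.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool, error) {
	var cur interface{} = obj
	for i, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false, fmt.Errorf("%v is %T, not a map", fields[:i], cur)
		}
		if cur, ok = m[f]; !ok {
			return "", false, nil // missing key: not found, not an error
		}
	}
	s, ok := cur.(string)
	if !ok {
		return "", false, fmt.Errorf("%v is %T, not a string", fields, cur)
	}
	return s, true, nil
}
```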
Backend Structure (components/backend/):
backend/
├── handlers/ # HTTP handlers grouped by resource
│ ├── sessions.go # AgenticSession CRUD + lifecycle
│ ├── projects.go # Project management
│ ├── rfe.go # RFE workflows
│ ├── helpers.go # Shared utilities (StringPtr, etc.)
│ └── middleware.go # Auth, validation, RBAC
├── types/ # Type definitions (no business logic)
│ ├── session.go
│ ├── project.go
│ └── common.go
├── server/ # Server setup, CORS, middleware
├── k8s/ # K8s resource templates
├── git/, github/ # External integrations
├── websocket/ # Real-time messaging
├── routes.go # HTTP route registration
└── main.go # Wiring, dependency injection
Operator Structure (components/operator/):
operator/
├── internal/
│ ├── config/ # K8s client init, config loading
│ ├── types/ # GVR definitions, resource helpers
│ ├── handlers/ # Watch handlers (sessions, namespaces, projectsettings)
│ └── services/ # Reusable services (PVC provisioning, etc.)
└── main.go # Watch coordination
Rules:
- Handlers contain HTTP/watch logic ONLY
- Types are pure data structures
- Business logic in separate service packages
- No cyclic dependencies between packages
User-Scoped Clients (for API operations):
// ALWAYS use for user-initiated operations (list, get, create, update, delete)
reqK8s, reqDyn := GetK8sClientsForRequest(c)
if reqK8s == nil {
c.JSON(http.StatusUnauthorized, gin.H{"error": "Invalid or missing token"})
c.Abort()
return
}
// Use reqDyn for CR operations in user's authorized namespaces
list, err := reqDyn.Resource(gvr).Namespace(project).List(ctx, v1.ListOptions{})
Backend Service Account Clients (limited use cases):
// ONLY use for:
// 1. Writing CRs after validation (handlers/sessions.go:417)
// 2. Minting tokens/secrets for runners (handlers/sessions.go:449)
// 3. Cross-namespace operations backend is authorized for
// Available as: DynamicClient, K8sClient (package-level in handlers/)
created, err := DynamicClient.Resource(gvr).Namespace(project).Create(ctx, obj, v1.CreateOptions{})
Never:
- ❌ Fall back to service account when user token is invalid
- ❌ Use service account for list/get operations on behalf of users
- ❌ Skip RBAC checks by using elevated permissions
Handler Errors:
// Pattern 1: Resource not found
if errors.IsNotFound(err) {
c.JSON(http.StatusNotFound, gin.H{"error": "Session not found"})
return
}
// Pattern 2: Log + return error
if err != nil {
log.Printf("Failed to create session %s in project %s: %v", name, project, err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create session"})
return
}
// Pattern 3: Non-fatal errors (continue operation)
if err := updateStatus(...); err != nil {
log.Printf("Warning: status update failed: %v", err)
// Continue - session was created successfully
}
Operator Errors:
// Pattern 1: Resource deleted during processing (non-fatal)
if errors.IsNotFound(err) {
log.Printf("AgenticSession %s no longer exists, skipping", name)
return nil // Don't treat as error
}
// Pattern 2: Retriable errors in watch loop
if err != nil {
log.Printf("Failed to create job: %v", err)
updateAgenticSessionStatus(ns, name, map[string]interface{}{
"phase": "Error",
"message": fmt.Sprintf("Failed to create job: %v", err),
})
return fmt.Errorf("failed to create job: %w", err)
}
Never:
- ❌ Silent failures (always log errors)
- ❌ Generic error messages ("operation failed")
- ❌ Retrying indefinitely without backoff
OwnerReferences Pattern:
// Always set owner when creating child resources
ownerRef := v1.OwnerReference{
APIVersion: obj.GetAPIVersion(), // e.g., "vteam.ambient-code/v1alpha1"
Kind: obj.GetKind(), // e.g., "AgenticSession"
Name: obj.GetName(),
UID: obj.GetUID(),
Controller: boolPtr(true), // Only one controller per resource
// BlockOwnerDeletion: intentionally omitted (permission issues)
}
// Apply to child resources
job := &batchv1.Job{
ObjectMeta: v1.ObjectMeta{
Name: jobName,
Namespace: namespace,
OwnerReferences: []v1.OwnerReference{ownerRef},
},
// ...
}
Cleanup Patterns:
// Rely on OwnerReferences for automatic cleanup, but delete explicitly when needed
policy := v1.DeletePropagationBackground
err := K8sClient.BatchV1().Jobs(ns).Delete(ctx, jobName, v1.DeleteOptions{
PropagationPolicy: &policy,
})
if err != nil && !errors.IsNotFound(err) {
log.Printf("Failed to delete job: %v", err)
return err
}
Token Handling:
// Extract token from Authorization header
rawAuth := c.GetHeader("Authorization")
parts := strings.SplitN(rawAuth, " ", 2)
if len(parts) != 2 || !strings.EqualFold(parts[0], "Bearer") {
c.JSON(http.StatusUnauthorized, gin.H{"error": "invalid Authorization header"})
return
}
token := strings.TrimSpace(parts[1])
// NEVER log the token itself
log.Printf("Processing request with token (len=%d)", len(token))
RBAC Enforcement:
// Always check permissions before operations
ssar := &authv1.SelfSubjectAccessReview{
Spec: authv1.SelfSubjectAccessReviewSpec{
ResourceAttributes: &authv1.ResourceAttributes{
Group: "vteam.ambient-code",
Resource: "agenticsessions",
Verb: "list",
Namespace: project,
},
},
}
res, err := reqK8s.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, ssar, v1.CreateOptions{})
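if err != nil {
log.Printf("Permission check failed in project %s: %v", project, err)
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to check permissions"})
return
}
if !res.Status.Allowed {
c.JSON(http.StatusForbidden, gin.H{"error": "Forbidden"})
return
}
Container Security: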
// Always set SecurityContext for Job pods
SecurityContext: &corev1.SecurityContext{
AllowPrivilegeEscalation: boolPtr(false),
ReadOnlyRootFilesystem: boolPtr(false), // false only when the container must write temp files to its root filesystem
Capabilities: &corev1.Capabilities{
Drop: []corev1.Capability{"ALL"}, // Drop all by default
},
},
Project-Scoped Endpoints:
// Standard pattern: /api/projects/:projectName/resource
r.GET("/api/projects/:projectName/agentic-sessions", ValidateProjectContext(), ListSessions)
r.POST("/api/projects/:projectName/agentic-sessions", ValidateProjectContext(), CreateSession)
r.GET("/api/projects/:projectName/agentic-sessions/:sessionName", ValidateProjectContext(), GetSession)
// ValidateProjectContext middleware:
// 1. Extracts project from route param
// 2. Validates user has access via RBAC check
// 3. Sets project in context: c.Set("project", projectName)
Middleware Chain:
// Order matters: Recovery → Logging → CORS → Identity → Validation → Handler
r.Use(gin.Recovery())
r.Use(gin.LoggerWithFormatter(customRedactingFormatter))
r.Use(cors.New(corsConfig))
r.Use(forwardedIdentityMiddleware()) // Extracts X-Forwarded-User, etc.
r.Use(ValidateProjectContext()) // RBAC check
Response Patterns:
// Success with data
c.JSON(http.StatusOK, gin.H{"items": sessions})
// Success with created resource
c.JSON(http.StatusCreated, gin.H{"message": "Session created", "name": name, "uid": uid})
// Success with no content
c.Status(http.StatusNoContent)
// Errors with structured messages
c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid request"})
Watch Loop with Reconnection:
func WatchAgenticSessions() {
gvr := types.GetAgenticSessionResource()
for { // Infinite loop with reconnection
watcher, err := config.DynamicClient.Resource(gvr).Watch(ctx, v1.ListOptions{})
if err != nil {
log.Printf("Failed to create watcher: %v", err)
time.Sleep(5 * time.Second) // Backoff before retry
continue
}
log.Println("Watching for events...")
for event := range watcher.ResultChan() {
switch event.Type {
case watch.Added, watch.Modified:
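obj, ok := event.Object.(*unstructured.Unstructured)
if !ok {
continue // Unexpected object type; skip instead of panicking
}
if err := handleEvent(obj); err != nil {
log.Printf("Failed to handle %s/%s: %v", obj.GetNamespace(), obj.GetName(), err)
}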
case watch.Deleted:
// Handle cleanup
}
}
log.Println("Watch channel closed, restarting...")
watcher.Stop()
time.Sleep(2 * time.Second)
}
}
Reconciliation Pattern:
func handleEvent(obj *unstructured.Unstructured) error {
name := obj.GetName()
namespace := obj.GetNamespace()
// 1. Verify resource still exists (avoid race conditions)
currentObj, err := getDynamicClient().Get(ctx, name, namespace)
if errors.IsNotFound(err) {
log.Printf("Resource %s no longer exists, skipping", name)
return nil // Not an error
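}
if err != nil {
return fmt.Errorf("failed to get %s/%s: %w", namespace, name, err)
}
// 2. Get current phase (status.phase may be absent on new resources)
phase, found, err := unstructured.NestedString(currentObj.Object, "status", "phase")
if err != nil {
return fmt.Errorf("invalid status.phase on %s: %w", name, err)
}
if !found {
phase = "Pending"
}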
// 3. Only reconcile if in expected state
if phase != "Pending" {
return nil // Already processed
}
// 4. Create resources idempotently (check existence first)
if _, err := getResource(name); err == nil {
log.Printf("Resource %s already exists", name)
return nil
}
// 5. Create and update status
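if err := createResource(...); err != nil {
return fmt.Errorf("failed to create resource for %s: %w", name, err)
}
if err := updateStatus(namespace, name, map[string]interface{}{"phase": "Creating"}); err != nil {
log.Printf("Warning: status update failed for %s: %v", name, err)
}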
return nil
}
Status Updates (use UpdateStatus subresource):
func updateAgenticSessionStatus(namespace, name string, updates map[string]interface{}) error {
gvr := types.GetAgenticSessionResource()
obj, err := config.DynamicClient.Resource(gvr).Namespace(namespace).Get(ctx, name, v1.GetOptions{})
if errors.IsNotFound(err) {
log.Printf("Resource deleted, skipping status update")
return nil // Not an error
}
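if err != nil {
return fmt.Errorf("failed to get %s/%s: %w", namespace, name, err)
}
// Checked assertion: status may be missing or (unexpectedly) not a map
status, ok := obj.Object["status"].(map[string]interface{})
if !ok {
status = make(map[string]interface{})
obj.Object["status"] = status
}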
for k, v := range updates {
status[k] = v
}
// Use UpdateStatus subresource (requires /status permission)
_, err = config.DynamicClient.Resource(gvr).Namespace(namespace).UpdateStatus(ctx, obj, v1.UpdateOptions{})
if errors.IsNotFound(err) {
return nil // Resource deleted during update
}
return err
}
Goroutine Monitoring:
// Start background monitoring (operator/internal/handlers/sessions.go:477)
go monitorJob(jobName, sessionName, namespace)
// Monitoring loop checks both K8s Job status AND custom container status
func monitorJob(jobName, sessionName, namespace string) {
for {
time.Sleep(5 * time.Second)
// 1. Check if parent resource still exists (exit if deleted)
if _, err := getSession(namespace, sessionName); errors.IsNotFound(err) {
log.Printf("Session deleted, stopping monitoring")
return
}
// 2. Check Job status
job, err := K8sClient.BatchV1().Jobs(namespace).Get(ctx, jobName, v1.GetOptions{})
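if errors.IsNotFound(err) {
return
}
if err != nil {
log.Printf("Failed to get job %s: %v", jobName, err)
continue // Transient error; retry on the next iteration
}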
// 3. Update status based on Job conditions
if job.Status.Succeeded > 0 {
updateStatus(namespace, sessionName, map[string]interface{}{
"phase": "Completed",
"completionTime": time.Now().Format(time.RFC3339),
})
cleanup(namespace, jobName)
return
}
}
}
Before committing backend or operator code, verify:
- Authentication: All user-facing endpoints use `GetK8sClientsForRequest(c)`
- Authorization: RBAC checks performed before resource access
- Error Handling: All errors logged with context, appropriate HTTP status codes
- Token Security: No tokens or sensitive data in logs
- Type Safety: Used `unstructured.Nested*` helpers, checked `found` before using values
- Resource Cleanup: OwnerReferences set on all child resources
- Status Updates: Used `UpdateStatus` subresource, handled IsNotFound gracefully
- Tests: Added/updated tests for new functionality
- Logging: Structured logs with relevant context (namespace, resource name, etc.)
- Code Quality: Ran all linting checks locally (see below)
Run these commands before committing:
# Backend
cd components/backend
gofmt -l . # Check formatting (should output nothing)
go vet ./... # Detect suspicious constructs
golangci-lint run # Run comprehensive linting
# Operator (from components/backend)
cd ../operator
gofmt -l .
go vet ./...
golangci-lint run
Auto-format code (from the repository root):
gofmt -w components/backend components/operator
Note: GitHub Actions will automatically run these checks on your PR. Fix any issues locally before pushing.
Backend:
- ❌ Using service account client for user operations (always use user token)
- ❌ Not checking if user-scoped client creation succeeded
- ❌ Logging full token values (use `len(token)` instead)
- ❌ Not validating project access in middleware
- ❌ Type assertions without checking: `val := obj["key"].(string)` (use `val, ok := ...`)
- ❌ Not setting OwnerReferences (causes resource leaks)
- ❌ Treating IsNotFound as fatal error during cleanup
- ❌ Exposing internal error details to API responses (use generic messages)
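For request paths that carry a token in the query string, a variant of the redaction shown in the Token Security and Redaction rule can keep the other parameters intact. This is a sketch: `redactQueryToken` and the `token` parameter name are assumptions about how such paths look. The placeholder is plain `REDACTED` because brackets would be percent-encoded.

```go
package main

import "net/url"

// redactQueryToken replaces the value of a "token" query parameter so the
// path can be logged; other parameters are preserved (Encode sorts by key).
func redactQueryToken(rawPath string) string {
	u, err := url.Parse(rawPath)
	if err != nil {
		return "[unparseable path omitted]" // never fall back to logging the raw value
	}
	q := u.Query()
	if q.Has("token") {
		q.Set("token", "REDACTED")
		u.RawQuery = q.Encode()
	}
	return u.String()
}
```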
Operator:
- ❌ Not reconnecting watch on channel close
- ❌ Processing events without verifying resource still exists
- ❌ Updating status on main object instead of /status subresource
- ❌ Not checking current phase before reconciliation (causes duplicate resources)
- ❌ Creating resources without idempotency checks
- ❌ Goroutine leaks (not exiting monitor when resource deleted)
- ❌ Using `panic()` in watch/reconciliation loops
- ❌ Not setting SecurityContext on Job pods
Study these files to understand established patterns:
Backend:
- `components/backend/handlers/sessions.go` - Complete session lifecycle, user/SA client usage
- `components/backend/handlers/middleware.go` - Auth patterns, token extraction, RBAC
- `components/backend/handlers/helpers.go` - Utility functions (StringPtr, BoolPtr)
- `components/backend/types/common.go` - Type definitions
- `components/backend/server/server.go` - Server setup, middleware chain, token redaction
- `components/backend/routes.go` - HTTP route definitions and registration
Operator:
- `components/operator/internal/handlers/sessions.go` - Watch loop, reconciliation, status updates
- `components/operator/internal/config/config.go` - K8s client initialization
- `components/operator/internal/types/resources.go` - GVR definitions
- `components/operator/internal/services/infrastructure.go` - Reusable services
- Change detection: Only builds modified components (frontend, backend, operator, claude-runner)
- Multi-platform builds: linux/amd64 and linux/arm64
- Registry: Pushes to `quay.io/ambient_code` on main branch
- PR builds: Build-only, no push on pull requests
- amber-issue-handler.yml: Amber background agent - automated fixes via GitHub issue labels (`amber:auto-fix`, `amber:refactor`, `amber:test-coverage`) or `/amber execute` command
- amber-dependency-sync.yml: Daily sync of dependency versions to Amber agent knowledge base
- claude.yml: Claude Code integration - responds to `@claude` mentions in issues/PRs
- claude-code-review.yml: Automated code reviews on pull requests
- go-lint.yml: Go code formatting, vetting, and linting (gofmt, go vet, golangci-lint)
- frontend-lint.yml: Frontend code quality (ESLint, TypeScript checking, build validation)
- prod-release-deploy.yaml: Production releases with semver versioning and changelog generation
- e2e.yml: End-to-end Cypress testing in kind cluster (see Testing Strategy section)
- test-local-dev.yml: Local development environment validation
- docs.yml: Deploy MkDocs documentation to GitHub Pages
- dependabot-auto-merge.yml: Auto-approve and merge Dependabot dependency updates
Purpose: Automated end-to-end testing of the complete vTeam stack in a Kubernetes environment.
Location: e2e/
Quick Start:
make e2e-test CONTAINER_ENGINE=podman # Or docker
What Gets Tested:
- ✅ Full vTeam deployment in kind (Kubernetes in Docker)
- ✅ Frontend UI rendering and navigation
- ✅ Backend API connectivity
- ✅ Project creation workflow (main user journey)
- ✅ Authentication with ServiceAccount tokens
- ✅ Ingress routing
- ✅ All pods deploy and become ready
What Doesn't Get Tested:
- ❌ OAuth proxy flow (uses direct token auth for simplicity)
- ❌ Session pod execution (requires Anthropic API key)
- ❌ Multi-user scenarios
Test Suite (e2e/cypress/e2e/vteam.cy.ts):
- UI loads with token authentication
- Navigate to new project page
- Create a new project
- List created projects
- Backend API cluster-info endpoint
CI Integration: Tests run automatically on all PRs via GitHub Actions (.github/workflows/e2e.yml)
Key Implementation Details:
- Architecture: Frontend without oauth-proxy, direct token injection via environment variables
- Authentication: Test user ServiceAccount with cluster-admin permissions
- Token Handling: Frontend deployment includes `OC_TOKEN`, `OC_USER`, `OC_EMAIL` env vars
- Podman Support: Auto-detects runtime, uses ports 8080/8443 for rootless Podman
- Ingress: Standard nginx-ingress with path-based routing
Adding New Tests:
it('should test new feature', () => {
cy.visit('/some-page')
cy.contains('Expected Content').should('be.visible')
cy.get('#button').click()
// Auth header automatically injected via beforeEach interceptor
})
Debugging Tests:
cd e2e
source .env.test
CYPRESS_TEST_TOKEN="$TEST_TOKEN" CYPRESS_BASE_URL="http://vteam.local:8080" npm run test:headed
Documentation: See e2e/README.md and docs/testing/e2e-guide.md for a comprehensive testing guide.
- Unit tests (`tests/unit/`): Isolated component logic
- Contract tests (`tests/contract/`): API contract validation
- Integration tests (`tests/integration/`): End-to-end with real k8s cluster
  - Requires `TEST_NAMESPACE` environment variable
  - Set `CLEANUP_RESOURCES=true` for automatic cleanup
  - Permission tests validate RBAC boundaries
- Jest for component testing (when configured)
- Cypress for e2e testing (see E2E Tests section above)
- Controller reconciliation logic tests
- CRD validation tests
The MkDocs site (mkdocs.yml) provides:
- User Guide: Getting started, RFE creation, agent framework, configuration
- Developer Guide: Setup, architecture, plugin development, API reference, testing
- Labs: Hands-on exercises (basic → advanced → production)
- Basic: First RFE, agent interaction, workflow basics
- Advanced: Custom agents, workflow modification, integration testing
- Production: Jira integration, OpenShift deployment, scaling
- Reference: Agent personas, API endpoints, configuration schema, glossary
Default to improving existing documentation rather than creating new files. When adding or updating documentation (standalone files like .md, design docs, guides):
- Prefer inline updates: Improve existing markdown files or code comments
- Colocate new docs: When feasible, documentation should live in the subdirectory that has the relevant code (e.g., `components/backend/README.md`), not at the top level
- Avoid top-level proliferation: Only create top-level docs for cross-cutting concerns (architecture, security, deployment)
- Follow established patterns: See `docs/amber-quickstart.md` and `components/backend/README.md` for examples of well-organized documentation
Special lab track for leadership training located in docs/labs/director-training/:
- Structured exercises for understanding the vTeam system from a strategic perspective
- Validation reports for tracking completion and understanding
- API keys: Store in Kubernetes Secrets, managed via ProjectSettings CR
- RBAC: Namespace-scoped isolation prevents cross-project access
- OAuth integration: OpenShift OAuth for cluster-based authentication (see `docs/OPENSHIFT_OAUTH.md`)
- Network policies: Component isolation and secure communication
- Health endpoints: `/health` on backend API
- Logs: Structured logging with OpenShift integration
- Metrics: Prometheus-compatible (when configured)
- Events: Kubernetes events for operator actions
- Horizontal Pod Autoscaling: Configure based on CPU/memory
- Job concurrency: Operator manages concurrent session execution
- Resource limits: Set appropriate requests/limits per component
- Multi-tenancy: Project-based isolation with shared infrastructure
See components/frontend/DESIGN_GUIDELINES.md for complete frontend development patterns.
- Zero `any` Types - Use proper types, `unknown`, or generic constraints
- Shadcn UI Components Only - Use `@/components/ui/*` components, no custom UI from scratch
- React Query for ALL Data Operations - Use hooks from `@/services/queries/*`, no manual `fetch()`
- Use `type` over `interface` - Always prefer `type` for type definitions
- Colocate Single-Use Components - Keep page-specific components with their pages
Before committing frontend code:
- Zero `any` types (or justified with eslint-disable)
- All UI uses Shadcn components
- All data operations use React Query
- Components under 200 lines
- Single-use components colocated with their pages
- All buttons have loading states
- All lists have empty states
- All nested pages have breadcrumbs
- All routes have loading.tsx, error.tsx
- `npm run build` passes with 0 errors, 0 warnings
- All types use `type` instead of `interface`
- `components/frontend/DESIGN_GUIDELINES.md` - Detailed patterns and examples
- `components/frontend/COMPONENT_PATTERNS.md` - Architecture patterns
- `components/frontend/src/components/ui/` - Available Shadcn components
- `components/frontend/src/services/` - API service layer examples