diff --git a/docs/metrics.md b/docs/metrics.md new file mode 100644 index 0000000..c236a73 --- /dev/null +++ b/docs/metrics.md @@ -0,0 +1,278 @@ +# Metrics Documentation + +This document describes all Prometheus metrics exposed by HyperFleet API, including their meanings, expected ranges, and example queries for common investigations. + +## Metrics Endpoint + +Metrics are exposed at: +- **Endpoint**: `/metrics` +- **Port**: 9090 (default, configurable via `--metrics-server-bindaddress`) +- **Format**: OpenMetrics/Prometheus text format + +## Application Metrics + +### API Request Metrics + +These metrics track all inbound HTTP requests to the API server. + +#### `api_inbound_request_count` + +**Type:** Counter + +**Description:** Total number of HTTP requests served by the API. + +**Labels:** + +| Label | Description | Example Values | +|-------|-------------|----------------| +| `method` | HTTP method | `GET`, `POST`, `PUT`, `PATCH`, `DELETE` | +| `path` | Request path (with IDs replaced by `-`) | `/api/hyperfleet/v1/clusters/-` | +| `code` | HTTP response status code | `200`, `201`, `400`, `404`, `500` | + +**Path normalization:** Object identifiers in paths are replaced with `-` to reduce cardinality. For example, `/api/hyperfleet/v1/clusters/abc123` becomes `/api/hyperfleet/v1/clusters/-`. + +**Example output:** +```text +api_inbound_request_count{code="200",method="GET",path="/api/hyperfleet/v1/clusters"} 1523 +api_inbound_request_count{code="200",method="GET",path="/api/hyperfleet/v1/clusters/-"} 8742 +api_inbound_request_count{code="201",method="POST",path="/api/hyperfleet/v1/clusters"} 156 +api_inbound_request_count{code="404",method="GET",path="/api/hyperfleet/v1/clusters/-"} 23 +``` + +#### `api_inbound_request_duration` + +**Type:** Histogram + +**Description:** Distribution of request processing times in seconds. + +**Labels:** Same as `api_inbound_request_count` + +**Buckets:** `0.1s`, `1s`, `10s`, `30s` + +**Derived metrics:** +- `api_inbound_request_duration_sum` - Total time spent processing requests +- `api_inbound_request_duration_count` - Number of requests measured +- `api_inbound_request_duration_bucket` - Number of requests completed within each bucket + +**Example output:** +```text +api_inbound_request_duration_bucket{code="200",method="GET",path="/api/hyperfleet/v1/clusters",le="0.1"} 1450 +api_inbound_request_duration_bucket{code="200",method="GET",path="/api/hyperfleet/v1/clusters",le="1"} 1520 +api_inbound_request_duration_bucket{code="200",method="GET",path="/api/hyperfleet/v1/clusters",le="10"} 1523 +api_inbound_request_duration_bucket{code="200",method="GET",path="/api/hyperfleet/v1/clusters",le="30"} 1523 +api_inbound_request_duration_bucket{code="200",method="GET",path="/api/hyperfleet/v1/clusters",le="+Inf"} 1523 +api_inbound_request_duration_sum{code="200",method="GET",path="/api/hyperfleet/v1/clusters"} 45.23 +api_inbound_request_duration_count{code="200",method="GET",path="/api/hyperfleet/v1/clusters"} 1523 +``` + +## Go Runtime Metrics + +The following metrics are automatically exposed by the Prometheus Go client library. 
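These collectors ship with the Prometheus Go client library and are registered on its default registry, so serving the standard `/metrics` handler is enough to surface them. A minimal sketch of that wiring, assuming the stock `promhttp` handler (the actual HyperFleet API setup may differ):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// promhttp.Handler() serves the default registry, which already includes
	// the Go runtime and process collectors, so the go_* and process_*
	// metrics listed below appear without any explicit registration.
	http.Handle("/metrics", promhttp.Handler())

	// Port 9090 matches the default metrics bind address described above.
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```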
+ +### Process Metrics + +| Metric | Type | Description | +|--------|------|-------------| +| `process_cpu_seconds_total` | Counter | Total user and system CPU time spent in seconds | +| `process_max_fds` | Gauge | Maximum number of open file descriptors | +| `process_open_fds` | Gauge | Number of open file descriptors | +| `process_resident_memory_bytes` | Gauge | Resident memory size in bytes | +| `process_start_time_seconds` | Gauge | Start time of the process since unix epoch | +| `process_virtual_memory_bytes` | Gauge | Virtual memory size in bytes | + +### Go Runtime Metrics + +| Metric | Type | Description | +|--------|------|-------------| +| `go_gc_duration_seconds` | Summary | A summary of pause durations during GC cycles | +| `go_goroutines` | Gauge | Number of goroutines currently existing | +| `go_memstats_alloc_bytes` | Gauge | Bytes allocated and still in use | +| `go_memstats_alloc_bytes_total` | Counter | Total bytes allocated (even if freed) | +| `go_memstats_heap_alloc_bytes` | Gauge | Heap bytes allocated and still in use | +| `go_memstats_heap_idle_bytes` | Gauge | Heap bytes waiting to be used | +| `go_memstats_heap_inuse_bytes` | Gauge | Heap bytes in use | +| `go_memstats_heap_objects` | Gauge | Number of allocated objects | +| `go_memstats_heap_sys_bytes` | Gauge | Heap bytes obtained from system | +| `go_memstats_sys_bytes` | Gauge | Total bytes obtained from system | +| `go_threads` | Gauge | Number of OS threads created | + +## Expected Ranges and Alerting Thresholds + +### Request Rate + +| Condition | Threshold | Severity | Description | +|-----------|-----------|----------|-------------| +| Normal | < 1000 req/s | - | Normal operating range | +| Warning | > 1000 req/s | Warning | High load, monitor closely | +| Critical | > 5000 req/s | Critical | Capacity limit approaching | + +### Error Rate + +| Condition | Threshold | Severity | Description | +|-----------|-----------|----------|-------------| +| Normal | < 1% | - | Normal error rate | +| Warning | 1-5% | Warning | Elevated errors, investigate | +| Critical | > 5% | Critical | High error rate, immediate action | + +### Latency (P99) + +| Condition | Threshold | Severity | Description | +|-----------|-----------|----------|-------------| +| Normal | < 500ms | - | Good response times | +| Warning | 500ms - 2s | Warning | Degraded performance | +| Critical | > 2s | Critical | Unacceptable latency | + +### Memory Usage + +| Condition | Threshold | Severity | Description | +|-----------|-----------|----------|-------------| +| Normal | < 70% of limit | - | Healthy memory usage | +| Warning | 70-85% of limit | Warning | Memory pressure | +| Critical | > 85% of limit | Critical | OOM risk | + +### Goroutines + +| Condition | Threshold | Severity | Description | +|-----------|-----------|----------|-------------| +| Normal | < 1000 | - | Normal goroutine count | +| Warning | 1000-5000 | Warning | High goroutine count | +| Critical | > 5000 | Critical | Possible goroutine leak | + +## Example PromQL Queries + +### Request Rate + +```promql +# Total request rate (requests per second) +sum(rate(api_inbound_request_count[5m])) + +# Request rate by pod/instance +sum(rate(api_inbound_request_count[5m])) by (instance) + +# Request rate by endpoint +sum(rate(api_inbound_request_count[5m])) by (path) + +# Request rate by status code +sum(rate(api_inbound_request_count[5m])) by (code) + +# Request rate by method +sum(rate(api_inbound_request_count[5m])) by (method) +``` + +### Error Rate + +```promql +# Overall error rate 
(5xx responses) +sum(rate(api_inbound_request_count{code=~"5.."}[5m])) / +sum(rate(api_inbound_request_count[5m])) * 100 + +# Error rate by endpoint +sum(rate(api_inbound_request_count{code=~"5.."}[5m])) by (path) / +sum(rate(api_inbound_request_count[5m])) by (path) * 100 + +# Client error rate (4xx responses) +sum(rate(api_inbound_request_count{code=~"4.."}[5m])) / +sum(rate(api_inbound_request_count[5m])) * 100 +``` + +### Latency + +```promql +# Average request duration (last 10 minutes) +rate(api_inbound_request_duration_sum[10m]) / +rate(api_inbound_request_duration_count[10m]) + +# Average request duration by endpoint +sum(rate(api_inbound_request_duration_sum[5m])) by (path) / +sum(rate(api_inbound_request_duration_count[5m])) by (path) + +# P50 latency (approximate using histogram) +histogram_quantile(0.5, sum(rate(api_inbound_request_duration_bucket[5m])) by (le)) + +# P90 latency +histogram_quantile(0.9, sum(rate(api_inbound_request_duration_bucket[5m])) by (le)) + +# P99 latency +histogram_quantile(0.99, sum(rate(api_inbound_request_duration_bucket[5m])) by (le)) + +# P99 latency by endpoint +histogram_quantile(0.99, sum(rate(api_inbound_request_duration_bucket[5m])) by (le, path)) +``` + +### Resource Usage + +```promql +# Memory usage in MB +process_resident_memory_bytes / 1024 / 1024 + +# Memory usage trend (increase over 1 hour) +delta(process_resident_memory_bytes[1h]) / 1024 / 1024 + +# Goroutine count +go_goroutines + +# Goroutine trend +delta(go_goroutines[1h]) + +# CPU usage rate +rate(process_cpu_seconds_total[5m]) + +# File descriptor usage percentage +process_open_fds / process_max_fds * 100 +``` + +### Common Investigation Queries + +```promql +# Slowest endpoints (average latency) +topk(10, + sum(rate(api_inbound_request_duration_sum[5m])) by (path) / + sum(rate(api_inbound_request_duration_count[5m])) by (path) +) + +# Most requested endpoints +topk(10, sum(rate(api_inbound_request_count[5m])) by (path)) + +# Endpoints with highest error rate +topk(10, + sum(rate(api_inbound_request_count{code=~"5.."}[5m])) by (path) / + sum(rate(api_inbound_request_count[5m])) by (path) +) + +# Percentage of requests taking longer than 1 second +1 - (sum(rate(api_inbound_request_duration_bucket{le="1"}[5m])) / +sum(rate(api_inbound_request_duration_count[5m]))) +``` + +## Prometheus Operator Integration + +If using Prometheus Operator, enable the ServiceMonitor in Helm values: + +```yaml +serviceMonitor: + enabled: true + interval: 30s + scrapeTimeout: 10s + labels: + release: prometheus # Match your Prometheus selector +``` + +See [Deployment Guide](deployment.md#prometheus-operator-integration) for details. + +## Grafana Dashboard + +Example dashboard JSON for HyperFleet API monitoring is available in the architecture repository. Key panels to include: + +1. **Request Rate** - Total requests per second over time +2. **Error Rate** - Percentage of 5xx responses +3. **Latency Distribution** - P50, P90, P99 latencies +4. **Request Duration Heatmap** - Visual distribution of request times +5. **Top Endpoints** - Most frequently accessed paths +6. **Memory Usage** - Resident memory over time +7. 
**Goroutines** - Goroutine count over time + +## Related Documentation + +- [Operational Runbook](runbook.md) - Troubleshooting and operational procedures +- [Deployment Guide](deployment.md) - Deployment and ServiceMonitor configuration +- [Development Guide](development.md) - Local development setup diff --git a/docs/runbook.md b/docs/runbook.md new file mode 100644 index 0000000..4b5f2b4 --- /dev/null +++ b/docs/runbook.md @@ -0,0 +1,402 @@ +# Operational Runbook + +This runbook provides operational procedures for managing HyperFleet API in production environments. + +## Service Overview + +HyperFleet API is a REST service that manages HyperFleet cluster and nodepool resources. It exposes: + +- **API Server**: Port 8000 - REST API endpoints +- **Health Server**: Port 8080 - Liveness (`/healthz`) and readiness (`/readyz`) probes +- **Metrics Server**: Port 9090 - Prometheus metrics (`/metrics`) + +### Architecture Diagram + +```text + ┌─────────────────────────────────────┐ + │ hyperfleet-api Pod │ + │ │ + ┌─────────────┐ │ ┌─────────────────────────────┐ │ + │ Clients │──────────────┼─▶│ API Server (:8000) │ │ + │ │ REST API │ │ /api/hyperfleet/v1/* │ │ + └─────────────┘ │ └──────────────┬──────────────┘ │ + │ │ │ + ┌─────────────┐ │ ┌──────────────▼──────────────┐ │ + │ Kubernetes │──────────────┼─▶│ Health Server (:8080) │ │ + │ Probes │ HTTP GET │ │ /healthz /readyz │ │ + └─────────────┘ │ └──────────────┬──────────────┘ │ + │ │ │ + ┌─────────────┐ │ ┌──────────────▼──────────────┐ │ + │ Prometheus │──────────────┼─▶│ Metrics Server (:9090) │ │ + │ │ Scrape │ │ /metrics │ │ + └─────────────┘ │ └─────────────────────────────┘ │ + │ │ │ + └─────────────────┼───────────────────┘ + │ + ▼ + ┌─────────────────────────────────────┐ + │ PostgreSQL │ + │ (clusters, nodepools) │ + └─────────────────────────────────────┘ +``` + +## Health Check Interpretation + +### Liveness Probe (`/healthz`) + +The liveness probe indicates whether the application process is alive and responsive. + +| Response | Status | Meaning | +|----------|--------|---------| +| `200 OK` | `{"status": "ok"}` | Process is alive and responsive | + +**Note:** The liveness probe always returns 200 OK if the HTTP server is responding. If the process crashes or hangs, Kubernetes will not receive a response and will restart the pod. + +**When liveness probe times out or connection fails:** +- The pod will be restarted by Kubernetes +- Check logs for fatal errors or panics +- This should be rare; frequent restarts indicate a serious issue + +### Readiness Probe (`/readyz`) + +The readiness probe indicates whether the application is ready to receive traffic. 
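Both probes are typically pointed at the health server on port 8080 in the Deployment's pod spec. A hedged sketch of that configuration (the timings and thresholds here are illustrative, not the shipped Helm chart defaults):

```yaml
# Illustrative probe configuration; adjust to match the actual chart values.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```

The table below lists the responses the readiness endpoint can return.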
+ +| Response | Status | Meaning | +|----------|--------|---------| +| `200 OK` | `{"status": "ok"}` | Ready to receive traffic | +| `503 Service Unavailable` | `{"status": "not_ready"}` | Still initializing or dependencies unavailable | +| `503 Service Unavailable` | `{"status": "shutting_down"}` | Graceful shutdown in progress | + +**Readiness checks include:** +- Application initialization complete +- Database connection available and responding to pings +- Not in shutdown state + +**When readiness fails:** +- Pod is removed from service endpoints (no traffic routed) +- Rolling updates will not promote new pods until they become ready +- Check database connectivity first +- Verify all required environment variables are set +- Check startup logs for initialization errors + +## Common Operational Procedures + +### Restarting the Service + +#### Single Pod Restart + +```bash +# Delete a specific pod (Kubernetes will recreate it) +kubectl delete pod -n hyperfleet-system + +# Or rollout restart the entire deployment +kubectl rollout restart deployment/hyperfleet-api -n hyperfleet-system +``` + +#### Verify Restart Success + +```bash +# Watch pods come up +kubectl get pods -n hyperfleet-system -w + +# Check readiness +kubectl get pods -n hyperfleet-system -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}' + +# Verify health endpoints +kubectl port-forward svc/hyperfleet-api-health 8080:8080 -n hyperfleet-system & +curl http://localhost:8080/healthz +curl http://localhost:8080/readyz +``` + +### Scaling the Service + +#### Manual Scaling + +```bash +# Scale up +kubectl scale deployment/hyperfleet-api --replicas=5 -n hyperfleet-system + +# Scale down +kubectl scale deployment/hyperfleet-api --replicas=2 -n hyperfleet-system +``` + +#### Verify Scaling + +```bash +# Check replica count +kubectl get deployment hyperfleet-api -n hyperfleet-system + +# Verify all pods are ready +kubectl get pods -n hyperfleet-system -l app=hyperfleet-api +``` + +### Database Operations + +#### Check Database Connectivity + +```bash +# Check readiness probe (includes DB connectivity check) +kubectl port-forward svc/hyperfleet-api-health 8080:8080 -n hyperfleet-system & +curl http://localhost:8080/readyz + +# If readiness returns "Database ping failed", use a debug pod to test connectivity +kubectl run pg-debug --rm -it --image=postgres:15-alpine --restart=Never -n hyperfleet-system -- \ + pg_isready -h -p +``` + +#### Database Connection Pool Issues + +If you see `connection refused` or `too many connections` errors: + +1. Check current connection count on database +2. Verify `--db-max-open-connections` setting (default: 50) +3. Consider scaling down replicas to reduce connection load +4. Check for connection leaks in recent deployments + +#### Database Migrations + +Migrations run automatically via an init container (`db-migrate`) before the main application starts. This happens on every deployment. 
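In the Deployment, this corresponds to an init container roughly like the fragment below (an illustrative sketch based on the image, command, and secret mount used elsewhere in this runbook; the actual Helm template may differ):

```yaml
# Illustrative initContainer fragment; the shipped Helm template may differ.
spec:
  template:
    spec:
      initContainers:
        - name: db-migrate
          image: quay.io/openshift-hyperfleet/hyperfleet-api:latest
          command: ["/app/hyperfleet-api", "migrate"]
          volumeMounts:
            - name: secrets
              mountPath: /build/secrets
              readOnly: true
      containers:
        - name: hyperfleet-api
          image: quay.io/openshift-hyperfleet/hyperfleet-api:latest
          # ports, probes, and resources omitted for brevity
      volumes:
        - name: secrets
          secret:
            secretName: hyperfleet-db-external
```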
+ +To manually run migrations (rarely needed): + +```bash +# Run a one-off migration job +kubectl run hyperfleet-migrate --rm -it \ + --image=quay.io/openshift-hyperfleet/hyperfleet-api:latest \ + --restart=Never \ + -n hyperfleet-system \ + --overrides='{"spec":{"containers":[{"name":"hyperfleet-migrate","image":"quay.io/openshift-hyperfleet/hyperfleet-api:latest","command":["/app/hyperfleet-api","migrate"],"volumeMounts":[{"name":"secrets","mountPath":"/build/secrets","readOnly":true}]}],"volumes":[{"name":"secrets","secret":{"secretName":"hyperfleet-db-external"}}]}}' \ + -- /app/hyperfleet-api migrate +``` + +Or trigger a rollout restart to re-run the init container: + +```bash +kubectl rollout restart deployment/hyperfleet-api -n hyperfleet-system +``` + +### Log Analysis + +#### View Real-time Logs + +```bash +# Single pod +kubectl logs -f deployment/hyperfleet-api -n hyperfleet-system + +# All pods +kubectl logs -f -l app=hyperfleet-api -n hyperfleet-system --max-log-requests=10 +``` + +#### Search for Errors + +```bash +# Recent errors +kubectl logs deployment/hyperfleet-api -n hyperfleet-system --since=1h | grep -i error + +# Structured log query (if using JSON logs) +kubectl logs deployment/hyperfleet-api -n hyperfleet-system --since=1h | jq 'select(.level == "error")' +``` + +## Troubleshooting Guide + +### Pod Not Starting + +**Symptoms:** Pod stuck in `Pending`, `ContainerCreating`, or `CrashLoopBackOff` + +**Diagnosis:** +```bash +kubectl describe pod -n hyperfleet-system +kubectl get events -n hyperfleet-system --sort-by='.lastTimestamp' | tail -20 +``` + +**Common causes:** +- **ImagePullBackOff**: Check image name, tag, and registry credentials +- **Insufficient resources**: Check node capacity and resource requests +- **ConfigMap/Secret not found**: Verify all required configs exist + +### Pod Crashing on Startup + +**Symptoms:** `CrashLoopBackOff` status, restarts > 0 + +**Diagnosis:** +```bash +# Check previous container logs +kubectl logs -n hyperfleet-system --previous + +# Check events +kubectl describe pod -n hyperfleet-system +``` + +**Common causes:** +- Missing or invalid environment variables +- Database connection failure +- Invalid configuration file +- Port already in use (unlikely in Kubernetes) + +### High Latency + +**Symptoms:** Slow API responses, timeouts + +**Diagnosis:** +```bash +# Check request duration metrics +curl -s http://:9090/metrics | grep api_inbound_request_duration + +# Check pod resource usage +kubectl top pods -n hyperfleet-system +``` + +**Common causes:** +- Database query performance issues +- Insufficient CPU/memory resources +- Network latency to database +- High concurrent request load + +### High Error Rate + +**Symptoms:** Increased 5xx responses, error logs + +**Diagnosis:** +```bash +# Check error count by path and code +curl -s http://:9090/metrics | grep api_inbound_request_count + +# Review error logs +kubectl logs deployment/hyperfleet-api -n hyperfleet-system --since=15m | grep -i error +``` + +**Common causes:** +- Database connection issues +- Invalid request data +- Upstream service failures +- Resource exhaustion + +### Database Connection Errors + +**Symptoms:** `connection refused`, `no such host`, `connection reset` + +**Diagnosis:** +```bash +# Check readiness probe (includes DB check) +kubectl port-forward svc/hyperfleet-api-health 8080:8080 -n hyperfleet-system & +curl http://localhost:8080/readyz + +# Test connectivity using a debug pod +kubectl run pg-debug --rm -it --image=postgres:15-alpine --restart=Never 
-n hyperfleet-system -- \ + pg_isready -h -p + +# Check database secret exists and has expected keys (does not print values) +kubectl get secret hyperfleet-db -n hyperfleet-system -o go-template='{{range $k,$v := .data}}{{println $k}}{{end}}' +``` + +**Resolution:** +1. Verify database host and port are correct +2. Check network policies allow egress to database +3. Verify database credentials are valid +4. Check database is running and accepting connections +5. Verify SSL settings match database requirements + +### Memory Issues + +**Symptoms:** OOMKilled, high memory usage + +**Diagnosis:** +```bash +# Check memory usage +kubectl top pods -n hyperfleet-system + +# Check for OOMKilled events +kubectl get events -n hyperfleet-system | grep -i oom +``` + +**Resolution:** +1. Increase memory limits in deployment +2. Check for memory leaks (increasing memory over time) +3. Review query patterns that may load large datasets + +## Recovery Procedures + +### Complete Service Recovery + +If the service is completely down: + +1. **Check namespace exists:** + ```bash + kubectl get namespace hyperfleet-system + ``` + +2. **Check deployment exists:** + ```bash + kubectl get deployment hyperfleet-api -n hyperfleet-system + ``` + +3. **Force recreate all pods:** + ```bash + kubectl rollout restart deployment/hyperfleet-api -n hyperfleet-system + ``` + +4. **Verify recovery:** + ```bash + kubectl rollout status deployment/hyperfleet-api -n hyperfleet-system + ``` + +### Database Recovery + +If database is unavailable: + +1. **Verify database status** (external DB or PostgreSQL pod) +2. **Check connectivity** from API pods +3. **If using built-in PostgreSQL:** + ```bash + kubectl rollout restart statefulset/hyperfleet-postgresql -n hyperfleet-system + ``` +4. **Wait for readiness probes to pass** before routing traffic + +### Rollback to Previous Version + +```bash +# View rollout history +kubectl rollout history deployment/hyperfleet-api -n hyperfleet-system + +# Rollback to previous version +kubectl rollout undo deployment/hyperfleet-api -n hyperfleet-system + +# Rollback to specific revision +kubectl rollout undo deployment/hyperfleet-api -n hyperfleet-system --to-revision=2 +``` + +## Escalation Paths + +### Severity Levels + +| Level | Description | Response Time | Example | +|-------|-------------|---------------|---------| +| **P1 - Critical** | Complete service outage | Immediate | All pods crashing, database unavailable | +| **P2 - High** | Degraded service | 30 minutes | High error rate, significant latency | +| **P3 - Medium** | Minor impact | 4 hours | Single pod issues, non-critical errors | +| **P4 - Low** | No user impact | Next business day | Log noise, documentation issues | + +### Escalation Contacts + +For all HyperFleet issues, escalate via the team Slack channel: + +- **Channel**: [#hcm-hyperfleet-team](https://redhat.enterprise.slack.com/archives/C0916E39DQV) + +### When to Escalate + +- **Escalate immediately** if: + - Complete service outage affecting users + - Data integrity issues suspected + - Security incident detected + - Unable to diagnose issue within 30 minutes + +- **Escalate within 1 hour** if: + - Partial outage or degraded performance + - Issue requires access you don't have + - Root cause is unclear after initial investigation + +## Related Documentation + +- [Deployment Guide](deployment.md) - Deployment and configuration +- [Metrics Documentation](metrics.md) - Prometheus metrics reference +- [Development Guide](development.md) - Local development setup