Status: landed. The heterogeneous lab composes the commercial fleet director with open mesh/API worker primitives. See OpenCoreBoundary.md for the authoritative open/commercial split.
The lab runs heterogeneous Nexo hosts on one Docker bridge so you can test different images and security configurations together without extra hardware. peer-a is the commercial fleet director (Fleet.Host); peer-b and worker use open Nexo.API.
| Role | Dockerfile (override) | Runtime / env highlights | Auth (override) |
|---|---|---|---|
| peer-a | MESH_LAB_PEER_A_DOCKERFILE → .docker/Dockerfile.fleet-host | ASPNETCORE_ENVIRONMENT = Production (default); LiteDB director persistence | ApiKey (Nexo__Security__ApiKey) |
| peer-b | MESH_LAB_PEER_B_DOCKERFILE → .docker/Dockerfile.quickstart | NEXO_ALLOW_MOCK=1, Staging | ApiKeyOrBearerToken: same Nexo__Security__ApiKey or Nexo__Security__PeerB__BearerToken |
| worker (profile workers) | MESH_LAB_WORKER_DOCKERFILE → .docker/Dockerfile.api | Development by default (MESH_LAB_WORKER_ASPNETCORE_ENVIRONMENT), ShowAdvisoryInPortal off by default | ApiKeyOrBasic: API key or Basic (nexo + Nexo__Security__Worker__BasicAuthPassword) |
Optional heavier worker image: set MESH_LAB_WORKER_DOCKERFILE=.docker/Dockerfile.agent-server (SDK-based final image; slower CI/build, richer for local soak tests).
All Nexo__Security__* keys map to the same binding as production (Nexo:Security); see docs/Configuration.md.
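As a rough illustration of the three auth modes in the table, the sketch below prints the request header each mode would use. It assumes the API key is accepted in the X-Nexo-Api-Key header on every role (this document only states that for peer-a) and that Bearer and Basic use the standard Authorization header. lab_auth_header is a hypothetical helper name, not something shipped in the repository.

```shell
# Hypothetical helper (not part of the repo): print the HTTP header for one
# of the lab auth modes. Secrets come from the variables that .env.mesh-lab
# defines; the function aborts if the one it needs is unset.
lab_auth_header() {
  case "$1" in
    api-key)
      # Assumption: every role accepts the key in X-Nexo-Api-Key.
      printf 'X-Nexo-Api-Key: %s\n' "${Nexo__Security__ApiKey:?not set}"
      ;;
    bearer)
      # peer-b only (ApiKeyOrBearerToken).
      printf 'Authorization: Bearer %s\n' "${Nexo__Security__PeerB__BearerToken:?not set}"
      ;;
    basic)
      # worker only (ApiKeyOrBasic); the user name is "nexo".
      pw=${Nexo__Security__Worker__BasicAuthPassword:?not set}
      printf 'Authorization: Basic %s\n' \
        "$(printf 'nexo:%s' "$pw" | base64 | tr -d '\n')"
      ;;
    *)
      echo "unknown auth mode: $1" >&2
      return 1
      ;;
  esac
}
```

With the lab up, a call such as `curl -fsS -H "$(lab_auth_header bearer)" http://127.0.0.1:18082/health` would exercise the Bearer path on peer-b.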
- Docker Engine + Compose v2
- python3 on the host (used by scripts/mesh-lab-verify.sh for mesh task JSON assertions)
- RAM for two full image builds (api + quickstart differ in final stage); workers reuse the api image by default
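The prerequisites above can be checked before a first build. This is a minimal sketch, assuming a POSIX shell; missing_tools is a hypothetical helper, not one of the repository's scripts.

```shell
# Hypothetical preflight helper: print each required command that is missing
# from PATH and return non-zero if at least one is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not executed here): check the CLI tools, then the Docker daemon and
# the Compose v2 plugin.
#   missing_tools docker python3 && docker info >/dev/null && docker compose version
```

The check only covers tooling. Available RAM for the two image builds still has to be judged by hand.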
Automated one-shot (temp secrets, tear down after verify):
make mesh-lab-e2e
# same as: bash scripts/run-mesh-lab-e2e.sh

Include the worker tier (Compose profile workers; exercises Basic + API key against worker):
make mesh-lab-e2e-workers
# or: MESH_LAB_E2E_WORKERS=1 bash scripts/run-mesh-lab-e2e.sh
# or: bash scripts/run-mesh-lab-e2e.sh --workers

Requires Docker running (docker info). Uses project name nexo_mesh_lab_local by default.
On Apple Silicon, builds default to linux/amd64 (same as CI) because grpc.tools/protoc can crash on linux_arm64 inside Docker. Override with DOCKER_DEFAULT_PLATFORM=linux/arm64 only if your setup handles it.
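The platform rule can be pictured as a one-line fallback. The sketch below is an assumption about the behaviour described here, not a copy of the lab scripts, and lab_platform is a hypothetical name.

```shell
# Sketch: an explicit DOCKER_DEFAULT_PLATFORM wins; otherwise the build
# platform falls back to linux/amd64 (the CI default named above).
lab_platform() {
  printf '%s\n' "${DOCKER_DEFAULT_PLATFORM:-linux/amd64}"
}
```

To try a native build on Apple Silicon, the override would be passed on the command line, for example `DOCKER_DEFAULT_PLATFORM=linux/arm64 make mesh-lab-e2e`.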
Persistent lab with your .env.mesh-lab file:
cp docs/config/mesh-lab.env.example .env.mesh-lab
# Set Nexo__Security__ApiKey, Nexo__Security__PeerB__BearerToken, Nexo__Security__Worker__BasicAuthPassword, MESH_LAB_PEER_REGISTRATION_KEY
make mesh-lab-up
make mesh-lab-verify
# …when finished…
make mesh-lab-down

With workers:
MESH_LAB_WORKERS=1 make mesh-lab-up
make mesh-lab-verify
make mesh-lab-down

Equivalent raw Compose:
docker compose -f docker-compose.mesh-lab.yml --env-file .env.mesh-lab up -d --build
./scripts/mesh-lab-verify.sh .env.mesh-lab

Host URLs: http://127.0.0.1:18081 (peer-a), http://127.0.0.1:18082 (peer-b). With the workers profile, the worker also publishes http://127.0.0.1:18083 by default (MESH_LAB_WORKER_PUBLISH).
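After a raw Compose start the containers may need a few seconds before the published ports answer. The sketch below polls /health on a host URL. It assumes the API key is accepted in the X-Nexo-Api-Key header, and wait_healthy is a hypothetical helper rather than part of the verify scripts.

```shell
# Hypothetical helper: poll GET <base-url>/health until curl reports success
# (-f turns HTTP errors into a non-zero exit) or the attempts run out.
# Arguments: base URL, attempts (default 30), delay in seconds (default 2).
wait_healthy() {
  base_url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS -o /dev/null \
         -H "X-Nexo-Api-Key: ${Nexo__Security__ApiKey:-}" \
         "$base_url/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempts: $base_url" >&2
  return 1
}
```

A typical call would be `wait_healthy http://127.0.0.1:18081 && wait_healthy http://127.0.0.1:18082` before running the verify script.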
Requires python3 on the host (parses mesh task JSON).
- Host ↔ published ports: GET /health on peer-a and peer-b (API key and/or Bearer where configured).
- In-network HTTP: an ephemeral curl container on the mesh_lab bridge resolves peer-a / peer-b and calls http://peer-a:8080/health and http://peer-b:8080/health.
- Mesh control plane (JSON):
  - GET /api/mesh/fleet/nodes from the host and from inside the bridge (empty [] initially).
  - GET /api/mesh/elastic/status on peer-b (Bearer from host; unauthenticated GET from inside the bridge when mesh token is unset).
  - POST /api/mesh/fleet/nodes on peer-a with X-Nexo-Api-Key, registering http://peer-b:8080 as mesh-lab-verify-peer, then verifies the fleet list includes that peer id.
- Mesh task placement (same API key path): POST /api/mesh/tasks → POST /api/mesh/tasks/{taskId}/schedule → Assigned with assignedPeerId / assignedApiBaseUrl.
- Mesh task lifecycle on the director (peer-a): wrong leaseToken → 409; PATCH Running then Succeeded with valid lease; GET confirms terminal state and cleared lease.
- Brick HTTP (optional): from the bridge, GET /api/bricks and POST /api/bricks/{id}/execute on peer-b (first catalog entry, or MESH_LAB_VERIFY_BRICK_ID); skipped if the catalog is empty.
- Worker tier when the Compose workers profile is running: GET /health, mesh reads with API key and Basic auth (via published :18083 when mapped).
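Two of the checks above, "fleet list includes that peer id" and "wrong leaseToken → 409", can be sketched as small helpers. Both names are hypothetical, and the JSON search is deliberately field-agnostic because the response schema is not shown in this document.

```shell
# Hypothetical helper: succeed when the JSON on stdin contains the given peer
# id as a string value anywhere in the document.
fleet_has_peer() {
  python3 -c '
import json, sys

def strings(node):
    # Yield every string value nested inside dicts and lists.
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from strings(value)

sys.exit(0 if sys.argv[1] in strings(json.load(sys.stdin)) else 1)
' "$1"
}

# Hypothetical helper: run curl with the given arguments and succeed only if
# the HTTP status equals the expected code (for example 409 for a wrong lease).
expect_status() {
  expected=$1
  shift
  actual=$(curl -s -o /dev/null -w '%{http_code}' "$@")
  [ "$actual" = "$expected" ]
}
```

For example, `curl -fsS -H "X-Nexo-Api-Key: $Nexo__Security__ApiKey" http://127.0.0.1:18081/api/mesh/fleet/nodes | fleet_has_peer mesh-lab-verify-peer` would pass once the peer is registered.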
Deep checks (scripts/mesh-lab-verify-deep.sh): multi-step task, lease/extend, migrate-for-checkpoint, reschedule, and terminal Succeeded. CI runs this after the standard verify script. Locally:
make mesh-lab-verify-deep
# or one-shot E2E with deep:
MESH_LAB_VERIFY_DEEP=1 make mesh-lab-e2e-workers

That covers Docker DNS, bridge connectivity, security headers, director placement, lease-gated lifecycle, federated brick HTTP, and (with the workers profile) an autonomous worker executor on the worker container (Nexo:MeshLab:WorkerExecutor:Enabled). Tasks named with prefix mesh-lab-worker-exec are scheduled on the director and completed by the worker (Running → optional brick on assigned peer → Succeeded) without manual PATCH. Set MESH_LAB_SKIP_WORKER_EXECUTOR=1 to skip that check.
Entitlements (worker tier): scripts/mesh-lab-verify-entitlements.sh (invoked from standard verify when workers are up) checks CopilotScoped API key → 403 on POST /api/mesh/tasks, full key → 200, and Nexo:Entitlements:MaxCopilotSubmissionsPerHour → 429 after the configured limit. Configure Nexo__Security__CopilotScopedApiKey in .env.mesh-lab.
Trust-tier placement (peer-a director): default Nexo__Mesh__Placement__PeerTrustPolicy=trusted-only. scripts/mesh-lab-verify-trust.sh registers trusted vs untrusted fleet nodes and asserts placement picks trusted peers only. Set MESH_LAB_SKIP_TRUST_VERIFY=1 to skip.
Fleet governance (peer-a director): default Nexo__Mesh__Fleet__RequirePeerRegistrationKey=true. Each fleet register must include peerRegistrationKey in the JSON body (distinct from the operator Nexo__Security__ApiKey). Set MESH_LAB_PEER_REGISTRATION_KEY in .env.mesh-lab. scripts/mesh-lab-verify-governance.sh (invoked from standard verify) checks registration policy, credential rotation (fingerprint change), POST /api/mesh/fleet/nodes/{peerId}/revoke → placement blocked, and /admit → placement restored. Set MESH_LAB_SKIP_GOV_VERIFY=1 to skip.
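A registration body that carries the peer registration key could be assembled as follows. Only peerRegistrationKey is a field name taken from this document. "peerId" and "apiBaseUrl" are placeholder names chosen for the sketch, so check the actual request contract before relying on them. registration_body is a hypothetical helper.

```shell
# Hypothetical helper: build a fleet registration body with python3 so that
# all values are JSON-escaped. "peerId" and "apiBaseUrl" are assumed names;
# "peerRegistrationKey" is the documented one.
registration_body() {
  python3 -c '
import json, sys

peer_id, api_base_url, registration_key = sys.argv[1:4]
print(json.dumps({
    "peerId": peer_id,
    "apiBaseUrl": api_base_url,
    "peerRegistrationKey": registration_key,
}))
' "$1" "$2" "$3"
}
```

The output would then be sent with the operator key in the header, for instance `curl -fsS -X POST -H "X-Nexo-Api-Key: $Nexo__Security__ApiKey" -H 'Content-Type: application/json' -d "$(registration_body mesh-lab-verify-peer http://peer-b:8080 "$MESH_LAB_PEER_REGISTRATION_KEY")" http://127.0.0.1:18081/api/mesh/fleet/nodes`.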
Director CLI (Product 5.3): scripts/mesh-lab-verify-director-cli.sh exercises commercial mesh director CLI register|revoke|admit against the running lab (requires .NET SDK on the host). Set MESH_LAB_SKIP_DIRECTOR_CLI_VERIFY=1 to skip. Ops: docs/runbooks/mesh-lab-operations.md (split-brain, upgrade order, director vs instances.json).
Director persistence (Phase 9): peer-a uses LiteDB (Nexo__Mesh__Persistence__Provider=LiteDb, volume mesh_lab_peer_a_data). scripts/mesh-lab-verify-persistence.sh restarts peer-a and asserts fleet + tasks survive. Set MESH_LAB_SKIP_PERSISTENCE_VERIFY=1 to skip. See MeshPhase9DirectorPersistence.md.
Network negative (Phase 11): scripts/mesh-lab-verify-network-negative.sh exercises blackhole/DNS worker URLs, drained-only placement, peer-b stop/start partition, and director restart + lease when LiteDB is on. Set MESH_LAB_SKIP_NETWORK_NEGATIVE_VERIFY=1 to skip. See MeshPhase11NetworkNegative.md.
Data plane & federation (Phase 13): mesh-lab-verify-knowledge.sh, mesh-lab-verify-federation.sh, mesh-lab-verify-retry-result.sh, mesh-lab-verify-elastic.sh — knowledge export/import, federated brick catalog on peer-a, task retry + result download, queue-depth placement + heartbeat. Skip with MESH_LAB_SKIP_*_VERIFY=1 (see MeshPhase13DataPlaneFederation.md).
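With this many skip switches it is easy to lose track of which ones are set. The sketch below lists the MESH_LAB_SKIP_* variables exported with the value 1. active_skip_flags is a hypothetical helper, not part of the verify scripts.

```shell
# Hypothetical helper: list the MESH_LAB_SKIP_* switches that are exported
# with the value 1, one name per line, sorted.
active_skip_flags() {
  env | sed -n 's/^\(MESH_LAB_SKIP_[A-Z_]*\)=1$/\1/p' | sort
}
```

A skip is applied by setting the variable for the verify run, for example `MESH_LAB_SKIP_GOV_VERIFY=1 ./scripts/mesh-lab-verify.sh .env.mesh-lab`.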
Ops / commercial: docs/runbooks/mesh-lab-operations.md, docs/commercial/mesh-add-on-sku.md.
This Cursor/workspace cannot provision cloud VMs or run Docker for you. On any fresh Ubuntu/Debian VM (AWS/GCP/Azure/Linode):
- SSH into the VM and install Git if needed.
- Clone this repository and cd to the repo root.
- Run:

      chmod +x scripts/bootstrap-cloud-mesh-lab.sh
      ./scripts/bootstrap-cloud-mesh-lab.sh --install-docker --workers --deep

  --install-docker uses apt-get to install docker.io and docker-compose-v2. The script then creates .env.mesh-lab from docs/config/mesh-lab.env.example (random lab secrets) and runs docker compose up --build, mesh-lab-verify.sh, and (with --deep) mesh-lab-verify-deep.sh. Use --workers for the autonomous worker executor checks.
- From your laptop, tunnel the peer ports if the VM has no public listener:

      ssh -L 18081:127.0.0.1:18081 -L 18082:127.0.0.1:18082 user@your-vm

  Then open http://127.0.0.1:18081 or http://127.0.0.1:18082 locally.
If Docker is already installed, omit --install-docker. For non-apt Linux, install Docker + Compose v2 manually (Docker Engine install), then run ./scripts/bootstrap-cloud-mesh-lab.sh without --install-docker.
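When creating .env.mesh-lab by hand instead of through the bootstrap script, random lab secrets can be produced with standard tools. The sketch below is one way to do it and is not taken from bootstrap-cloud-mesh-lab.sh. The four key names are the ones this document asks you to set, while random_hex and lab_secret_lines are hypothetical helper names.

```shell
# Sketch: 32 bytes from /dev/urandom rendered as 64 lowercase hex characters.
random_hex() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Print KEY=value lines for the four lab secrets named in this document,
# each with its own random value.
lab_secret_lines() {
  for key in \
    Nexo__Security__ApiKey \
    Nexo__Security__PeerB__BearerToken \
    Nexo__Security__Worker__BasicAuthPassword \
    MESH_LAB_PEER_REGISTRATION_KEY
  do
    printf '%s=%s\n' "$key" "$(random_hex)"
  done
}
```

The printed lines are meant to replace the matching entries copied from docs/config/mesh-lab.env.example. The other settings in that file stay as they are.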
docker compose --profile workers -f docker-compose.mesh-lab.yml --env-file .env.mesh-lab up -d --scale worker=2 worker
./scripts/mesh-lab-stress-ramp.sh .env.mesh-lab 8 2 30 4
# one-shot (verify + deep + ramp):
make mesh-lab-e2e-stress
# persistent lab already up:
make mesh-lab-stress

CI: .github/workflows/mesh-lab-stress-gate.yml runs weekly (Mondays 06:00 UTC) and on workflow_dispatch. It runs the full verify (same sub-checks as the PR gate), deep, the stress ramp (4 workers, 15 requests/step), then mesh-lab-verify-post-stress.sh. The default PR gate remains mesh-lab-gate.yml, without stress. See MeshPhase10LabStressHardening.md.
export NEXO_MESH_DIRECTOR_BASE_URL=http://127.0.0.1:18081
export NEXO_MESH_API_KEY='your-key'
dotnet run --project commercial/src/Nexo.Commercial.MeshDirector -- director get /health --json
export NEXO_MESH_DIRECTOR_BASE_URL=http://127.0.0.1:18082
# peer-b accepts Bearer OR same API key:
dotnet run --project commercial/src/Nexo.Commercial.MeshDirector -- director get /health --json

Use host URLs from above; list local peers with nexo mesh peers or fleet nodes with director list-nodes.
docker compose --profile workers -f docker-compose.mesh-lab.yml --env-file .env.mesh-lab down -v

Pre-production gaps that the Docker bridge alone cannot cover are automated in Phase 12; see MeshPhase12RemoteNetworkingAutomation.md.
| Check | Local | CI |
|---|---|---|
| HTTPS director (Caddy → peer-a) | make mesh-lab-e2e-tls | mesh-lab-tls-gate.yml (weekly) |
| gRPC transport (Kestrel round-trip) | dotnet test … --filter Category=ProdStyle | grpc-transport-gate.yml |
| Two-host / tailnet | scripts/mesh-lab-verify-remote.sh + env | mesh-lab-remote-gate.yml (workflow_dispatch + secrets) |
.github/workflows/mesh-lab-gate.yml writes lab secrets, brings up peers and the workers profile, then runs mesh-lab-verify.sh and mesh-lab-verify-deep.sh.
Optional dotnet mirror: set NEXO_RUN_MESH_LAB=1 and run make test-mesh-lab (or dotnet test … --filter Category=MeshLab). MeshLabDockerE2ETests starts the same Compose stack and invokes the same verify scripts; skipped when the env var is unset.
| Date | Change |
|---|---|
| 2026-04-23 | Initial virtual mesh lab. |
| 2026-04-24 | Scalable workers + stress ramp. |
| 2026-05-17 | Mesh verify: task create→schedule placement; worker profile in CI; stronger worker auth checks. |
| 2026-05-18 | Lease lifecycle + brick execute in verify; mesh-lab-verify-deep (checkpoint/migrate/reschedule). |
| 2026-05-18 | Optional MeshLabDockerE2ETests (NEXO_RUN_MESH_LAB=1, make test-mesh-lab). |
| 2026-05-18 | Fix GrpcAgentTransport DI (ambient barrier context); worker tier defaults to Development again. |
| 2026-05-19 | Lab worker executor (Nexo:MeshLab:WorkerExecutor) on worker container; verify script autonomous task path. |
| 2026-05-19 | Stress gate workflow; bootstrap --deep; make mesh-lab-e2e-stress / make mesh-lab-stress. |
| 2026-05-19 | Entitlements verify (CopilotScoped + hourly copilot quota on worker). |
| 2026-05-19 | Fleet governance (registration key, admit/revoke); director CLI verify; expanded ops runbook. |
| 2026-05-19 | LiteDB director persistence on peer-a; mesh-lab-verify-persistence (restart survival). |
| 2026-05-19 | Stress gate parity + post-stress placement/persistence (MeshPhase10LabStressHardening.md). |
| 2026-05-19 | Network-negative verify (unreachable workers, peer-b outage, director restart lease). |
| 2026-05-19 | Trust-tier placement on director; stress burst pass/fail; mesh lab runbook + SKU sketch. |
| 2026-05-19 | Phase 12: TLS E2E, gRPC gate, remote verify script + automation doc. |
| 2026-05-19 | Phase 13: knowledge, federation, retry/result, elastic verify scripts. |