Agent Diagnostic
No agent diagnostic was run. The gateway did not stay up long enough to support agent interaction: each run left the gateway in a broken state after the OOM kill, requiring a full delete and re-onboard before the next attempt.
Diagnosis was performed manually via dmesg and free -h output across three runs.
Description
openshell sandbox create --from is killed by the kernel OOM killer every time during the image push phase. The process allocates memory synchronously and without bound: RSS grows until it consumes all available physical memory, regardless of how much the host has, and it grows fast enough that swap is never used before the OOM killer fires.
All 35 Dockerfile build steps complete successfully, and the image is built (~2.3GB). The kill happens consistently at the push-into-gateway step.
After the kill, the gateway is left in a broken state and requires a full delete and re-onboarding before the next attempt.
Expected: Pushing the image into the gateway uses bounded memory, proportional to a streaming buffer rather than to the total image size.
Actual: The process is killed every time. More RAM does not help: RSS scales with available memory. Swap does not help either: swapents:0 in the OOM report confirms swap is never touched before the kill.
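To make the Expected/Actual difference concrete, here is a minimal Python sketch of the two memory profiles. It is not OpenShell's code: the function names push_buffered and push_streaming and the 1 MiB chunk size are hypothetical, and the actual export format and transport to the gateway are not known from this report. The first function holds the whole image in memory at once (peak memory grows with image size); the second copies in fixed-size chunks, so peak memory stays near one chunk however large the image is.

```python
import io
from typing import BinaryIO

# Hypothetical chunk size; any fixed value keeps peak memory bounded.
CHUNK_SIZE = 1024 * 1024  # 1 MiB


def push_buffered(src: BinaryIO, dst: BinaryIO) -> int:
    """Unbounded pattern: read the entire exported image before writing it.

    Peak memory is roughly the image size (~2.3 GB in this report)."""
    data = src.read()  # whole image held in RAM at once
    dst.write(data)
    return len(data)


def push_streaming(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Bounded pattern: copy in fixed-size chunks.

    Only one chunk of at most `chunk_size` bytes is held at a time,
    so peak memory does not depend on the image size."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # EOF
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for an exported image: 5 MiB plus a partial final chunk.
    image = io.BytesIO(b"x" * (5 * 1024 * 1024 + 123))
    sink = io.BytesIO()
    print(push_streaming(image, sink))  # prints 5243003
```

In Python, shutil.copyfileobj(src, dst, length) implements the same fixed-buffer loop; the point is only that memory use should track the buffer, not the 2358 MiB export.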
Reproduction Steps
- Onboard a gateway:
    nemoclaw onboard
- Run sandbox create from a local Dockerfile producing a ~2.3GB image:
    openshell sandbox create --from ./Dockerfile --name test --policy ./policies/openclaw-sandbox.yaml -- nemoclaw-start
- All Dockerfile build steps complete successfully.
- The process is killed during the push phase:
    Pushing image openshell/sandbox-from:1774758071 into gateway "nemoclaw"
    [progress] Exported 2358 MiB
    bash: line 1: XXXXX Killed
- Confirm via dmesg | grep -i oom: the openshell process is killed with RSS approximately equal to total available RAM.
- openshell sandbox list returns no sandboxes.
Reproduced on:
- 6GB RAM, no swap — RSS at kill ~4.7GB
- 10GB RAM, no swap — RSS at kill ~9.2GB
- 10GB RAM, 9.5GB swap — RSS at kill ~9.2GB, swapents:0
Environment
- OS: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
- Kernel: 5.15.0-173-generic #183-Ubuntu SMP
- OpenShell: 0.0.10
- Docker: 28.2.2
- Storage Driver: overlay2
- Cgroup Driver: systemd
- Cgroup Version: 2 (cgroup2fs)
- Virtualisation: KVM (QEMU VM on Proxmox)
- CPU: Intel i5-8500 @ 3.00GHz, 2 vCPUs, 1 thread/core
- RAM: 9.69 GiB total, ~8.6 GiB available at rest
- Gateway container: openshell-cluster-nemoclaw (healthy, cluster:0.0.10)
Logs
Agent-First Checklist
- I pointed my agent at the repo and had it investigate this issue
- I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
- My agent could not resolve this: the diagnostic above explains why