openshell sandbox create --from allocates unbounded memory during image push, scaling to consume all available RAM #655

@ChrisMail-MXA

Agent Diagnostic

No agent diagnostic was run. The gateway did not survive long enough to support agent interaction — each run left the gateway in a broken state post-OOM, requiring a full delete and re-onboard before the next attempt.

Diagnosis was performed manually via dmesg and free -h output across three runs.
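For anyone reproducing this, sampling the process while it runs may give a clearer picture than post-mortem dmesg output alone. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status is readable; the helper names are hypothetical and the sample text in the usage note is illustrative, not captured from the runs above.

```python
import re


def parse_vm_fields(status_text, fields=("VmRSS", "VmSwap")):
    """Extract selected memory fields (in kB) from /proc/<pid>/status text."""
    out = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in fields:
            match = re.match(r"\s*(\d+)\s*kB", rest)
            if match:
                out[key] = int(match.group(1))
    return out


def sample_process(pid):
    """Read the current VmRSS/VmSwap of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())
```

Calling sample_process(pid) in a loop (for example once per second) during the push phase would show whether VmRSS climbs while VmSwap stays at 0, which is what the swapents:0 observation suggests.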

Description

openshell sandbox create --from is killed by the kernel OOM killer every time during the image push phase. The process allocates RAM unboundedly and synchronously — RSS grows to consume all available physical memory regardless of how much is available, and the allocation is fast enough that swap is never used before the OOM killer fires.

All 35 Dockerfile build steps complete successfully, and the image is built (~2.3GB). The kill happens consistently at the push-into-gateway step.

After the kill, the gateway is left in a broken state and requires a full delete and re-onboarding before the next attempt.

Expected: Image push into gateway uses bounded memory proportional to a streaming buffer, not total image size.

Actual: Process is killed every time. More RAM does not help — RSS scales with available memory. Swap does not help — swapents:0 confirms swap is never touched before the kill.
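To illustrate what "bounded memory proportional to a streaming buffer" means here, the sketch below copies a stream in fixed-size chunks so that peak memory depends on the chunk size rather than the total payload. This is a generic illustration in Python, not OpenShell's actual push code, which has not been inspected; the function name and the 1 MiB default are assumptions.

```python
import io


def stream_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst holding at most one chunk in memory at a time.

    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * (3 * 1024 * 1024 + 5))
    sink = io.BytesIO()
    print(stream_copy(source, sink))
```

By contrast, a single src.read() with no size argument would hold the whole payload in memory at once, which for a ~2.3GB image would at least account for memory use on the order of the image size.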

Reproduction Steps

  1. Onboard a gateway: nemoclaw onboard
  2. Run sandbox create from a local Dockerfile producing a ~2.3GB image:
    openshell sandbox create --from ./Dockerfile --name test --policy ./policies/openclaw-sandbox.yaml -- nemoclaw-start
    
  3. All Dockerfile build steps complete successfully
  4. Process is killed during the push phase:
    Pushing image openshell/sandbox-from:1774758071 into gateway "nemoclaw"
    [progress] Exported 2358 MiB
    bash: line 1: XXXXX Killed
    
  5. Confirm via dmesg | grep -i oom: the openshell process is killed with RSS roughly equal to total available RAM
  6. openshell sandbox list returns no sandboxes

Reproduced on:

  • 6GB RAM, no swap — RSS at kill ~4.7GB
  • 10GB RAM, no swap — RSS at kill ~9.2GB
  • 10GB RAM, 9.5GB swap — RSS at kill ~9.2GB, swapents:0
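The RSS figures above come from the kernel's OOM report (step 5). As a sketch of how such a figure can be extracted, the parser below reads the "Killed process" summary line that recent Linux kernels print. The sample line in the usage example is a hypothetical illustration with made-up values, not a log from these runs, and the exact field layout can vary between kernel versions.

```python
import re

# Matches the kernel's OOM summary line, e.g.
# "Out of memory: Killed process <pid> (<name>) total-vm:...kB, anon-rss:...kB, ..."
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"anon-rss:(?P<anon>\d+)kB, file-rss:(?P<file>\d+)kB, shmem-rss:(?P<shmem>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name, and total RSS in GiB, or None if no match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    rss_kib = sum(int(match.group(k)) for k in ("anon", "file", "shmem"))
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "rss_gib": rss_kib / (1024 * 1024),
    }


if __name__ == "__main__":
    # Hypothetical sample line; values are illustrative only.
    sample = (
        "[12345.678901] Out of memory: Killed process 4242 (openshell) "
        "total-vm:10485760kB, anon-rss:9646080kB, file-rss:0kB, "
        "shmem-rss:0kB, UID:1000 pgtables:19000kB oom_score_adj:0"
    )
    print(parse_oom_kill(sample))
```

The swapents value is reported separately, in the per-process table the kernel dumps before the summary line, so it is not covered by this parser.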

Environment

  • OS: Ubuntu 22.04.5 LTS (Jammy Jellyfish)
  • Kernel: 5.15.0-173-generic #183-Ubuntu SMP
  • OpenShell: 0.0.10
  • Docker: 28.2.2
  • Storage Driver: overlay2
  • Cgroup Driver: systemd
  • Cgroup Version: 2 (cgroup2fs)
  • Virtualisation: KVM (QEMU VM on Proxmox)
  • CPU: Intel i5-8500 @ 3.00GHz, 2 vCPUs, 1 thread/core
  • RAM: 9.69 GiB total, ~8.6 GiB available at rest
  • Gateway container: openshell-cluster-nemoclaw (healthy, cluster:0.0.10)

Logs

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
