Project
cortex
Description
When a process running inside cortex sandbox is killed by the Linux OOM (Out-of-Memory) killer, cortex reports a generic exit code and error message that doesn't indicate OOM was the cause. This makes debugging memory issues extremely difficult, as the user has no indication that memory exhaustion occurred.
Error Message
$ cortex sandbox linux -- python3 -c "x = [0] * (10**9)"
Sandbox process exited with code 137
# Exit code 137 = 128 + 9 (SIGKILL), but no indication WHY it was killed
# User has to guess: was it OOM? Was it timeout? Was it manual kill?
Debug Logs
$ RUST_LOG=debug cortex sandbox linux -- python3 -c "x = [0] * (10**9)"
[DEBUG] Starting sandbox with cgroup memory limit: 512MB
[DEBUG] Spawning process: python3 -c "x = [0] * (10**9)"
[DEBUG] Process started with PID: 12345
[DEBUG] Process received signal: SIGKILL
[DEBUG] Sandbox terminated with exit code: 137
# No log about memory.current, memory.max, or oom_kill events!
System Information
Bounty Version: 0.1.0
OS: Ubuntu 24.04 LTS
CPU: AMD EPYC-Genoa Processor (8 cores)
RAM: 15 GB
Cgroup: v2 (unified hierarchy)
Screenshots
No response
Steps to Reproduce
1. Start a sandbox with default memory limits:
   cortex sandbox linux -- python3 -c "x = [0] * (10**9)"
2. Observe the exit code:
   Sandbox process exited with code 137
3. Check the system journal for OOM events (manual detective work):
   sudo dmesg | grep -i oom
   # Shows: python3 invoked oom-killer: gfp_mask=0x...
4. Note that cortex provided no indication that OOM was the cause.
Expected Behavior
Cortex should detect and report OOM kills explicitly:
- Check cgroup OOM events: read memory.events after process exit to check whether the oom_kill counter increased
- Report OOM clearly:
Sandbox process killed by OOM (used 512MB of 512MB limit)
Exit code: 137 (OOM killed)
- Include memory stats: Show peak memory usage vs limit
- Suggest remediation: "Increase sandbox memory limit with --memory 1G"
- Structured output: in --json mode, include OOM information:
{
"exit_code": 137,
"signal": "SIGKILL",
"oom_killed": true,
"memory_limit": "512MB",
"memory_peak": "512MB"
}
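Exit code 137 alone cannot distinguish an OOM kill from any other SIGKILL; combining it with the cgroup's oom_kill counter can. A minimal sketch of that disambiguation (hypothetical helper, not cortex's actual API):

```rust
/// Turn a raw exit code plus the cgroup's oom_kill result into a
/// human-readable verdict. Codes above 128 follow the shell
/// convention 128 + signal number, so 137 = 128 + 9 (SIGKILL).
fn describe_exit(exit_code: i32, oom_killed: bool) -> String {
    match exit_code {
        137 if oom_killed => "killed by OOM (SIGKILL, exit code 137)".to_string(),
        code if code > 128 => format!("killed by signal {} (exit code {})", code - 128, code),
        code => format!("exited with code {}", code),
    }
}

fn main() {
    println!("{}", describe_exit(137, true));  // OOM verdict
    println!("{}", describe_exit(137, false)); // plain SIGKILL
    println!("{}", describe_exit(0, false));   // normal exit
}
```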
Actual Behavior
When OOM kills the sandboxed process:
- Cortex only reports the exit code (137)
- No distinction between OOM kill, user-initiated kill, or other SIGKILL sources
- No memory usage statistics provided
- User must manually check dmesg or /var/log/kern.log to diagnose
- --json output doesn't include any memory-related information
This is particularly problematic because:
- CI/CD systems need to distinguish OOM from other failures for automatic retry/resource-adjustment logic
- Users may waste time debugging code bugs when the issue is just insufficient memory allocation
- Memory leaks are hard to detect without seeing how close to the limit the process got
Additional Context
With cgroup v2 (the default on modern Linux), detecting OOM is straightforward:
# Before process start
cat /sys/fs/cgroup/sandbox-12345/memory.events
# oom_kill 0
# After OOM kill
cat /sys/fs/cgroup/sandbox-12345/memory.events
# oom_kill 1
# Memory at time of kill
cat /sys/fs/cgroup/sandbox-12345/memory.peak
# 536870912 (512MB)
Cortex should read these cgroup files after the sandboxed process exits to provide meaningful diagnostics. This is standard practice in container runtimes (Docker, Podman) and process supervisors (systemd).
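The post-exit check described above could look roughly like this in Rust (a sketch only; the cgroup path and function names are hypothetical, not cortex's actual internals):

```rust
use std::fs;
use std::path::Path;

/// Extract the `oom_kill` counter from cgroup v2 `memory.events` text.
/// A missing key (e.g. on older kernels) is treated as 0.
fn parse_oom_kill(events: &str) -> u64 {
    events
        .lines()
        .find_map(|line| line.strip_prefix("oom_kill "))
        .and_then(|n| n.trim().parse().ok())
        .unwrap_or(0)
}

/// Read the sandbox cgroup's OOM-kill count and peak memory usage
/// (bytes) after the child has exited, matching the
/// /sys/fs/cgroup/sandbox-<pid> layout shown above.
fn post_exit_memory_check(cgroup_dir: &Path) -> std::io::Result<(u64, u64)> {
    let events = fs::read_to_string(cgroup_dir.join("memory.events"))?;
    let peak = fs::read_to_string(cgroup_dir.join("memory.peak"))?
        .trim()
        .parse()
        .unwrap_or(0);
    Ok((parse_oom_kill(&events), peak))
}

fn main() {
    // Demo on a literal snapshot, since a real run needs a live cgroup.
    let sample = "low 0\nhigh 4\nmax 12\noom 3\noom_kill 1\n";
    if parse_oom_kill(sample) > 0 {
        println!("Sandbox process killed by OOM");
    }
}
```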
The current behavior forces users to become Linux kernel experts just to understand why their sandbox process died, which defeats the purpose of having a user-friendly sandbox abstraction.