Description
Apple Feedback ID: FB22091885
This issue has been filed with Apple and is cross-referenced here for the MLX community. A fix may come from either side.
Kernel panic in IOGPUMemory.cpp:550 triggered by large Metal GPU memory allocation during MLX inference on M4 Max.
PANIC STRING:
"completeMemory() prepare count underflow" @IOGPUMemory.cpp:550
SYSTEM:
- Hardware: Apple M4 Max (36GB unified memory)
- macOS: 26.3 (25D125)
- Kernel: Darwin 25.3.0 xnu-12377.81.4~5/RELEASE_ARM64_T6041
REPRODUCIBLE: Yes — confirmed twice with identical call stacks.
REPRODUCTION STEPS:
- Install MLX and mlx-lm via pip on Python 3.14 ARM64
- Load a large quantized LLM (Qwen3.5-27B Q5_K_M) via mlx_vlm.load()
- Construct a prompt consisting of 147 concatenated model outputs totalling approximately 173,000 tokens
- Call mlx_vlm.generate() with this prompt — prefill phase begins processing the full context
- Kernel panics during prefill, consistently at IOGPUMemory.cpp:550
ROOT COMPONENT:
com.apple.iokit.IOGPUFamily (129.3.2)
NOTES:
- Panic does not occur with smaller prompts (under ~10,000 tokens)
- Memory capacity is not the issue: the system has 36GB and the model occupies ~26GB, leaving sufficient headroom
- Issue appears to be GPU memory accounting state corruption triggered by a single contiguous Metal allocation for a very large attention computation, not an out-of-memory condition
- Two panic logs attached with identical backtraces, confirming deterministic reproducibility
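As rough supporting arithmetic for why this allocation would be so large: a naive, unfused fp16 attention score matrix scales quadratically with context length. The numbers below are illustrative only; MLX's actual kernels may tile or fuse the computation, so the true allocation pattern is an assumption.

```python
# Illustrative arithmetic, not the actual MLX allocation: size of one
# full fp16 attention score matrix (tokens x tokens) for a single head
# during a 173K-token prefill.
tokens = 173_000
bytes_per_elem = 2  # fp16

score_matrix_bytes = tokens * tokens * bytes_per_elem
print(f"{score_matrix_bytes / 2**30:.1f} GiB per head")  # ~55.7 GiB
```

Even if the real kernel never materializes the full matrix, any intermediate buffer sized along these lines would dwarf the 36GB of unified memory, consistent with the panic being an accounting/allocation-path bug rather than simple exhaustion.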
Suggested mitigation for MLX:
Add a prefill token count guard in mlx_lm before the Metal allocator is called. If the prompt exceeds a safe threshold (empirically somewhere
below 173K tokens on M4 Max with 36GB), either raise a clear exception with guidance to chunk the prompt, or automatically split the prefill
into safe-sized segments. This would prevent the IOGPUFamily kernel panic without requiring a macOS fix.
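A minimal sketch of what such a guard could look like. The function name, threshold value, and chunking strategy here are hypothetical, not existing mlx_lm API; the safe threshold would need to be tuned empirically per hardware configuration.

```python
# Hypothetical prefill guard, sketched for illustration only.
# MAX_SAFE_PREFILL_TOKENS is a conservative placeholder, not a tuned value.
MAX_SAFE_PREFILL_TOKENS = 32_000

def chunk_prefill(token_ids, max_tokens=MAX_SAFE_PREFILL_TOKENS):
    """Split a long prompt into segments small enough that each Metal
    allocation stays below the size that triggers the panic."""
    if len(token_ids) <= max_tokens:
        return [token_ids]
    return [token_ids[i:i + max_tokens]
            for i in range(0, len(token_ids), max_tokens)]
```

Each segment would then be fed through the model sequentially while reusing the growing KV cache, so the final state matches a single monolithic prefill pass; alternatively, the guard could simply raise a descriptive exception advising the caller to chunk the prompt.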