Kernel panic (IOGPUMemory.cpp:550) on M4 Max with large context prefill (~173K tokens) #3186

@kotono-amaha

Apple Feedback ID: FB22091885

This issue has been filed with Apple and is cross-referenced here for the MLX community. A fix may come from either side.

Kernel panic in IOGPUMemory.cpp:550 triggered by large Metal GPU memory allocation during MLX inference on M4 Max.

PANIC STRING:
"completeMemory() prepare count underflow" @IOGPUMemory.cpp:550

SYSTEM:

  • Hardware: Apple M4 Max (36GB unified memory)
  • macOS: 26.3 (25D125)
  • Kernel: Darwin 25.3.0 xnu-12377.81.4~5/RELEASE_ARM64_T6041

REPRODUCIBLE: Yes — confirmed twice with identical call stacks.

REPRODUCTION STEPS:

  1. Install MLX and mlx-lm via pip on Python 3.14 ARM64
  2. Load a large quantized LLM (Qwen3.5-27B Q5_K_M) via mlx_vlm.load()
  3. Construct a prompt consisting of 147 concatenated model outputs totalling approximately 173,000 tokens
  4. Call mlx_vlm.generate() with this prompt — prefill phase begins processing the full context
  5. Kernel panics during prefill, consistently at IOGPUMemory.cpp:550

ROOT COMPONENT:
com.apple.iokit.IOGPUFamily (129.3.2)

NOTES:

  • Panic does not occur with smaller prompts (under ~10,000 tokens)
  • Memory capacity is not the issue — system has 36GB and model occupies ~26GB, leaving sufficient headroom
  • The issue appears to be corruption of GPU memory accounting state, triggered
    by a single contiguous Metal allocation for a very large attention computation, rather than an out-of-memory condition
  • Two panic logs attached with identical backtraces confirming deterministic reproducibility

Suggested mitigation for MLX:
Add a prefill token count guard in mlx_lm before the Metal allocator is called. If the prompt exceeds a safe threshold (empirically somewhere
below 173K tokens on M4 Max with 36GB), either raise a clear exception with guidance to chunk the prompt, or automatically split the prefill
into safe-sized segments. This would prevent the IOGPUFamily kernel panic without requiring a macOS fix.
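
A minimal sketch of the two options described above, written in plain Python so it does not depend on any MLX API. The names `MAX_PREFILL_TOKENS`, `PromptTooLongError`, `check_prefill_length` and `split_prefill` are hypothetical, and both numeric limits are placeholders: this report only establishes that roughly 173K tokens panics and that prompts under roughly 10K tokens do not, so the real safe threshold would have to be measured.

```python
from typing import Iterator, Sequence

# Placeholder limits, NOT measured values. The safe threshold on a given
# machine is unknown and would need to be determined empirically.
MAX_PREFILL_TOKENS = 8192
DEFAULT_CHUNK_SIZE = 2048


class PromptTooLongError(ValueError):
    """Raised when a prompt exceeds the configured prefill limit."""


def check_prefill_length(tokens: Sequence[int],
                         limit: int = MAX_PREFILL_TOKENS) -> None:
    """Option 1: fail early with guidance instead of starting the prefill."""
    if len(tokens) > limit:
        raise PromptTooLongError(
            f"Prompt has {len(tokens)} tokens, above the prefill limit of "
            f"{limit}. Split the prompt into smaller chunks."
        )


def split_prefill(tokens: Sequence[int],
                  chunk_size: int = DEFAULT_CHUNK_SIZE) -> Iterator[Sequence[int]]:
    """Option 2: yield consecutive segments of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(tokens), chunk_size):
        yield tokens[start:start + chunk_size]


if __name__ == "__main__":
    prompt = list(range(5000))  # stand-in for real token ids
    for segment in split_prefill(prompt):
        # A real integration would run the model on each segment here and
        # carry the KV cache forward between segments.
        print(len(segment))
```

Whether chunking the prefill actually avoids the panic is untested here. It only bounds how many tokens are submitted in one step, which is the condition this report associates with the failure.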
