Implement MROPE / IMROPE / conditional-GLM4 rope variants #9

Description

@pekkah

Background

PR #7 added two RoPE conventions: interleaved (LLaMA, default) and NEOX (Qwen/Phi/Gemma/Falcon family). `llama.cpp`'s `llama_model_rope_type()` lists three more variants that currently fall through to interleaved in SharpInference and would produce wrong outputs:

| Variant | Architectures (per llama.cpp) |
| --- | --- |
| MROPE (multi-axis) | QWEN2VL, PADDLEOCR |
| IMROPE (interleaved multi-axis) | QWEN3VL, QWEN3VLMOE, QWEN35, QWEN35MOE |
| Conditional | GLM4 (NORM or MROPE depending on `hparams.use_mrope`); GLM4_MOE (NEOX or MROPE) |
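
To make the difference between the multi-axis layouts concrete, here is a minimal sketch of how each rotary pair could pick its position axis. It reflects my understanding of the section logic in ggml's CPU reference path and has not been checked against the source, so treat the selection rules as an assumption. `MropeSections`, `AxisForPair`, and the four-entry `sections` array (temporal, height, width, extra) are hypothetical names for illustration.

```csharp
using System;

// Hypothetical helper: picks which position axis drives a given rotary pair.
// Axis indices: 0 = temporal, 1 = height, 2 = width, 3 = extra.
public static class MropeSections
{
    public static int AxisForPair(int pair, int[] sections, bool interleaved)
    {
        if (sections == null || sections.Length != 4)
            throw new ArgumentException("expected 4 section sizes", nameof(sections));

        int sectDims = sections[0] + sections[1] + sections[2] + sections[3];
        if (sectDims <= 0)
            throw new ArgumentException("section sizes must sum to > 0", nameof(sections));

        // The section pattern repeats every sectDims pairs.
        int sector = pair % sectDims;

        if (interleaved)
        {
            // IMROPE (assumed): axes cycle t, h, w across consecutive pairs
            // until an axis runs out of its share, then fall back to extra.
            if (sector % 3 == 1 && sector < 3 * sections[1]) return 1;
            if (sector % 3 == 2 && sector < 3 * sections[2]) return 2;
            if (sector % 3 == 0 && sector < 3 * sections[0]) return 0;
            return 3;
        }

        // MROPE (assumed): contiguous blocks of pairs per axis.
        if (sector < sections[0]) return 0;
        if (sector < sections[0] + sections[1]) return 1;
        if (sector < sections[0] + sections[1] + sections[2]) return 2;
        return 3;
    }
}
```

With `sections = {2, 1, 1, 0}`, pairs 0 through 3 map to axes 0, 0, 1, 2 in the contiguous layout and 0, 1, 2, 0 in the interleaved one. The rotation itself is unchanged; only the position fed into each pair's angle differs, which is why falling through to plain interleaved RoPE gives wrong outputs for these models.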

Where to wire them in

`src/SharpInference.Core/ModelGraph.cs` — `FromGgufMetadata`, after the `isNeoxRope` switch. There is a comment in place noting these are not implemented.

Each variant needs:

  • A `ModelHyperparams` field: the current `IsNeoxRope` bool will need to become an enum (e.g., `RopeType`).
  • A new SIMD kernel in `SimdKernels.cs`.
  • A new Vulkan shader in `Shaders.cs` + pipeline in `VulkanBackend.cs`.
  • Dispatch wiring in `ForwardPass.ApplyRope`, `GpuForwardPass`, `HybridForwardPass`.
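
As a starting point for the enum change, the mapping from the table could be sketched like this. Everything here is hypothetical: the `RopeTypeResolver` class, the `Resolve` signature, and the upper-case architecture keys (copied from the table, not actual GGUF `general.architecture` strings). The sketch also assumes GLM4_MOE switches on the same `use_mrope` flag as GLM4, which the table does not state explicitly.

```csharp
using System;

// Hypothetical replacement for the IsNeoxRope bool on ModelHyperparams.
public enum RopeType
{
    Interleaved, // llama.cpp "NORM": adjacent pairs (LLaMA, default)
    Neox,        // Qwen/Phi/Gemma/Falcon family
    MRope,       // multi-axis
    IMRope,      // interleaved multi-axis
}

public static class RopeTypeResolver
{
    // arch: architecture name as spelled in the issue's table (placeholder keys).
    // isNeoxRope: result of the existing isNeoxRope switch.
    // useMrope: mirrors llama.cpp's hparams.use_mrope.
    public static RopeType Resolve(string arch, bool isNeoxRope, bool useMrope)
    {
        switch (arch)
        {
            case "QWEN2VL":
            case "PADDLEOCR":
                return RopeType.MRope;

            case "QWEN3VL":
            case "QWEN3VLMOE":
            case "QWEN35":
            case "QWEN35MOE":
                return RopeType.IMRope;

            case "GLM4":
                return useMrope ? RopeType.MRope : RopeType.Interleaved;

            case "GLM4_MOE":
                // Assumption: same use_mrope condition as GLM4.
                return useMrope ? RopeType.MRope : RopeType.Neox;

            default:
                // Everything else keeps the behaviour from PR #7.
                return isNeoxRope ? RopeType.Neox : RopeType.Interleaved;
        }
    }
}
```

Keeping the existing `isNeoxRope` result as an input means the current two conventions stay untouched, and `ForwardPass.ApplyRope` and the GPU paths can switch on a single `RopeType` value.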

Priority

Low unless we decide to support the Qwen2VL or Qwen3-VL series. No model in current focus uses these variants.
