Background
PR #7 added two RoPE conventions: interleaved (LLaMA, default) and NEOX (Qwen/Phi/Gemma/Falcon family). `llama.cpp`'s `llama_model_rope_type()` lists three more variants that currently fall through to interleaved in SharpInference and would produce wrong outputs:
| Variant | Architectures (per llama.cpp) |
| --- | --- |
| MROPE (multi-axis) | QWEN2VL, PADDLEOCR |
| IMROPE (interleaved multi-axis) | QWEN3VL, QWEN3VLMOE, QWEN35, QWEN35MOE |
| Conditional | GLM4 (NORM or MROPE depending on `hparams.use_mrope`); GLM4_MOE (NEOX or MROPE) |
Where to wire them in
`src/SharpInference.Core/ModelGraph.cs` — `FromGgufMetadata`, after the `isNeoxRope` switch. There is a comment in place noting these are not implemented.
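As a rough sketch of what that switch could grow into, the snippet below maps the architectures from the table to a RoPE type. Everything here is hypothetical: the `RopeType` enum, the `RopeTypeResolver` class, and the `useMrope` parameter (a stand-in for llama.cpp's `hparams.use_mrope`) do not exist in SharpInference today. The architecture strings reuse the llama.cpp enum spellings from the table, and the actual GGUF `general.architecture` values may be spelled differently.

```csharp
using System;

// Hypothetical replacement for the IsNeoxRope bool; names are illustrative.
public enum RopeType
{
    Norm,    // interleaved pairs (LLaMA, current default)
    Neox,    // NEOX convention added in PR #7
    Mrope,   // multi-axis
    Imrope,  // interleaved multi-axis
}

public static class RopeTypeResolver
{
    // `arch` uses the llama.cpp enum spellings from the table; the real
    // GGUF architecture strings are an open question for this sketch.
    // `useMrope` stands in for hparams.use_mrope. Assumption: GLM4_MOE
    // is switched by the same flag as GLM4.
    // Returns null for architectures this issue does not cover, so the
    // caller can keep its existing interleaved/NEOX logic.
    public static RopeType? ResolveMultiAxis(string arch, bool useMrope)
    {
        switch (arch.ToUpperInvariant())
        {
            case "QWEN2VL":
            case "PADDLEOCR":
                return RopeType.Mrope;
            case "QWEN3VL":
            case "QWEN3VLMOE":
            case "QWEN35":
            case "QWEN35MOE":
                return RopeType.Imrope;
            case "GLM4":
                return useMrope ? RopeType.Mrope : RopeType.Norm;
            case "GLM4_MOE":
                return useMrope ? RopeType.Mrope : RopeType.Neox;
            default:
                return null;
        }
    }
}
```

Returning `null` for uncovered architectures keeps this sketch additive: the existing `isNeoxRope` decision stays authoritative for everything outside the table.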
Each variant needs:
- A `ModelHyperparams` field: the current `IsNeoxRope` bool will need to become an enum (e.g., `RopeType`).
- A new SIMD kernel in `SimdKernels.cs`.
- A new Vulkan shader in `Shaders.cs` + pipeline in `VulkanBackend.cs`.
- Dispatch wiring in `ForwardPass.ApplyRope`, `GpuForwardPass`, `HybridForwardPass`.
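Until those kernels exist, the dispatch step could at least refuse the unimplemented variants instead of silently running the interleaved path. The following is a minimal sketch of that idea. `RopeDispatch`, `SelectKernel`, and the string labels are placeholders invented for illustration, not existing SharpInference APIs; real wiring would call into `SimdKernels` or the Vulkan pipeline.

```csharp
using System;

// Hypothetical enum replacing the IsNeoxRope bool.
public enum RopeType { Norm, Neox, Mrope, Imrope }

public static class RopeDispatch
{
    // Returns a label for the kernel that would run. A real implementation
    // would invoke the SIMD kernel or record a Vulkan dispatch instead.
    public static string SelectKernel(RopeType type)
    {
        switch (type)
        {
            case RopeType.Norm:
                return "interleaved";
            case RopeType.Neox:
                return "neox";
            case RopeType.Mrope:
            case RopeType.Imrope:
                // Fail loudly rather than falling through to interleaved,
                // which would produce wrong outputs.
                throw new NotSupportedException(
                    $"RoPE variant {type} is not implemented yet.");
            default:
                throw new ArgumentOutOfRangeException(
                    nameof(type), type, "Unknown RoPE type.");
        }
    }
}
```

A hard failure at model load or first forward pass is arguably easier to diagnose than subtly wrong logits, which is the current behaviour for these architectures.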
Priority
Low until we decide to support the Qwen2VL or Qwen3-VL series. No model in current focus uses these variants.