Add bench rows for qwen36-27b-mtp (cpu, cuda-hybrid) and establish perf baseline #28

Description

@pekkah

Background

This session (under #25) landed support for Qwen3.6-27B-MTP on both the CPU and CUDA hybrid backends, with the following measured baselines on an RTX 4070 Ti (12 GB):

| Backend | Decode (t/s) | Notes |
|---|---|---|
| CPU | 3.1 | All FFN + GDN on CPU |
| CUDA hybrid (no FFN-on-GPU) | 4.0 | Pre-exact-size, all FFN on CPU |
| CUDA hybrid (21/64 FFN-on-GPU, exact-size) | 6.3 | Current best on 12 GB |

These numbers exist only in the design doc (`docs/qwen35moe-plan.md` Phase 11) and won't be tracked across future changes without bench rows.

Scope

  1. Add `qwen36-27b-mtp` rows to `scripts/bench-all.ps1` mirroring the existing `qwen36` pattern (`bench-all.ps1:34-35`):
    ```powershell
    $results += .\scripts\bench-textgen.ps1 -Model $qwen36_27b_mtp -Tag "qwen36-27b-mtp-cpu" -NTokens $NTokens -Prompt $Prompt -TimeoutSec 600
    $results += .\scripts\bench-textgen.ps1 -Model $qwen36_27b_mtp -Tag "qwen36-27b-mtp-cuda-hybrid" -NTokens $NTokens -Prompt $Prompt -TimeoutSec 600 -ExtraArgs @("-g","-1","--backend","cuda")
    ```
  2. Define the `$qwen36_27b_mtp` path at the top of `bench-all.ps1` (likely `models/Qwen3.6-27B-MTP-Q4_K_M.gguf`, per the download script entry added under #25).
  3. Update the README perf table once #25 lands MTP self-speculation, so we have an apples-to-apples "+MTP / -MTP" comparison row.
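
For step 2, a minimal sketch of the variable definition with a guard so the rows are skipped when the model has not been downloaded. The path is the "likely" one named above and still needs confirming against the download script; `Test-BenchModel` is a hypothetical helper, not something that exists in `bench-all.ps1` today.

```powershell
# Assumed path (see step 2); confirm against the download script entry from #25.
$qwen36_27b_mtp = "models/Qwen3.6-27B-MTP-Q4_K_M.gguf"

# Hypothetical helper: returns $true only if the model file is present,
# otherwise emits a warning and returns $false.
function Test-BenchModel {
    param([string]$Path)
    if (Test-Path -LiteralPath $Path -PathType Leaf) { return $true }
    Write-Warning "Skipping bench rows: model not found at $Path"
    return $false
}

if (Test-BenchModel $qwen36_27b_mtp) {
    # The two bench-textgen.ps1 rows from step 1 would go here.
}
```

Whether `bench-all.ps1` should skip or fail hard on a missing model is a design choice; skipping keeps the rest of the bench usable on machines without the 27B file.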

Out of scope (separate issues)

Verification

  • `pwsh scripts/bench-all.ps1` runs to completion without errors, prints the new rows, and the cuda-hybrid number is within ±10 % of the 6.3 t/s baseline above (allowing for run-to-run variance and the bench's prompt/length).
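
The ±10 % check above could be scripted roughly as follows. This is a sketch only: `Test-WithinTolerance` is a hypothetical helper, and it takes the measured t/s as a plain number because the shape of the objects returned by `bench-textgen.ps1` is not described in this issue.

```powershell
# Hypothetical helper: is a measured decode rate within a percentage
# tolerance of a baseline? Both rates are in tokens per second.
function Test-WithinTolerance {
    param(
        [double]$Measured,
        [double]$Baseline,
        [double]$TolerancePct = 10
    )
    $deviationPct = [math]::Abs($Measured - $Baseline) / $Baseline * 100
    return $deviationPct -le $TolerancePct
}

$baseline = 6.3  # CUDA hybrid (21/64 FFN-on-GPU, exact-size) from the table above

Test-WithinTolerance -Measured 6.0 -Baseline $baseline   # about 4.8 % off -> True
Test-WithinTolerance -Measured 5.5 -Baseline $baseline   # about 12.7 % off -> False
```

With a 6.3 t/s baseline, the accepted range is roughly 5.67 to 6.93 t/s.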
