perf(mtp): batched-verify logits-buffer reuse + PromptLookup incremental index (#209 items 5,6) #291

@pekkah

Follow-up to #209, covering work items 5 and 6 (the low-leverage allocation and drafting micro-opts). Item 1 of #209 shipped in #287.

Item #5 — batched-verify logits buffer churn

Every BatchVerify call returns k fresh vocab-sized float[] arrays per step (~600 KB each, so they land on the LOH). EnsureBatchVerifyScratch is also exact-size, so the varying proposal lengths under --draft-lookup force a realloc of the device and host buffers on most steps. Proposed fix: reuse per-k cached buffers and/or return views instead of fresh arrays.
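To make the reuse idea concrete, here is a minimal language-neutral sketch in Python (not the project's code). `LogitsScratch`, `rows`, and the `allocations` counter are hypothetical names invented for illustration. It assumes the caller is done with the returned rows before the next call for the same k.

```python
from array import array


class LogitsScratch:
    """Hypothetical per-k cache of float32 logits buffers (illustration only)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self._by_k = {}        # k -> backing array holding k * vocab_size floats
        self.allocations = 0   # counts real allocations, to show the reuse

    def rows(self, k):
        """Return k vocab-sized views into a buffer cached for this k."""
        buf = self._by_k.get(k)
        if buf is None:
            # First time this proposal length is seen: allocate once and keep it.
            buf = array("f", [0.0]) * (k * self.vocab_size)
            self._by_k[k] = buf
            self.allocations += 1
        view = memoryview(buf)
        v = self.vocab_size
        # Slicing a memoryview does not copy; each row aliases the cached buffer.
        return [view[i * v:(i + 1) * v] for i in range(k)]


scratch = LogitsScratch(vocab_size=8)
for k in (3, 2, 3, 2, 3):      # varying proposal lengths, as with --draft-lookup
    scratch.rows(k)
print(scratch.allocations)     # 2: one buffer per distinct k, not one per step
```

The trade-off is that returned rows are only valid until the next call with the same k, since the storage is overwritten in place. A single grow-only buffer sized for the largest k seen would be a variation on the same idea.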

Item #6 — PromptLookupDraft incremental n-gram index

PromptLookupDraft.Propose linearly scans the whole history on every step. On the no-match path that costs ≥0.1–0.5 ms at 16–32K ctx, which is exactly the "floor = baseline" case. Replace the scan with a llama.cpp-style last-occurrence map that is updated in O(1) in Append.
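A rough sketch of the last-occurrence-map idea, in Python for brevity (not the project's code, and not a port of llama.cpp's implementation). `PromptLookupIndex`, `append`, `propose`, `ngram`, and `max_draft` are hypothetical names. The assumption here is a fixed n-gram length: each n-gram maps to the position just after its most recent occurrence.

```python
class PromptLookupIndex:
    """Hypothetical incremental n-gram index for prompt-lookup drafting."""

    def __init__(self, ngram=3, max_draft=4):
        self.ngram = ngram
        self.max_draft = max_draft
        self.tokens = []
        self._last = {}  # n-gram tuple -> index of the token that followed it last

    def append(self, token):
        n = self.ngram
        if len(self.tokens) >= n:
            # The n-gram ending at the previous token now has a continuation.
            # Registering it only at this point keeps the current suffix from
            # matching itself in propose().
            self._last[tuple(self.tokens[-n:])] = len(self.tokens)
        self.tokens.append(token)

    def propose(self):
        n = self.ngram
        if len(self.tokens) < n:
            return []
        start = self._last.get(tuple(self.tokens[-n:]))
        if start is None:
            return []  # no-match path: one dict lookup instead of a history scan
        return self.tokens[start:start + self.max_draft]


idx = PromptLookupIndex(ngram=3, max_draft=2)
for t in [1, 2, 3, 4, 1, 2, 3]:
    idx.append(t)
print(idx.propose())  # [4, 1]: the tokens that followed the earlier "1 2 3"
```

Each `append` builds one n-token key, so its cost depends on the n-gram length rather than on the history length. The price is a dictionary that grows with the history.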

Both are correctness-neutral micro-opts, grouped here because each is small. Note that PromptLookupDraft currently has no production engine caller (it is used only on the --draft-lookup path), so item 6 is the lowest priority.
