
[WIP] Add Qwen3.5 h200 MTP#921

Open
hshrivastava-droid wants to merge 14 commits into main from nv/h200-qwen35

Conversation

@hshrivastava-droid
Collaborator

No description provided.

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

2 similar comments

- qwen3.5-fp8-h200-sglang-mtp
  description:
    - "Add Qwen3.5-397B-A17B-FP8 H200 SGLang MTP (EAGLE speculative decoding)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/921
Contributor

🟡 Nit: The pr-link for the new qwen3.5-fp8-h200-sglang-mtp entry uses a placeholder /pull/XXX instead of /pull/921. Please update before merging.


Bug Description

The new perf-changelog entry added at line 987 for qwen3.5-fp8-h200-sglang-mtp uses a placeholder PR link:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

instead of the actual PR number:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/921

Code Path

The diff adds a new changelog block at the end of perf-changelog.yaml (lines 982-987). Every other entry in the file that was finalized has a concrete PR number in its pr-link field, making this an outlier that needs updating.

Pre-existing Context

There are several other pre-existing XXX placeholders in the file (e.g., for glm5-fp8-mi355x-sglang, dsr1-fp8-h200-sglang, minimaxm2.5-fp8-h200-vllm, qwen3.5-bf16-mi325x-sglang, qwen3.5-fp8-mi325x-sglang). However, those are from other PRs and outside the scope of this change. This PR should fix its own entry.

Impact

The impact is low — this is a metadata/documentation field, not functional code. The placeholder link would point to a nonexistent or incorrect pull request page, making it harder for someone reviewing the changelog to trace the entry back to its source PR.

Suggested Fix

Replace XXX with 921 on line 987:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/921

Given that the PR title is [WIP], this is likely a known TODO that the author plans to fix before final merge. Flagging it here as a reminder.

Collaborator Author

updated

Comment on lines +56 to +59
--speculative-algorithm EAGLE \
--speculative-num-steps 2 \
--speculative-num-draft-tokens 3 \
--speculative-eagle-topk 1 \
Collaborator

Nitpicking here: can you please set these values as env variables?
Refer to the example below 👇

# MTP (Multi-Token Prediction) Config - EAGLE speculative decoding
SPECULATIVE_NUM_STEPS=3
SPECULATIVE_DRAFT_TOKENS=4
SPECULATIVE_EAGLE_TOPK=1
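A minimal sketch of how the reviewer's suggestion could be wired up, assuming the recipe is a POSIX shell launch script: the MTP/EAGLE knobs become env variables with defaults and are interpolated into the server flags. The flag names come from the diff above; the `SPEC_ARGS` wrapper and the defaults are illustrative, not the recipe's actual code.

```shell
# MTP (Multi-Token Prediction) config - EAGLE speculative decoding.
# Defaults are the reviewer's suggested values; override via environment.
SPECULATIVE_NUM_STEPS=${SPECULATIVE_NUM_STEPS:-3}
SPECULATIVE_DRAFT_TOKENS=${SPECULATIVE_DRAFT_TOKENS:-4}
SPECULATIVE_EAGLE_TOPK=${SPECULATIVE_EAGLE_TOPK:-1}

# Collect the speculative-decoding flags so the launch line stays readable.
SPEC_ARGS="--speculative-algorithm EAGLE \
  --speculative-num-steps ${SPECULATIVE_NUM_STEPS} \
  --speculative-num-draft-tokens ${SPECULATIVE_DRAFT_TOKENS} \
  --speculative-eagle-topk ${SPECULATIVE_EAGLE_TOPK}"

echo "$SPEC_ARGS"
```

The server launch command would then pass `$SPEC_ARGS` (unquoted, so the flags split into separate arguments) instead of hard-coding the four flags.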


SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}
MAX_SEQ_LEN=$((ISL + OSL + 20))
Collaborator

@ankursingh-nv Mar 24, 2026

You can use MAX_MODEL_LEN here. This env is made available to the benchmark script just like TP, CONC, etc.
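A small sketch of what the reviewer is proposing, assuming the harness exports `MAX_MODEL_LEN` to the benchmark script the same way it exports `TP`, `CONC`, etc. The concrete ISL/OSL/MAX_MODEL_LEN values here are illustrative only.

```shell
# Illustrative input/output sequence lengths.
ISL=1024
OSL=1024

# Current approach in the diff: derive the limit from ISL + OSL.
MAX_SEQ_LEN_FALLBACK=$((ISL + OSL + 20))

# Suggested approach: prefer the harness-provided MAX_MODEL_LEN when it
# is set, and fall back to the derived value otherwise.
MAX_MODEL_LEN=4096   # illustrative; normally exported by the harness
MAX_SEQ_LEN=${MAX_MODEL_LEN:-$MAX_SEQ_LEN_FALLBACK}

echo "$MAX_SEQ_LEN"
```

With this shape, the recipe keeps working when the harness does not set `MAX_MODEL_LEN`, while honoring it when provided.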

--enable-flashinfer-allreduce-fusion \
--max-running-requests 128 \
--chunked-prefill-size 16384 \
--decode-log-interval 1 \
Collaborator

@ankursingh-nv Mar 24, 2026

Are you sure we need --decode-log-interval flag when benchmarking?


2 participants