Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
- qwen3.5-fp8-h200-sglang-mtp
  description:
    - "Add Qwen3.5-397B-A17B-FP8 H200 SGLang MTP (EAGLE speculative decoding)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/921
🟡 Nit: The pr-link for the new qwen3.5-fp8-h200-sglang-mtp entry uses a placeholder /pull/XXX instead of /pull/921. Please update before merging.
Extended reasoning...
Bug Description
The new perf-changelog entry added at line 987 for qwen3.5-fp8-h200-sglang-mtp uses a placeholder PR link:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

instead of the actual PR number:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/921

Code Path
The diff adds a new changelog block at the end of perf-changelog.yaml (lines 982-987). Every other entry in the file that was finalized has a concrete PR number in its pr-link field, making this an outlier that needs updating.
Pre-existing Context
There are several other pre-existing XXX placeholders in the file (e.g., for glm5-fp8-mi355x-sglang, dsr1-fp8-h200-sglang, minimaxm2.5-fp8-h200-vllm, qwen3.5-bf16-mi325x-sglang, qwen3.5-fp8-mi325x-sglang). However, those are from other PRs and outside the scope of this change. This PR should fix its own entry.
Impact
The impact is low — this is a metadata/documentation field, not functional code. The placeholder link would point to a nonexistent or incorrect pull request page, making it harder for someone reviewing the changelog to trace the entry back to its source PR.
Suggested Fix
Replace XXX with 921 on line 987:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/921

Given that the PR title is [WIP], this is likely a known TODO that the author plans to fix before final merge. Flagging it here as a reminder.
updated
--speculative-algorithm EAGLE \
--speculative-num-steps 2 \
--speculative-num-draft-tokens 3 \
--speculative-eagle-topk 1 \
Nitpicking here, can you please set these values as env variables?
Refer 👇
InferenceX/benchmarks/single_node/qwen3.5_fp8_b200_mtp.sh
Lines 44 to 47 in 26d415b
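For illustration, the parameterization could look something like the sketch below. The variable names (SPEC_ALGORITHM, SPEC_NUM_STEPS, SPEC_NUM_DRAFT_TOKENS, SPEC_EAGLE_TOPK) are assumptions for this sketch, not necessarily the names used in qwen3.5_fp8_b200_mtp.sh; the point is the `${VAR:-default}` pattern so each value can be overridden from the environment:

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical variable names, defaults taken from the
# hard-coded values in the diff above.
SPEC_ALGORITHM=${SPEC_ALGORITHM:-EAGLE}
SPEC_NUM_STEPS=${SPEC_NUM_STEPS:-2}
SPEC_NUM_DRAFT_TOKENS=${SPEC_NUM_DRAFT_TOKENS:-3}
SPEC_EAGLE_TOPK=${SPEC_EAGLE_TOPK:-1}

# Assemble the server flags from the (possibly overridden) values.
SPEC_FLAGS="--speculative-algorithm ${SPEC_ALGORITHM} \
--speculative-num-steps ${SPEC_NUM_STEPS} \
--speculative-num-draft-tokens ${SPEC_NUM_DRAFT_TOKENS} \
--speculative-eagle-topk ${SPEC_EAGLE_TOPK}"

echo "${SPEC_FLAGS}"
```

Callers could then sweep configurations without editing the script, e.g. `SPEC_NUM_STEPS=3 bash launch.sh`.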
SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}
MAX_SEQ_LEN=$((ISL + OSL + 20))
You can use MAX_MODEL_LEN here. This env is made available to the benchmark script just like TP, CONC, etc.
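A minimal sketch of that change, assuming MAX_MODEL_LEN is exported by the harness as described (falling back to the current ISL/OSL arithmetic when it is not set):

```shell
#!/usr/bin/env bash
# Sketch: prefer the harness-provided MAX_MODEL_LEN, else derive a
# sequence-length budget from input/output lengths plus slack.
# ISL/OSL defaults here are placeholders for the test, not real values.
ISL=${ISL:-1024}
OSL=${OSL:-1024}
MAX_SEQ_LEN=${MAX_MODEL_LEN:-$((ISL + OSL + 20))}

echo "${MAX_SEQ_LEN}"
```

With `${MAX_MODEL_LEN:-...}`, an exported MAX_MODEL_LEN wins, and standalone runs still work without it.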
--enable-flashinfer-allreduce-fusion \
--max-running-requests 128 \
--chunked-prefill-size 16384 \
--decode-log-interval 1 \
Are you sure we need the --decode-log-interval flag when benchmarking?