[nv] - h200 sglang disagg #580
Merged
cquil11 merged 1 commit into nv/dsr1-fp8-h200-dynamo-trtllm-260126 on Jan 27, 2026
Conversation
- Add dsr1-fp8-h200-dynamo-sglang config to nvidia-master.yaml
- Include 1k1k configs: aggregated, low-latency (1P9D), high-throughput TEP/DEP (1P6D)
- Include 8k1k configs: aggregated, TEP variants (1P7D, 1P6D, 1P3D, 2P3D), DEP (1P1D)
- Add perf-changelog entry for new configuration
- Document recipe registration process in AGENT.md
cquil11 approved these changes on Jan 27, 2026
Collaborator
nice nice. lgtm
thx @ishandhanani
will wait for test sweep to pass as smoke test
cquil11 requested changes on Jan 27, 2026
Collaborator
will wait for prerequisite PR to merge as stated in desc
Collaborator
nvm ur merging into prereq branch
Collaborator
Author
why merge this one?
Collaborator
u were targeting https://github.com/InferenceMAX/InferenceMAX/tree/nv/dsr1-fp8-h200-dynamo-trtllm-260126 as your base, so I thought u wanted to merge into that one and then merge them into main altogether. I can revert this one, and u can merge into main later?
cquil11 added a commit that referenced this pull request on Jan 27, 2026
Collaborator
Author
Yea. Let's get Nir's PR in first and then get this one merged in afterwards. The base will change when Nir's goes into main (should be soon). I will reopen.
cquil11 added a commit that referenced this pull request on Jan 29, 2026
* add h200 srtslurm setup
* fix slurm parameters
* refactor logs
* fix log path
* refactor
* fix slurm output
* fix
* fix
* Add H200 dynamo-trt disaggregated configs for 1k1k and 8k1k
Expand dsr1-fp8-h200-dynamo-trt section with full configuration set:
- 1k1k MTP configs (c4-c512) with CONFIG_FILE references
- 1k1k STP configs (c4-c512) with CONFIG_FILE references
- 8k1k MTP configs (c4-c512) with CONFIG_FILE references
- 8k1k STP configs (c4-c512) with CONFIG_FILE references
All configs reference recipe YAMLs in srt-slurm-trtllm repo under
recipies/trtllm/h200/{1k1k,8k1k}/{mtp,stp}/
* add uv and sqsh file
* Update H200 dynamo-trt container image to nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.8.0
* Fix H200 CONFIG_FILE references to match corrected recipe filenames
* Fix H200 dp-attn values to match recipe filenames
dep8 = enable_attention_dp: true = dp-attn: true
tep8 = enable_attention_dp: false = dp-attn: false
* Fix H200 model path to use DeepSeek-R1-0528
Update MODEL_PATH from /models/dsr1-fp8 (old DeepSeek-R1) to
/models/DeepSeek-R1-0528 (new version matching nvidia-master.yaml)
* add uv install and image fix
* fix squash file path
* fix srt-slurm repo
* update h200 runner details
* Add perf-changelog entry for dsr1-fp8-h200-dynamo-trt config
* Add H200 sglang disagg configs from srtslurm (#580)
- Add dsr1-fp8-h200-dynamo-sglang config to nvidia-master.yaml
- Include 1k1k configs: aggregated, low-latency (1P9D), high-throughput TEP/DEP (1P6D)
- Include 8k1k configs: aggregated, TEP variants (1P7D, 1P6D, 1P3D, 2P3D), DEP (1P1D)
- Add perf-changelog entry for new configuration
- Document recipe registration process in AGENT.md
* Revert "Add H200 sglang disagg configs from srtslurm (#580)" (#581)
This reverts commit f6609d9.
* recipies -> recipes
* Update dsr1-fp8-h200-dynamo-trt image to 0.8.1.post1
* Add dynamic container mapping for srtslurm.yaml
- Update SQUASH_FILE to use /data/containers/ with + separators
- Strip nvcr.io/ prefix from path to match actual .sqsh filenames
- Add CONTAINER_KEY to convert IMAGE to srt-slurm format (nvcr.io#)
- Map container key to .sqsh path dynamically in srtslurm.yaml
* Pin srt-slurm to sa-submission-q1-2026 branch
Use the release branch for Q1 2026 submission instead of main.
* Add srt-slurm GitHub URLs above h200 CONFIG_FILE entries
Link each CONFIG_FILE to its source in srt-slurm sa-submission-q1-2026 branch.
* Update perf-changelog.yaml to modify DSR1 configurations
Removed outdated DSR1 FP8 H200 Dynamo TRT configuration details and re-added them in a new section.
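The container-mapping commits above describe deriving two values from one image reference: a srt-slurm container key (registry host joined to the path with `#`) and a squash-file path under `/data/containers/` with `+` separators. A minimal bash sketch of that mapping; the variable names and exact substitution rules are assumptions inferred from the commit messages, not the actual srtslurm.yaml logic:

```shell
# Hypothetical sketch of the mapping described in the commits above.
IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post1"

# CONTAINER_KEY: srt-slurm format joins the registry host to the path with '#'
CONTAINER_KEY="${IMAGE/nvcr.io\//nvcr.io#}"

# SQUASH_FILE: strip the nvcr.io/ prefix, then replace '/' and ':' with '+'
# so the name matches the .sqsh files under /data/containers/
name="${IMAGE#nvcr.io/}"
name="${name//\//+}"
name="${name/:/+}"
SQUASH_FILE="/data/containers/${name}.sqsh"

echo "$CONTAINER_KEY"
echo "$SQUASH_FILE"
```

The `+`-separator convention is consistent with how enroot flattens image references into filenames, which would explain why stripping the registry prefix makes the path match the existing .sqsh files.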
---------
Co-authored-by: Sahithi Chigurupati <schigurupati@nvidia.com>
Co-authored-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Add H200 SGLang disaggregated multinode configurations, sourced from srtslurm recipes.
Depends on #570
Changes