[nv] - h200 sglang disagg #580

Merged
cquil11 merged 1 commit into nv/dsr1-fp8-h200-dynamo-trtllm-260126 from
nv/h200-sglang-disagg
Jan 27, 2026

Conversation


@ishandhanani ishandhanani commented Jan 27, 2026

Add H200 SGLang disaggregated multinode configurations, sourced from srtslurm recipes.

Depends on #570

Changes

  • Add dsr1-fp8-h200-dynamo-sglang config to nvidia-master.yaml
  • 1k1k: aggregated, low-latency (1P9D), high-throughput TEP/DEP (1P6D)
  • 8k1k: aggregated, TEP variants (1P7D, 1P6D, 1P3D, 2P3D), DEP (1P1D)
  • Add perf-changelog entry
  • Document recipe registration from srtslurm in AGENT.md

- Add dsr1-fp8-h200-dynamo-sglang config to nvidia-master.yaml
- Include 1k1k configs: aggregated, low-latency (1P9D), high-throughput TEP/DEP (1P6D)
- Include 8k1k configs: aggregated, TEP variants (1P7D, 1P6D, 1P3D, 2P3D), DEP (1P1D)
- Add perf-changelog entry for new configuration
- Document recipe registration process in AGENT.md
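
The entries described above might be sketched as follows. This is a purely hypothetical shape: the actual nvidia-master.yaml schema is not shown in this PR, so every key name here is an assumption; the xPyD notation reads as x prefill nodes + y decode nodes.

```yaml
# Hypothetical sketch only -- illustrative key names, not the real schema
dsr1-fp8-h200-dynamo-sglang:
  1k1k:                               # presumably 1k input / 1k output tokens
    - mode: aggregated
    - mode: disagg-low-latency        # 1P9D: 1 prefill node, 9 decode nodes
    - mode: disagg-high-throughput    # 1P6D, TEP/DEP variants
  8k1k:                               # presumably 8k input / 1k output tokens
    - mode: aggregated
    - mode: disagg-tep                # 1P7D, 1P6D, 1P3D, 2P3D variants
    - mode: disagg-dep                # 1P1D
```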
@ishandhanani ishandhanani requested a review from a team as a code owner January 27, 2026 19:33
@ishandhanani ishandhanani changed the base branch from main to nv/dsr1-fp8-h200-dynamo-trtllm-260126 January 27, 2026 19:33

@cquil11 cquil11 left a comment


nice nice. lgtm
thx @ishandhanani

will wait for the test sweep to pass as a smoke test


@cquil11 cquil11 left a comment


will wait for the prerequisite PR to merge, as stated in the description


cquil11 commented Jan 27, 2026

nvm, you're merging into the prereq branch

@cquil11 cquil11 merged commit f6609d9 into nv/dsr1-fp8-h200-dynamo-trtllm-260126 Jan 27, 2026
@cquil11 cquil11 deleted the nv/h200-sglang-disagg branch January 27, 2026 19:41
@ishandhanani

why merge this one?


cquil11 commented Jan 27, 2026

You were targeting https://github.com/InferenceMAX/InferenceMAX/tree/nv/dsr1-fp8-h200-dynamo-trtllm-260126 as your base, so I thought you wanted to merge into that one and merge them both to main altogether.

I can revert this one, and you merge into main later?
My bad for misunderstanding.

@cquil11 cquil11 restored the nv/h200-sglang-disagg branch January 27, 2026 19:49
cquil11 added a commit that referenced this pull request Jan 27, 2026
cquil11 added a commit that referenced this pull request Jan 27, 2026

ishandhanani commented Jan 27, 2026

Yeah, let's get Nir's PR in first and then get this one merged afterwards. The base will change when Nir's goes into main (should be soon).

I will reopen

cquil11 added a commit that referenced this pull request Jan 29, 2026
* add h200 srtslurm setup

* fix slurm parameters

* refactor logs

* fix log path

* refactor

* fix slurm output

* fix

* fix

* Add H200 dynamo-trt disaggregated configs for 1k1k and 8k1k

Expand dsr1-fp8-h200-dynamo-trt section with full configuration set:
- 1k1k MTP configs (c4-c512) with CONFIG_FILE references
- 1k1k STP configs (c4-c512) with CONFIG_FILE references
- 8k1k MTP configs (c4-c512) with CONFIG_FILE references
- 8k1k STP configs (c4-c512) with CONFIG_FILE references

All configs reference recipe YAMLs in srt-slurm-trtllm repo under
recipies/trtllm/h200/{1k1k,8k1k}/{mtp,stp}/

* add uv and sqsh file

* Update H200 dynamo-trt container image to nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.8.0

* Fix H200 CONFIG_FILE references to match corrected recipe filenames

* Fix H200 dp-attn values to match recipe filenames

dep8 = enable_attention_dp: true = dp-attn: true
tep8 = enable_attention_dp: false = dp-attn: false

* Fix H200 model path to use DeepSeek-R1-0528

Update MODEL_PATH from /models/dsr1-fp8 (old DeepSeek-R1) to
/models/DeepSeek-R1-0528 (new version matching nvidia-master.yaml)

* add uv install and image fix

* fix squash file path

* fix srt-slurm repo

* update h200 runner details

* Add perf-changelog entry for dsr1-fp8-h200-dynamo-trt config

* Add H200 sglang disagg configs from srtslurm (#580)

- Add dsr1-fp8-h200-dynamo-sglang config to nvidia-master.yaml
- Include 1k1k configs: aggregated, low-latency (1P9D), high-throughput TEP/DEP (1P6D)
- Include 8k1k configs: aggregated, TEP variants (1P7D, 1P6D, 1P3D, 2P3D), DEP (1P1D)
- Add perf-changelog entry for new configuration
- Document recipe registration process in AGENT.md

* Revert "Add H200 sglang disagg configs from srtslurm (#580)" (#581)

This reverts commit f6609d9.

* recipies -> recipes

* Update dsr1-fp8-h200-dynamo-trt image to 0.8.1.post1

* Add dynamic container mapping for srtslurm.yaml

- Update SQUASH_FILE to use /data/containers/ with + separators
- Strip nvcr.io/ prefix from path to match actual .sqsh filenames
- Add CONTAINER_KEY to convert IMAGE to srt-slurm format (nvcr.io#)
- Map container key to .sqsh path dynamically in srtslurm.yaml
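
The mapping these bullets describe can be sketched in shell. This is a sketch under stated assumptions, not the actual srtslurm.yaml logic (which is not shown here): the `IMAGE` value comes from an earlier commit in this PR, and the `/data/containers/` layout and `nvcr.io#` key format come from the commit message above.

```shell
#!/bin/sh
# Sketch of the image -> .sqsh / container-key mapping described above.
IMAGE="nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.0"

# Strip the nvcr.io/ registry prefix, then replace '/' and ':' with '+'
# to match the actual .sqsh filenames under /data/containers/.
SQUASH_FILE="/data/containers/$(printf '%s' "${IMAGE#nvcr.io/}" | tr '/:' '++').sqsh"

# srt-slurm keys containers as 'registry#path:tag'.
CONTAINER_KEY="nvcr.io#${IMAGE#nvcr.io/}"

echo "$SQUASH_FILE"    # /data/containers/nvidia+ai-dynamo+tensorrtllm-runtime+0.8.0.sqsh
echo "$CONTAINER_KEY"  # nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:0.8.0
```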

* Pin srt-slurm to sa-submission-q1-2026 branch

Use the release branch for Q1 2026 submission instead of main.

* Add srt-slurm GitHub URLs above h200 CONFIG_FILE entries

Link each CONFIG_FILE to its source in srt-slurm sa-submission-q1-2026 branch.

* Update perf-changelog.yaml to modify DSR1 configurations

Removed outdated DSR1 FP8 H200 Dynamo TRT configuration details and re-added them in a new section.

---------

Co-authored-by: Sahithi Chigurupati <schigurupati@nvidia.com>
Co-authored-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: functionstackx <47992694+functionstackx@users.noreply.github.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>