
[NV] DSR1 FP8 GB200 Dynamo TRT#617

Merged
jthomson04 merged 9 commits into main from nv/dsr1-fp8-gb200-dynamo-trt
Feb 3, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Feb 2, 2026

Summary

Add DeepSeek R1 FP8 GB200 Dynamo TRT-LLM disaggregated multinode configurations using the srt-slurm recipe-based workflow.

Changes

  • Add dsr1-fp8-gb200-dynamo-trt config to nvidia-master.yaml
  • Add GB200 runner support for dsr1 model prefix in launch_gb200-nv.sh
  • Add perf-changelog entry

Config Details

Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
Runner: GB200 NVL72 (multinode, disaggregated)
Model: DeepSeek-R1-0528 (FP8)

| ISL/OSL | Mode | Scenarios | Workers | Concurrency Range |
| --- | --- | --- | --- | --- |
| 1k1k | MTP (spec-decoding) | 7 | 1P1D to 1P3D | 9 - 4301 |
| 1k1k | STP (no spec-decoding) | 7 | 1P1D to 1P3D | 3 - 6144 |
| 1k8k | MTP | 5 | 1P1D to 1P4D | 4 - 8192 |
| 1k8k | STP | 5 | 1P1D to 1P4D | 4 - 8192 |
| 8k1k | MTP | 7 | 1P1D to 5P1D | 6 - 666 |
| 8k1k | STP | 7 | 1P1D to 5P1D | 6 - 1229 |

Total: 38 scenarios

Architecture Variants

  • DP Attention mode: Disaggregated prefill with EP=TP (8/16/32)
  • TEP mode: Tensor Expert Parallelism with multiple decode workers (1P3D, 1P4D configurations)
  • MTP: Multi-Token Prediction (spec-decoding enabled)
  • STP: Single-Token Prediction (standard decoding)
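
For illustration, a hypothetical shape of the new entry in `.github/configs/nvidia-master.yaml`. Only `model-prefix: dsr1`, the image tag, and the recipe-style `CONFIG_FILE` paths are confirmed by this conversation; every other field name here is an assumption.

```yaml
# Hypothetical sketch only -- the real schema is defined in
# .github/configs/nvidia-master.yaml. Field names other than
# model-prefix and image are assumptions for illustration.
dsr1-fp8-gb200-dynamo-trt:
  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
  model-prefix: dsr1          # the runner only accepts gptoss or dsr1
  runner: gb200-nv            # assumed name, per launch_gb200-nv.sh
  configs:
    - recipes/trtllm/gb200-fp8/1k1k/mtp/ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
```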

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 marked this pull request as ready for review February 2, 2026 20:09
@jthomson04 jthomson04 requested a review from a team as a code owner February 2, 2026 20:09
@functionstackx
Contributor

@claude please fix this, seems like a typo in changelog.yaml

Traceback (most recent call last):
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 144, in <module>
    main()
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 73, in main
    changelog_data = yaml.safe_load(added_yaml)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 110, in compose_sequence_node
    while not self.check_event(SequenceEndEvent):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
                         ^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 392, in parse_block_sequence_entry
    raise ParserError("while parsing a block collection", self.marks[-1],
yaml.parser.ParserError: while parsing a block collection
  in "<unicode string>", line 3, column 3:
      - dsr1-fp8-gb200-dynamo-trt
      ^
expected <block end>, but found '?'
  in "<unicode string>", line 4, column 3:
      description:
      ^
Error: Process completed with exit code 1.
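
The root cause is visible in the error itself: a fused dash like `-config-keys:` is parsed by YAML as a plain mapping key rather than a sequence entry, so the indented list beneath it opens a block sequence that the following `description:` key then interrupts ("expected <block end>, but found '?'"). A minimal sketch of a pre-parse lint that catches the fused dash; the function is hypothetical, not part of this repo:

```python
import re

def find_fused_dashes(lines):
    """Flag lines where a YAML sequence dash is fused to its content,
    e.g. '-config-keys:' where '- config-keys:' was intended.
    Heuristic sketch: may also flag legitimate scalars such as '-5'."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.lstrip()
        # A '-' followed immediately by a non-space, non-dash character
        # is a plain scalar to YAML, not a block-sequence entry.
        if re.match(r"-[^\s-]", stripped):
            hits.append((lineno, stripped))
    return hits

broken = [
    "-config-keys:",
    "  - dsr1-fp8-gb200-dynamo-trt",
    "  description: add GB200 Dynamo TRT configs",
]
print(find_fused_dashes(broken))  # flags line 1 only
```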

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 2, 2026

Claude finished @functionstackx's task in 52s


Fixed YAML Syntax Error in perf-changelog.yaml ✓

  • Read perf-changelog.yaml to identify the YAML syntax error
  • Fix the YAML formatting issue
  • Commit and push the fix

Issue: Missing space after dash at line 329

- -config-keys:
-   - dsr1-fp8-gb200-dynamo-trt
+ - config-keys:
+     - dsr1-fp8-gb200-dynamo-trt

Fix pushed: 9cb5200


Add missing space after dash in config-keys entry for dsr1-fp8-gb200-dynamo-trt.
'-config-keys:' should be '- config-keys:'

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx
Contributor

@claude please fix this too in nvidia-master.yaml. seems like a simple typo of the model prefix. change it to just dsr1 instead of dsr1-fp8

error log

+ export CONFIG_FILE=recipes/trtllm/gb200-fp8/1k1k/mtp/ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
+ CONFIG_FILE=recipes/trtllm/gb200-fp8/1k1k/mtp/ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
+ export IS_MULTINODE=true
+ IS_MULTINODE=true
+ bash ./runners/launch_gb200-nv.sh
+ [[ dynamo-trt == \d\y\n\a\m\o\-\s\g\l\a\n\g ]]
Unsupported model prefix: dsr1-fp8. Supported prefixes are: gptoss or dsr1
+ [[ dynamo-trt == \d\y\n\a\m\o\-\t\r\t ]]
+ [[ dsr1-fp8 == \g\p\t\o\s\s ]]
+ [[ dsr1-fp8 == \d\s\r\1 ]]
+ echo 'Unsupported model prefix: dsr1-fp8. Supported prefixes are: gptoss or dsr1'
+ exit 1
Error: Process completed with exit code 1.
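
The trace shows exact-string comparisons (`[[ dsr1-fp8 == dsr1 ]]` is false), so any prefix other than the two literals falls through to the error branch. A hedged sketch of that dispatch, with an invented function name; the real logic lives in runners/launch_gb200-nv.sh:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the prefix gate in launch_gb200-nv.sh:
# the comparisons are exact, so "dsr1-fp8" never matches "dsr1".
check_prefix() {
  local prefix="$1"
  if [[ "$prefix" == gptoss || "$prefix" == dsr1 ]]; then
    echo "ok: $prefix"
  else
    echo "Unsupported model prefix: $prefix. Supported prefixes are: gptoss or dsr1" >&2
    return 1
  fi
}

check_prefix dsr1              # ok: dsr1
check_prefix dsr1-fp8 || true  # rejected: exact match required
```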

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 2, 2026

Claude finished @functionstackx's task in 47s


Fixed Model Prefix Typo in nvidia-master.yaml

  • Read nvidia-master.yaml to find the incorrect model prefix
  • Fix dsr1-fp8 to dsr1 in the config
  • Commit and push the fix

Issue: Invalid model-prefix at line 2297

- model-prefix: dsr1-fp8
+ model-prefix: dsr1

The runner script only supports gptoss or dsr1 as model prefixes, not dsr1-fp8.

Fix pushed: e5efea3


The model-prefix 'dsr1-fp8' is not a supported prefix. The supported
prefixes are 'gptoss' and 'dsr1'. Changed to 'dsr1' to fix the
launch_gb200-nv.sh runner error.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx
Contributor

@jthomson04 can u take a look at this error related to the path of the model?

Traceback (most recent call last):
  File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/cli/do_sweep.py", line 276, in main
    runtime = RuntimeContext.from_config(config, job_id)
  File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/core/runtime.py", line 163, in from_config
    raise FileNotFoundError(f"Model path does not exist: {model_path}")
FileNotFoundError: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/dsr1-fp8

https://github.com/InferenceMAX/InferenceMAX/actions/runs/21609573902/job/62274835308?pr=617

@jthomson04
Collaborator Author

jthomson04 commented Feb 2, 2026

> @jthomson04 can u take a look at this error related to the path of the model?
>
> Traceback (most recent call last):
>   File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/cli/do_sweep.py", line 276, in main
>     runtime = RuntimeContext.from_config(config, job_id)
>   File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/core/runtime.py", line 163, in from_config
>     raise FileNotFoundError(f"Model path does not exist: {model_path}")
> FileNotFoundError: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/dsr1-fp8
>
> https://github.com/InferenceMAX/InferenceMAX/actions/runs/21609573902/job/62274835308?pr=617

Claude broke it. I have a fix locally, will push soon.

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@functionstackx
Contributor

Claude broke it. I have a fix locally, will push soon.

mb lol

@cquil11
Collaborator

cquil11 commented Feb 3, 2026

@jthomson04 claude not AGI confirmed

@cquil11
Collaborator

cquil11 commented Feb 3, 2026

@claude please make the perf changelog description and PR description slightly more detailed. don't list every single thing per se, but give a high-level overview of the scenarios added. be more detailed in the PR description (make a table of scenarios, something like the https://github.com/InferenceMAX/InferenceMAX/pull/582 description)

@cquil11
Collaborator

cquil11 commented Feb 3, 2026

@claude in .github/configs/nvidia-master.yaml, there are scenarios with corresponding links to external configs that contain more detailed information. can you please get the configs from the links and double check that the fields (such as prefill/decode ep/tp, num workers, etc) that are also present in this repo's config match?

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 3, 2026

Claude finished @cquil11's task in 1m 29s


Updated PR and Changelog Descriptions ✓

  • Read PR #582 ("[nv] add H200 SGLang disagg configs from srtslurm") to understand the expected description format
  • Read current perf-changelog.yaml to see the existing entry
  • Read nvidia-master.yaml to understand scenarios added
  • Update perf-changelog.yaml with more detailed description
  • Update PR description with detailed scenario table
  • Commit and push changes

Changes made:

  1. Updated perf-changelog.yaml with detailed breakdown:

    • Image version and source
    • Scenario counts per ISL/OSL (1k1k, 1k8k, 8k1k)
    • MTP vs STP breakdown
    • Worker configuration ranges (1P-5P, 1D-4D)
  2. Updated PR description with:

    • Summary and changes section
    • Scenario table showing ISL/OSL, mode, scenario counts, workers, and concurrency ranges
    • Architecture variants explanation (DP attention, TEP, MTP, STP)

Commit: 05cde10


@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 3, 2026

Claude finished @cquil11's task in 2m 34s


Verified External Config Links ✓

  • Read nvidia-master.yaml to find scenarios with external config links
  • Fetch external configs and extract relevant fields (38 configs found)
  • Compare fields (prefill/decode ep/tp, num workers, etc.) with local config
  • Report any mismatches found

Result: All 38 external configs match the local values.

I fetched each external config from ishandhanani/srt-slurm@sa-submission-q1-2026 and verified the following fields:

  • num-worker (prefill/decode) ↔ prefill_workers / decode_workers
  • tp ↔ tensor_parallel_size
  • ep ↔ moe_expert_parallel_size
  • dp-attn ↔ enable_attention_dp
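
The field mapping above can be sketched as a small comparison helper. This is hypothetical code, not the script the bot ran; the key names simply mirror the mapping listed here:

```python
# Hypothetical sketch of the cross-check between a local scenario entry
# and an external recipe config, using the field mapping above.
FIELD_MAP = {
    "tp": "tensor_parallel_size",
    "ep": "moe_expert_parallel_size",
    "dp-attn": "enable_attention_dp",
}

def mismatched_fields(local: dict, external: dict) -> list[str]:
    """Return local keys whose values differ from the mapped external field."""
    return [
        local_key
        for local_key, ext_key in FIELD_MAP.items()
        if local.get(local_key) != external.get(ext_key)
    ]
```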
Verified configs:

1k1k MTP configs (7):

  • ✓ ctx1_gen1_dep8_batch512_eplb0_mtp1_4301.yaml
  • ✓ ctx1_gen1_dep8_batch256_eplb0_mtp1_2151.yaml
  • ✓ ctx1_gen1_dep16_batch64_eplb0_mtp1_1229.yaml
  • ✓ ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
  • ✓ ctx1_gen3_tep8_batch8_eplb0_mtp3_36.yaml
  • ✓ ctx1_gen3_tep8_batch4_eplb0_mtp3_18.yaml
  • ✓ ctx1_gen3_tep8_batch2_eplb0_mtp3_9.yaml

1k1k STP configs (7):

  • ✓ ctx1_gen1_dep8_batch768_eplb0_mtp0_6144.yaml
  • ✓ ctx1_gen1_dep8_batch512_eplb0_mtp0_4301.yaml
  • ✓ ctx1_gen1_dep16_batch128_eplb0_mtp0_2151.yaml
  • ✓ ctx1_gen1_dep32_batch32_eplb0_mtp0_1127.yaml
  • ✓ ctx1_gen1_dep32_batch8_eplb0_mtp0_256.yaml
  • ✓ ctx1_gen3_tep8_batch8_eplb0_mtp0_27.yaml
  • ✓ ctx1_gen3_tep8_batch1_eplb0_mtp0_3.yaml

1k8k MTP configs (5):

  • ✓ ctx1_gen2_dep16_batch256_eplb0_mtp1_8192.yaml
  • ✓ ctx1_gen2_dep16_batch64_eplb0_mtp1_2152.yaml
  • ✓ ctx1_gen1_dep32_batch16_eplb0_mtp3_564.yaml
  • ✓ ctx1_gen1_dep32_batch2_eplb0_mtp3_72.yaml
  • ✓ ctx1_gen4_tep8_batch2_eplb0_mtp3_8.yaml

1k8k STP configs (5):

  • ✓ ctx1_gen2_dep16_batch256_eplb0_mtp0_8192.yaml
  • ✓ ctx1_gen1_dep32_batch64_eplb0_mtp0_2048.yaml
  • ✓ ctx1_gen1_dep32_batch16_eplb0_mtp0_564.yaml
  • ✓ ctx1_gen4_tep8_batch8_eplb0_mtp0_36.yaml
  • ✓ ctx1_gen4_tep8_batch1_eplb0_mtp0_4.yaml

8k1k MTP configs (7):

  • ✓ ctx3_gen1_dep8_batch64_eplb0_mtp3_666.yaml
  • ✓ ctx5_gen1_dep16_batch32_eplb0_mtp3_666.yaml
  • ✓ ctx3_gen1_dep16_batch16_eplb0_mtp3_333.yaml
  • ✓ ctx4_gen1_dep32_batch8_eplb0_mtp3_333.yaml
  • ✓ ctx2_gen1_dep32_batch2_eplb0_mtp3_90.yaml
  • ✓ ctx1_gen3_tep8_batch4_eplb0_mtp3_15.yaml
  • ✓ ctx1_gen3_tep8_batch2_eplb0_mtp3_6.yaml

8k1k STP configs (7):

  • ✓ ctx5_gen1_dep16_batch64_eplb0_mtp0_1229.yaml
  • ✓ ctx4_gen1_dep32_batch16_eplb0_mtp0_666.yaml
  • ✓ ctx3_gen1_dep16_batch32_eplb0_mtp0_615.yaml
  • ✓ ctx2_gen1_dep32_batch8_eplb0_mtp0_333.yaml
  • ✓ ctx1_gen3_tep8_batch16_eplb0_mtp0_63.yaml
  • ✓ ctx1_gen3_tep8_batch4_eplb0_mtp0_18.yaml
  • ✓ ctx1_gen3_tep8_batch1_eplb0_mtp0_6.yaml


Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Collaborator

@cquil11 cquil11 left a comment


Signed-off-by: jthomson04 <jothomson@nvidia.com>
@kedarpotdar-nv
Collaborator

looking good to merge

@jthomson04 jthomson04 merged commit daa8239 into main Feb 3, 2026
13 checks passed
@jthomson04 jthomson04 deleted the nv/dsr1-fp8-gb200-dynamo-trt branch February 3, 2026 20:27