
[NV] DSR1 FP8 GB200 Dynamo TRT#617

Merged
jthomson04 merged 9 commits into main from nv/dsr1-fp8-gb200-dynamo-trt
Feb 3, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Feb 2, 2026

Summary

Add DeepSeek R1 FP8 GB200 Dynamo TRT-LLM disaggregated multinode configurations using the srt-slurm recipe-based workflow.

Changes

  • Add dsr1-fp8-gb200-dynamo-trt config to nvidia-master.yaml
  • Add GB200 runner support for dsr1 model prefix in launch_gb200-nv.sh
  • Add perf-changelog entry

Config Details

Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
Runner: GB200 NVL72 (multinode, disaggregated)
Model: DeepSeek-R1-0528 (FP8)

| ISL/OSL | Mode | Scenarios | Workers | Concurrency Range |
| --- | --- | --- | --- | --- |
| 1k1k | MTP (spec-decoding) | 7 | 1P1D to 1P3D | 9 - 4301 |
| 1k1k | STP (no spec-decoding) | 7 | 1P1D to 1P3D | 3 - 6144 |
| 1k8k | MTP | 5 | 1P1D to 1P4D | 4 - 8192 |
| 1k8k | STP | 5 | 1P1D to 1P4D | 4 - 8192 |
| 8k1k | MTP | 7 | 1P1D to 5P1D | 6 - 666 |
| 8k1k | STP | 7 | 1P1D to 5P1D | 6 - 1229 |

Total: 38 scenarios

Architecture Variants

  • DP Attention mode: Disaggregated prefill with EP=TP (8/16/32)
  • TEP mode: Tensor Expert Parallelism with multiple decode workers (1P3D, 1P4D configurations)
  • MTP: Multi-Token Prediction (spec-decoding enabled)
  • STP: Single-Token Prediction (standard decoding)
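
For illustration, a hypothetical shape of the new entry in `.github/configs/nvidia-master.yaml`. Only `model-prefix: dsr1`, the image tag, and the recipe-style `CONFIG_FILE` paths are confirmed by this conversation; every other field name here is an assumption.

```yaml
# Hypothetical sketch only -- the real schema is defined in
# .github/configs/nvidia-master.yaml. Field names other than
# model-prefix and image are assumptions for illustration.
dsr1-fp8-gb200-dynamo-trt:
  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
  model-prefix: dsr1          # the runner only accepts gptoss or dsr1
  runner: gb200-nv            # assumed name, per launch_gb200-nv.sh
  configs:
    - recipes/trtllm/gb200-fp8/1k1k/mtp/ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
```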

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 marked this pull request as ready for review February 2, 2026 20:09
@jthomson04 jthomson04 requested a review from a team as a code owner February 2, 2026 20:09
@functionstackx
Contributor

@claude please fix this, seems like a typo in changelog.yaml

Traceback (most recent call last):
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 144, in <module>
    main()
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 73, in main
    changelog_data = yaml.safe_load(added_yaml)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 110, in compose_sequence_node
    while not self.check_event(SequenceEndEvent):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
                         ^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 392, in parse_block_sequence_entry
    raise ParserError("while parsing a block collection", self.marks[-1],
yaml.parser.ParserError: while parsing a block collection
  in "<unicode string>", line 3, column 3:
      - dsr1-fp8-gb200-dynamo-trt
      ^
expected <block end>, but found '?'
  in "<unicode string>", line 4, column 3:
      description:
      ^
Error: Process completed with exit code 1.
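
The root cause is visible in the error itself: a fused dash like `-config-keys:` is parsed by YAML as a plain mapping key rather than a sequence entry, so the indented list beneath it opens a block sequence that the following `description:` key then interrupts ("expected <block end>, but found '?'"). A minimal sketch of a pre-parse lint that catches the fused dash; the function is hypothetical, not part of this repo:

```python
import re

def find_fused_dashes(lines):
    """Flag lines where a YAML sequence dash is fused to its content,
    e.g. '-config-keys:' where '- config-keys:' was intended.
    Heuristic sketch: may also flag legitimate scalars such as '-5'."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.lstrip()
        # A '-' followed immediately by a non-space, non-dash character
        # is a plain scalar to YAML, not a block-sequence entry.
        if re.match(r"-[^\s-]", stripped):
            hits.append((lineno, stripped))
    return hits

broken = [
    "-config-keys:",
    "  - dsr1-fp8-gb200-dynamo-trt",
    "  description: add GB200 Dynamo TRT configs",
]
print(find_fused_dashes(broken))  # flags line 1 only
```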

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 2, 2026

Claude finished @functionstackx's task in 52s


Fixed YAML Syntax Error in perf-changelog.yaml ✓

  • Read perf-changelog.yaml to identify the YAML syntax error
  • Fix the YAML formatting issue
  • Commit and push the fix

Issue: Missing space after dash at line 329

- -config-keys:
-   - dsr1-fp8-gb200-dynamo-trt
+ - config-keys:
+     - dsr1-fp8-gb200-dynamo-trt

Fix pushed: 9cb5200


Add missing space after dash in config-keys entry for dsr1-fp8-gb200-dynamo-trt.
'-config-keys:' should be '- config-keys:'

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx
Contributor

@claude please fix this too in nvidia-master.yaml. seems like a simple typo of the model prefix. change it to just dsr1 instead of dsr1-fp8

error log

+ export CONFIG_FILE=recipes/trtllm/gb200-fp8/1k1k/mtp/ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
+ CONFIG_FILE=recipes/trtllm/gb200-fp8/1k1k/mtp/ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
+ export IS_MULTINODE=true
+ IS_MULTINODE=true
+ bash ./runners/launch_gb200-nv.sh
+ [[ dynamo-trt == \d\y\n\a\m\o\-\s\g\l\a\n\g ]]
Unsupported model prefix: dsr1-fp8. Supported prefixes are: gptoss or dsr1
+ [[ dynamo-trt == \d\y\n\a\m\o\-\t\r\t ]]
+ [[ dsr1-fp8 == \g\p\t\o\s\s ]]
+ [[ dsr1-fp8 == \d\s\r\1 ]]
+ echo 'Unsupported model prefix: dsr1-fp8. Supported prefixes are: gptoss or dsr1'
+ exit 1
Error: Process completed with exit code 1.
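
The trace shows exact-string comparisons (`[[ dsr1-fp8 == dsr1 ]]` is false), so any prefix other than the two literals falls through to the error branch. A hedged sketch of that dispatch, with an invented function name; the real logic lives in runners/launch_gb200-nv.sh:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the prefix gate in launch_gb200-nv.sh:
# the comparisons are exact, so "dsr1-fp8" never matches "dsr1".
check_prefix() {
  local prefix="$1"
  if [[ "$prefix" == gptoss || "$prefix" == dsr1 ]]; then
    echo "ok: $prefix"
  else
    echo "Unsupported model prefix: $prefix. Supported prefixes are: gptoss or dsr1" >&2
    return 1
  fi
}

check_prefix dsr1              # ok: dsr1
check_prefix dsr1-fp8 || true  # rejected: exact match required
```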

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 2, 2026

Claude finished @functionstackx's task in 47s


Fixed Model Prefix Typo in nvidia-master.yaml

  • Read nvidia-master.yaml to find the incorrect model prefix
  • Fix dsr1-fp8 to dsr1 in the config
  • Commit and push the fix

Issue: Invalid model-prefix at line 2297

- model-prefix: dsr1-fp8
+ model-prefix: dsr1

The runner script only supports gptoss or dsr1 as model prefixes, not dsr1-fp8.

Fix pushed: e5efea3


The model-prefix 'dsr1-fp8' is not a supported prefix. The supported
prefixes are 'gptoss' and 'dsr1'. Changed to 'dsr1' to fix the
launch_gb200-nv.sh runner error.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
@functionstackx
Contributor

@jthomson04 can u take a look at this error related to the path of the model?

Traceback (most recent call last):
  File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/cli/do_sweep.py", line 276, in main
    runtime = RuntimeContext.from_config(config, job_id)
  File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/core/runtime.py", line 163, in from_config
    raise FileNotFoundError(f"Model path does not exist: {model_path}")
FileNotFoundError: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/dsr1-fp8

https://github.com/InferenceMAX/InferenceMAX/actions/runs/21609573902/job/62274835308?pr=617

@jthomson04
Collaborator Author

jthomson04 commented Feb 2, 2026

> @jthomson04 can u take a look at this error related to the path of the model?
>
> Traceback (most recent call last):
>   File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/cli/do_sweep.py", line 276, in main
>     runtime = RuntimeContext.from_config(config, job_id)
>   File "/mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/outputs/8517/.srtctl_install/srtctl/core/runtime.py", line 163, in from_config
>     raise FileNotFoundError(f"Model path does not exist: {model_path}")
> FileNotFoundError: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunner0/actions-runner/_work/InferenceMAX/InferenceMAX/srt-slurm/dsr1-fp8
>
> https://github.com/InferenceMAX/InferenceMAX/actions/runs/21609573902/job/62274835308?pr=617

Claude broke it. I have a fix locally, will push soon.

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@functionstackx
Contributor

Claude broke it. I have a fix locally, will push soon.

mb lol

@cquil11
Collaborator

cquil11 commented Feb 3, 2026

@jthomson04 claude not AGI confirmed

@cquil11
Collaborator

cquil11 commented Feb 3, 2026

@claude please make the perf changelog description and PR description slightly more detailed. don't list every single thing per se, but give a high-level overview of the scenarios added. be more detailed in the PR description (make a table of scenarios, something like the https://github.com/InferenceMAX/InferenceMAX/pull/582 description)

@cquil11
Collaborator

cquil11 commented Feb 3, 2026

@claude in .github/configs/nvidia-master.yaml, there are scenarios with corresponding links to external configs that contain more detailed information. can you please get the configs from the links and double check that the fields (such as prefill/decode ep/tp, num workers, etc) that are also present in this repo's config match?

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 3, 2026

Claude finished @cquil11's task in 1m 29s


Updated PR and Changelog Descriptions ✓

  • Read PR #582 ("[nv] add H200 SGLang disagg configs from srtslurm") to understand the expected description format
  • Read current perf-changelog.yaml to see the existing entry
  • Read nvidia-master.yaml to understand scenarios added
  • Update perf-changelog.yaml with more detailed description
  • Update PR description with detailed scenario table
  • Commit and push changes

Changes made:

  1. Updated perf-changelog.yaml with detailed breakdown:

    • Image version and source
    • Scenario counts per ISL/OSL (1k1k, 1k8k, 8k1k)
    • MTP vs STP breakdown
    • Worker configuration ranges (1P-5P, 1D-4D)
  2. Updated PR description with:

    • Summary and changes section
    • Scenario table showing ISL/OSL, mode, scenario counts, workers, and concurrency ranges
    • Architecture variants explanation (DP attention, TEP, MTP, STP)

Commit: 05cde10


@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 3, 2026

Claude finished @cquil11's task in 2m 34s


Verified External Config Links ✓

  • Read nvidia-master.yaml to find scenarios with external config links
  • Fetch external configs and extract relevant fields (38 configs found)
  • Compare fields (prefill/decode ep/tp, num workers, etc.) with local config
  • Report any mismatches found

Result: All 38 external configs match the local values.

I fetched each external config from ishandhanani/srt-slurm@sa-submission-q1-2026 and verified the following fields:

  • num-worker (prefill/decode) ↔ prefill_workers / decode_workers
  • tp ↔ tensor_parallel_size
  • ep ↔ moe_expert_parallel_size
  • dp-attn ↔ enable_attention_dp
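
The field mapping above can be sketched as a small comparison helper. This is hypothetical code, not the script the bot ran; the key names simply mirror the mapping listed here:

```python
# Hypothetical sketch of the cross-check between a local scenario entry
# and an external recipe config, using the field mapping above.
FIELD_MAP = {
    "tp": "tensor_parallel_size",
    "ep": "moe_expert_parallel_size",
    "dp-attn": "enable_attention_dp",
}

def mismatched_fields(local: dict, external: dict) -> list[str]:
    """Return local keys whose values differ from the mapped external field."""
    return [
        local_key
        for local_key, ext_key in FIELD_MAP.items()
        if local.get(local_key) != external.get(ext_key)
    ]
```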
Verified configs:

1k1k MTP configs (7):

  • ✓ ctx1_gen1_dep8_batch512_eplb0_mtp1_4301.yaml
  • ✓ ctx1_gen1_dep8_batch256_eplb0_mtp1_2151.yaml
  • ✓ ctx1_gen1_dep16_batch64_eplb0_mtp1_1229.yaml
  • ✓ ctx1_gen1_dep32_batch16_eplb0_mtp3_615.yaml
  • ✓ ctx1_gen3_tep8_batch8_eplb0_mtp3_36.yaml
  • ✓ ctx1_gen3_tep8_batch4_eplb0_mtp3_18.yaml
  • ✓ ctx1_gen3_tep8_batch2_eplb0_mtp3_9.yaml

1k1k STP configs (7):

  • ✓ ctx1_gen1_dep8_batch768_eplb0_mtp0_6144.yaml
  • ✓ ctx1_gen1_dep8_batch512_eplb0_mtp0_4301.yaml
  • ✓ ctx1_gen1_dep16_batch128_eplb0_mtp0_2151.yaml
  • ✓ ctx1_gen1_dep32_batch32_eplb0_mtp0_1127.yaml
  • ✓ ctx1_gen1_dep32_batch8_eplb0_mtp0_256.yaml
  • ✓ ctx1_gen3_tep8_batch8_eplb0_mtp0_27.yaml
  • ✓ ctx1_gen3_tep8_batch1_eplb0_mtp0_3.yaml

1k8k MTP configs (5):

  • ✓ ctx1_gen2_dep16_batch256_eplb0_mtp1_8192.yaml
  • ✓ ctx1_gen2_dep16_batch64_eplb0_mtp1_2152.yaml
  • ✓ ctx1_gen1_dep32_batch16_eplb0_mtp3_564.yaml
  • ✓ ctx1_gen1_dep32_batch2_eplb0_mtp3_72.yaml
  • ✓ ctx1_gen4_tep8_batch2_eplb0_mtp3_8.yaml

1k8k STP configs (5):

  • ✓ ctx1_gen2_dep16_batch256_eplb0_mtp0_8192.yaml
  • ✓ ctx1_gen1_dep32_batch64_eplb0_mtp0_2048.yaml
  • ✓ ctx1_gen1_dep32_batch16_eplb0_mtp0_564.yaml
  • ✓ ctx1_gen4_tep8_batch8_eplb0_mtp0_36.yaml
  • ✓ ctx1_gen4_tep8_batch1_eplb0_mtp0_4.yaml

8k1k MTP configs (7):

  • ✓ ctx3_gen1_dep8_batch64_eplb0_mtp3_666.yaml
  • ✓ ctx5_gen1_dep16_batch32_eplb0_mtp3_666.yaml
  • ✓ ctx3_gen1_dep16_batch16_eplb0_mtp3_333.yaml
  • ✓ ctx4_gen1_dep32_batch8_eplb0_mtp3_333.yaml
  • ✓ ctx2_gen1_dep32_batch2_eplb0_mtp3_90.yaml
  • ✓ ctx1_gen3_tep8_batch4_eplb0_mtp3_15.yaml
  • ✓ ctx1_gen3_tep8_batch2_eplb0_mtp3_6.yaml

8k1k STP configs (7):

  • ✓ ctx5_gen1_dep16_batch64_eplb0_mtp0_1229.yaml
  • ✓ ctx4_gen1_dep32_batch16_eplb0_mtp0_666.yaml
  • ✓ ctx3_gen1_dep16_batch32_eplb0_mtp0_615.yaml
  • ✓ ctx2_gen1_dep32_batch8_eplb0_mtp0_333.yaml
  • ✓ ctx1_gen3_tep8_batch16_eplb0_mtp0_63.yaml
  • ✓ ctx1_gen3_tep8_batch4_eplb0_mtp0_18.yaml
  • ✓ ctx1_gen3_tep8_batch1_eplb0_mtp0_6.yaml


Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Collaborator

@cquil11 cquil11 left a comment


Signed-off-by: jthomson04 <jothomson@nvidia.com>
@kedarpotdar-nv
Collaborator

looking good to merge

@jthomson04 jthomson04 merged commit daa8239 into main Feb 3, 2026
13 checks passed
@jthomson04 jthomson04 deleted the nv/dsr1-fp8-gb200-dynamo-trt branch February 3, 2026 20:27