
[NV] DSR1 FP4 GB300 Dynamo TRT #618

Merged
jthomson04 merged 5 commits into main from nv/dsr1-fp4-gb300-dynamo-trt
Feb 3, 2026

Conversation

jthomson04 (Collaborator) commented Feb 2, 2026

Summary

Add DeepSeek-R1 FP4 GB300 Dynamo TRT disaggregated multinode benchmark configurations, sourced from srt-slurm recipes.

Changes

  • Add dsr1-fp4-gb300-dynamo-trt config to nvidia-master.yaml
  • Add gb300-nv_0 runner to runners.yaml
  • Add launch_gb300-nv.sh script for srt-slurm integration
  • Add perf-changelog entry

Config Details

| ISL/OSL | Mode | Prefill Workers | Decode Workers | Decode TP/EP | Concurrencies |
|---------|------|-----------------|----------------|--------------|---------------|
| 1k1k | MTP | 1P | 1D | 4 | 3226 |
| 1k1k | MTP | 1P | 1D | 32 | 333 |
| 1k1k | MTP | 1P | 4D | 8 | 5, 8-48 |
| 1k1k | MTP | 3P | 1D | 16 | 2253 |
| 1k1k | MTP | 3P | 1D | 32 | 1229 |
| 1k1k | STP | 1P | 4D | 8 | 5, 12-192 |
| 1k1k | STP | 2P | 1D | 8 | 8192 |
| 1k1k | STP | 2P | 1D | 32 | 1229 |
| 1k1k | STP | 3P | 1D | 16 | 4301 |
| 1k1k | STP | 3P | 1D | 32 | 2253 |
| 1k8k | MTP | 1P | 7D | 8 | 7, 63 |
| 1k8k | MTP | 1P | 1D | 32 | 563, 2088 |
| 1k8k | MTP | 1P | 2D | 16 | 8192 |
| 1k8k | MTP | 1P | 4D | 8 | 16384 |
| 1k8k | STP | 1P | 7D | 8 | 7, 245 |
| 1k8k | STP | 1P | 15D | 4 | 60 |
| 1k8k | STP | 1P | 1D | 32 | 1024, 4096, 8192 |
| 8k1k | MTP | 1P | 3D | 8 | 33 |
| 8k1k | MTP | 1P | 4D | 8 | 5, 12, 24 |
| 8k1k | MTP | 4P | 1D | 32 | 180 |
| 8k1k | MTP | 8P | 1D | 32 | 308 |
| 8k1k | MTP | 10P | 1D | 8 | 2253 |
| 8k1k | MTP | 10P | 1D | 16 | 666 |
| 8k1k | MTP | 13P | 1D | 16 | 1127 |
| 8k1k | STP | 1P | 3D | 8 | 72 |
| 8k1k | STP | 1P | 4D | 8 | 5, 12 |
| 8k1k | STP | 1P | 5D | 8 | 5, 15, 30 |
| 8k1k | STP | 7P | 1D | 32 | 666 |
| 8k1k | STP | 9P | 1D | 16 | 1229 |
| 8k1k | STP | 11P | 3D | 4 | 3228 |
| 8k1k | STP | 14P | 1D | 16 | 2253 |

Notes:

  • MTP = Multi-Token Prediction (speculative decoding)
  • STP = Standard Token Prediction (no speculative decoding)
  • P = Prefill worker, D = Decode worker
  • All prefill workers use TP=2, EP=2 with DP attention enabled
  • Image: `nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2`
  • Model: `nvidia/DeepSeek-R1-0528-NVFP4-v2`

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 marked this pull request as ready for review February 2, 2026 20:34
@jthomson04 jthomson04 requested a review from a team as a code owner February 2, 2026 20:34
functionstackx (Contributor) commented:

@claude please fix this PR, seems like typo in perf-changelog.yaml

```
Traceback (most recent call last):
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 144, in <module>
    main()
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 73, in main
    changelog_data = yaml.safe_load(added_yaml)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 110, in compose_sequence_node
    while not self.check_event(SequenceEndEvent):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
                         ^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 392, in parse_block_sequence_entry
    raise ParserError("while parsing a block collection", self.marks[-1],
yaml.parser.ParserError: while parsing a block collection
  in "<unicode string>", line 3, column 3:
      - dsr1-fp4-gb300-dynamo-trt
      ^
expected <block end>, but found '?'
  in "<unicode string>", line 4, column 3:
      description:
      ^
```

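The failure can be reproduced in isolation (a sketch using PyYAML; the strings below are illustrative, not the actual perf-changelog.yaml contents). A mapping key sitting at the same indentation as a preceding sequence item puts the block parser in exactly the "expected <block end>, but found '?'" state shown above:

```python
# Minimal repro: PyYAML's block parser expects another "- item" (or the end
# of the sequence) but instead finds a mapping key at the same indentation.
import yaml

broken = """\
changes:
  - dsr1-fp4-gb300-dynamo-trt
  description: added GB300 configs
"""

fixed = """\
changes:
  - dsr1-fp4-gb300-dynamo-trt
description: added GB300 configs
"""

def parses(text: str) -> bool:
    """Return True if the text is valid YAML, False on any YAML error."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False

print(parses(broken), parses(fixed))  # → False True
```

De-indenting the key (or turning the line into a proper `- key:` list item, as the fix below the traceback does) resolves the ambiguity.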
Klaud-Cold (Collaborator) commented Feb 2, 2026

Claude finished @functionstackx's task in 49s


Fixing YAML Syntax Error in perf-changelog.yaml

  • Analyze the YAML parsing error
  • Read and examine perf-changelog.yaml
  • Fix the syntax error
  • Commit and push the fix

Summary: Fixed YAML syntax error at line 329 where `-config-keys:` was missing a space after the dash. Changed it to `- config-keys:` and also fixed the indentation of the child list item.
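The shape of the fix looks roughly like this (a hypothetical fragment; only the `-config-keys:` token is confirmed by the comment above, the surrounding context is illustrative):

```yaml
# Before: the dash fuses into the key name, so the parser sees a plain
# scalar key where it expects the next sequence item
-config-keys:
- dsr1-fp4-gb300-dynamo-trt

# After: a space makes it a list item holding a mapping, and the child
# list is indented under its key
- config-keys:
    - dsr1-fp4-gb300-dynamo-trt
```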

Commit: 4bba770

Add missing space after dash in list item at line 329 (`-config-keys:` → `- config-keys:`)

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
cquil11 (Collaborator) commented Feb 3, 2026

@claude in .github/configs/nvidia-master.yaml, there are scenarios with corresponding links to external configs that contain more detailed information. can you please get the configs from the links and double check that the fields (such as prefill/decode ep/tp, num workers, etc) that are also present in this repo's config match?

Klaud-Cold (Collaborator) commented Feb 3, 2026

Claude finished @cquil11's task in 3m 25s


Verifying External Config Links

  • Read nvidia-master.yaml to find external config links
  • Fetch each external config and extract relevant fields
  • Compare fields (prefill/decode EP/TP, num workers, etc.) with local config
  • Report any mismatches

Results Summary

I verified all 40 GB300-FP4 external configs against the local nvidia-master.yaml configuration.

Found 1 Mismatch

| Config File | Field | External Value | Local Value | Line |
|-------------|-------|----------------|-------------|------|
| ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml | decode.tp | 4 | 8 | 3109 |
| ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml | decode.ep | 4 | 8 | 3110 |

Location: .github/configs/nvidia-master.yaml:3098-3111

The external config specifies decode TP/EP of 4 (as indicated by `tep4` in the filename), but the local config has TP/EP of 8.

```yaml
# Local config (lines 3098-3111) - INCORRECT
- conc-list: [5, 15, 30]
  prefill:
    num-worker: 1
    tp: 2
    ep: 2
    dp-attn: true
    additional-settings:
    # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/gb300-fp4/8k1k/stp/ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml
    - "CONFIG_FILE=recipes/trtllm/gb300-fp4/8k1k/stp/ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml"
  decode:
    num-worker: 5
    tp: 8    # Should be 4
    ep: 8    # Should be 4
    dp-attn: false
```

All Other Configs Match ✅

The remaining 39 GB300-FP4 configs all have matching values:

  • 1k1k MTP: 6 configs ✅
  • 1k1k STP: 6 configs ✅
  • 1k8k MTP: 6 configs ✅
  • 1k8k STP: 6 configs ✅
  • 8k1k MTP: 8 configs ✅
  • 8k1k STP: 7 of 8 configs ✅ (1 mismatch noted above)

Would you like me to fix this mismatch? I can update the decode TP/EP from 8 to 4 at lines 3109-3110.
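The cross-check described above can be sketched as a simple field-by-field comparison (a hypothetical helper, not code from this repo; `local_decode` and `external_decode` stand in for the parsed decode sections of nvidia-master.yaml and the srt-slurm recipe):

```python
# Compare fields present in both the local and the external config and
# report any values that disagree.
def compare_configs(local, external, fields=("tp", "ep", "num-worker")):
    """Return (field, local_value, external_value) tuples for mismatches."""
    return [
        (f, local[f], external[f])
        for f in fields
        if f in local and f in external and local[f] != external[f]
    ]

# The one mismatch found: local decode TP/EP of 8 vs. the tep4 recipe
local_decode = {"num-worker": 5, "tp": 8, "ep": 8}
external_decode = {"num-worker": 5, "tp": 4, "ep": 4}

print(compare_configs(local_decode, external_decode))
# → [('tp', 8, 4), ('ep', 8, 4)]
```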

cquil11 (Collaborator) commented Feb 3, 2026

@claude please make the perf changelog description and PR description slightly more detailed. don't list every single thing per se, but give a high level of the scenarios added. be more detailed in the PR description (make a table of scenarios, something like https://github.com/InferenceMAX/InferenceMAX/pull/582 description)

Klaud-Cold (Collaborator) commented Feb 3, 2026

Claude finished @cquil11's task in 1m 13s


Updating PR and Changelog Descriptions

Summary: Updated the perf-changelog.yaml with a more detailed description covering the image version, sequence length configs (1k1k, 1k8k, 8k1k), and srt-slurm integration. Also updated the PR description with a comprehensive table of all scenarios showing ISL/OSL, mode (MTP/STP), prefill/decode worker counts, TP/EP settings, and concurrency levels.

Commit: 986fb1b

github-actions bot and others added 2 commits February 3, 2026 05:13
Add more detailed description including image version, sequence length
coverage (1k1k, 1k8k, 8k1k), and srt-slurm integration details.

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
cquil11 (Collaborator) left a comment


@jthomson04 jthomson04 merged commit 3eb1b77 into main Feb 3, 2026
@jthomson04 jthomson04 deleted the nv/dsr1-fp4-gb300-dynamo-trt branch February 3, 2026 17:13
4 participants