
[NV] DSR1 FP4 GB300 Dynamo TRT #618

Merged
jthomson04 merged 5 commits into main from nv/dsr1-fp4-gb300-dynamo-trt
Feb 3, 2026

Conversation

jthomson04 (Collaborator) commented Feb 2, 2026

Summary

Add DeepSeek-R1 FP4 GB300 Dynamo TRT disaggregated multinode benchmark configurations, sourced from srt-slurm recipes.

Changes

  • Add dsr1-fp4-gb300-dynamo-trt config to nvidia-master.yaml
  • Add gb300-nv_0 runner to runners.yaml
  • Add launch_gb300-nv.sh script for srt-slurm integration
  • Add perf-changelog entry

Config Details

| ISL/OSL | Mode | Prefill Workers | Decode Workers | Decode TP/EP | Concurrencies |
|---------|------|-----------------|----------------|--------------|---------------|
| 1k1k | MTP | 1P | 1D | 4 | 3226 |
| 1k1k | MTP | 1P | 1D | 32 | 333 |
| 1k1k | MTP | 1P | 4D | 8 | 5, 8-48 |
| 1k1k | MTP | 3P | 1D | 16 | 2253 |
| 1k1k | MTP | 3P | 1D | 32 | 1229 |
| 1k1k | STP | 1P | 4D | 8 | 5, 12-192 |
| 1k1k | STP | 2P | 1D | 8 | 8192 |
| 1k1k | STP | 2P | 1D | 32 | 1229 |
| 1k1k | STP | 3P | 1D | 16 | 4301 |
| 1k1k | STP | 3P | 1D | 32 | 2253 |
| 1k8k | MTP | 1P | 7D | 8 | 7, 63 |
| 1k8k | MTP | 1P | 1D | 32 | 563, 2088 |
| 1k8k | MTP | 1P | 2D | 16 | 8192 |
| 1k8k | MTP | 1P | 4D | 8 | 16384 |
| 1k8k | STP | 1P | 7D | 8 | 7, 245 |
| 1k8k | STP | 1P | 15D | 4 | 60 |
| 1k8k | STP | 1P | 1D | 32 | 1024, 4096, 8192 |
| 8k1k | MTP | 1P | 3D | 8 | 33 |
| 8k1k | MTP | 1P | 4D | 8 | 5, 12, 24 |
| 8k1k | MTP | 4P | 1D | 32 | 180 |
| 8k1k | MTP | 8P | 1D | 32 | 308 |
| 8k1k | MTP | 10P | 1D | 8 | 2253 |
| 8k1k | MTP | 10P | 1D | 16 | 666 |
| 8k1k | MTP | 13P | 1D | 16 | 1127 |
| 8k1k | STP | 1P | 3D | 8 | 72 |
| 8k1k | STP | 1P | 4D | 8 | 5, 12 |
| 8k1k | STP | 1P | 5D | 8 | 5, 15, 30 |
| 8k1k | STP | 7P | 1D | 32 | 666 |
| 8k1k | STP | 9P | 1D | 16 | 1229 |
| 8k1k | STP | 11P | 3D | 4 | 3228 |
| 8k1k | STP | 14P | 1D | 16 | 2253 |

Notes:

  • MTP = Multi-Token Prediction (speculative decoding)
  • STP = Standard Token Prediction (no speculative decoding)
  • P = Prefill worker, D = Decode worker
  • All prefill workers use TP=2, EP=2 with DP attention enabled
  • Image: `nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2`
  • Model: `nvidia/DeepSeek-R1-0528-NVFP4-v2`

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 marked this pull request as ready for review February 2, 2026 20:34
@jthomson04 jthomson04 requested a review from a team as a code owner February 2, 2026 20:34
functionstackx (Contributor) commented:

@claude please fix this PR, seems like typo in perf-changelog.yaml

```
Traceback (most recent call last):
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 144, in <module>
    main()
  File "/home/runner/work/InferenceMAX/InferenceMAX/utils/process_changelog.py", line 73, in main
    changelog_data = yaml.safe_load(added_yaml)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 49, in get_single_data
    node = self.get_single_node()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 36, in get_single_node
    document = self.compose_document()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 55, in compose_document
    node = self.compose_node(None, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 84, in compose_node
    node = self.compose_mapping_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 133, in compose_mapping_node
    item_value = self.compose_node(node, item_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 82, in compose_node
    node = self.compose_sequence_node(anchor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/composer.py", line 110, in compose_sequence_node
    while not self.check_event(SequenceEndEvent):
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 98, in check_event
    self.current_event = self.state()
                         ^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/yaml/parser.py", line 392, in parse_block_sequence_entry
    raise ParserError("while parsing a block collection", self.marks[-1],
yaml.parser.ParserError: while parsing a block collection
  in "<unicode string>", line 3, column 3:
      - dsr1-fp4-gb300-dynamo-trt
      ^
expected <block end>, but found '?'
  in "<unicode string>", line 4, column 3:
      description:
      ^
```

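The failure can be reproduced in isolation (a sketch using PyYAML; the strings below are illustrative, not the actual perf-changelog.yaml contents). A mapping key sitting at the same indentation as a preceding sequence item puts the block parser in exactly the "expected <block end>, but found '?'" state shown above:

```python
# Minimal repro: PyYAML's block parser expects another "- item" (or the end
# of the sequence) but instead finds a mapping key at the same indentation.
import yaml

broken = """\
changes:
  - dsr1-fp4-gb300-dynamo-trt
  description: added GB300 configs
"""

fixed = """\
changes:
  - dsr1-fp4-gb300-dynamo-trt
description: added GB300 configs
"""

def parses(text: str) -> bool:
    """Return True if the text is valid YAML, False on any YAML error."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False

print(parses(broken), parses(fixed))  # → False True
```

De-indenting the key (or turning the line into a proper `- key:` list item, as the fix below the traceback does) resolves the ambiguity.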
Klaud-Cold (Collaborator) commented Feb 2, 2026

Claude finished @functionstackx's task in 49s


Fixing YAML Syntax Error in perf-changelog.yaml

  • Analyze the YAML parsing error
  • Read and examine perf-changelog.yaml
  • Fix the syntax error
  • Commit and push the fix

Summary: Fixed YAML syntax error at line 329 where `-config-keys:` was missing a space after the dash. Changed it to `- config-keys:` and also fixed the indentation of the child list item.
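The shape of the fix looks roughly like this (a hypothetical fragment; only the `-config-keys:` token is confirmed by the comment above, the surrounding context is illustrative):

```yaml
# Before: the dash fuses into the key name, so the parser sees a plain
# scalar key where it expects the next sequence item
-config-keys:
- dsr1-fp4-gb300-dynamo-trt

# After: a space makes it a list item holding a mapping, and the child
# list is indented under its key
- config-keys:
    - dsr1-fp4-gb300-dynamo-trt
```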

Commit: 4bba770

Add missing space after dash in list item at line 329 (`-config-keys:` → `- config-keys:`)

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
cquil11 (Collaborator) commented Feb 3, 2026

@claude in .github/configs/nvidia-master.yaml, there are scenarios with corresponding links to external configs that contain more detailed information. can you please get the configs from the links and double check that the fields (such as prefill/decode ep/tp, num workers, etc) that are also present in this repo's config match?

Klaud-Cold (Collaborator) commented Feb 3, 2026

Claude finished @cquil11's task in 3m 25s


Verifying External Config Links

  • Read nvidia-master.yaml to find external config links
  • Fetch each external config and extract relevant fields
  • Compare fields (prefill/decode EP/TP, num workers, etc.) with local config
  • Report any mismatches

Results Summary

I verified all 40 GB300-FP4 external configs against the local nvidia-master.yaml configuration.

Found 1 Mismatch

| Config File | Field | External Value | Local Value | Line |
|-------------|-------|----------------|-------------|------|
| ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml | decode.tp | 4 | 8 | 3109 |
| ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml | decode.ep | 4 | 8 | 3110 |

Location: .github/configs/nvidia-master.yaml:3098-3111

The external config specifies decode TP/EP of 4 (as indicated by `tep4` in the filename), but the local config has TP/EP of 8.

```yaml
# Local config (lines 3098-3111) - INCORRECT
- conc-list: [5, 15, 30]
  prefill:
    num-worker: 1
    tp: 2
    ep: 2
    dp-attn: true
    additional-settings:
    # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/gb300-fp4/8k1k/stp/ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml
    - "CONFIG_FILE=recipes/trtllm/gb300-fp4/8k1k/stp/ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml"
  decode:
    num-worker: 5
    tp: 8    # Should be 4
    ep: 8    # Should be 4
    dp-attn: false
```

All Other Configs Match ✅

The remaining 39 GB300-FP4 configs all have matching values:

  • 1k1k MTP: 6 configs ✅
  • 1k1k STP: 6 configs ✅
  • 1k8k MTP: 6 configs ✅
  • 1k8k STP: 6 configs ✅
  • 8k1k MTP: 8 configs ✅
  • 8k1k STP: 7 of 8 configs ✅ (1 mismatch noted above)

Would you like me to fix this mismatch? I can update the decode TP/EP from 8 to 4 at lines 3109-3110.
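The cross-check described above can be sketched as a simple field-by-field comparison (a hypothetical helper, not code from this repo; `local_decode` and `external_decode` stand in for the parsed decode sections of nvidia-master.yaml and the srt-slurm recipe):

```python
# Compare fields present in both the local and the external config and
# report any values that disagree.
def compare_configs(local, external, fields=("tp", "ep", "num-worker")):
    """Return (field, local_value, external_value) tuples for mismatches."""
    return [
        (f, local[f], external[f])
        for f in fields
        if f in local and f in external and local[f] != external[f]
    ]

# The one mismatch found: local decode TP/EP of 8 vs. the tep4 recipe
local_decode = {"num-worker": 5, "tp": 8, "ep": 8}
external_decode = {"num-worker": 5, "tp": 4, "ep": 4}

print(compare_configs(local_decode, external_decode))
# → [('tp', 8, 4), ('ep', 8, 4)]
```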

cquil11 (Collaborator) commented Feb 3, 2026

@claude please make the perf changelog description and PR description slightly more detailed. don't list every single thing per se, but give a high level of the scenarios added. be more detailed in the PR description (make a table of scenarios, something like https://github.com/InferenceMAX/InferenceMAX/pull/582 description)

Klaud-Cold (Collaborator) commented Feb 3, 2026

Claude finished @cquil11's task in 1m 13s


Updating PR and Changelog Descriptions

Summary: Updated the perf-changelog.yaml with a more detailed description covering the image version, sequence length configs (1k1k, 1k8k, 8k1k), and srt-slurm integration. Also updated the PR description with a comprehensive table of all scenarios showing ISL/OSL, mode (MTP/STP), prefill/decode worker counts, TP/EP settings, and concurrency levels.

Commit: 986fb1b

github-actions bot and others added 2 commits February 3, 2026 05:13
Add more detailed description including image version, sequence length
coverage (1k1k, 1k8k, 8k1k), and srt-slurm integration details.

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
cquil11 (Collaborator) left a comment


@jthomson04 jthomson04 merged commit 3eb1b77 into main Feb 3, 2026
@jthomson04 jthomson04 deleted the nv/dsr1-fp4-gb300-dynamo-trt branch February 3, 2026 17:13
4 participants