
[NV] DSR1 FP8 GB300 Dynamo TRT (8k1k and 1k1k)#627

Merged
jthomson04 merged 8 commits into main from nv/dsr1-fp8-gb300-dynamo-trt on Feb 5, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Feb 4, 2026

Summary

This PR adds DeepSeek R1 FP8 GB300 Dynamo TRT-LLM disaggregated multinode configurations for 8k1k and 1k1k sequence lengths.

Changes

New Configuration: dsr1-fp8-gb300-dynamo-trt

  • Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
  • Model: deepseek-ai/DeepSeek-R1-0528
  • Runner: GB300 with multinode and disaggregated inference enabled
  • Framework: Dynamo TRT-LLM

Sequence Length Configurations

1k1k (ISL=1024, OSL=1024)

  • MTP (Multi-Token Prediction): 7 configurations with concurrencies [8, 24, 180, 564, 666, 2253, 8192]
  • STP (Single-Token Prediction): 7 configurations with concurrencies [4, 24, 84, 1229, 2253, 8602, 12288]

8k1k (ISL=8192, OSL=1024)

  • MTP: 6 configurations with concurrencies [8, 24, 333, 666, 1229, 1229]
  • STP: 7 configurations with concurrencies [4, 24, 36, 512, 666, 1229, 2151]

Launch Script Updates (runners/launch_gb300-nv.sh)

  • Extended to support FP8 precision in addition to FP4
  • Added conditional model path selection based on precision:
    • FP4: /raid/shared/models/deepseek-r1-0528-fp4-v2
    • FP8: /raid/shared/models/deepseek-r1-0528
  • Added SRT_SLURM_MODEL_PREFIX variable to correctly map model aliases in srt-slurm
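
A minimal sketch of what the precision branch in `runners/launch_gb300-nv.sh` might look like; aside from `SRT_SLURM_MODEL_PREFIX` and the two model paths, the variable names and alias format here are assumptions, not the script's actual contents:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual launch script: pick the model path by precision.
# PRECISION is assumed to be supplied by the caller; default to fp8 for illustration.
PRECISION="${PRECISION:-fp8}"

case "$PRECISION" in
  fp4) MODEL_PATH=/raid/shared/models/deepseek-r1-0528-fp4-v2 ;;
  fp8) MODEL_PATH=/raid/shared/models/deepseek-r1-0528 ;;
  *)
    echo "Unsupported precision: $PRECISION" >&2
    exit 1
    ;;
esac

# Map the model alias onto the prefix srt-slurm expects (alias format is assumed).
export SRT_SLURM_MODEL_PREFIX="dsr1-${PRECISION}"
echo "$MODEL_PATH"
```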

Benchmark Verification

Related PRs

@jthomson04 jthomson04 marked this pull request as ready for review February 4, 2026 18:05
@jthomson04 jthomson04 requested a review from a team as a code owner February 4, 2026 18:05
@claude
Contributor

claude bot commented Feb 4, 2026

Claude finished @jthomson04's task in 1m 15s


PR Review Complete

  • Gather context and read changed files
  • Validate master config and perf-changelog requirements
  • Validate container image accessibility
  • Validate launch script enroot import pattern
  • Review code for bugs, logic errors, and blocking issues
  • Post review summary

Summary

This PR adds DeepSeek R1 FP8 GB300 Dynamo TRT-LLM disaggregated multinode configurations for 8k1k and 1k1k sequence lengths.

Found 1 blocking issue:

🔴 Model path uses local /scratch/ path (.github/configs/nvidia-master.yaml:3742) - The new config uses model: /scratch/models/DeepSeek-R1-0528 while other similar configs (GB200 FP8, GB300 FP4) use public HuggingFace model references. This makes the configuration non-reproducible for others. See inline comment for suggested fix.
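
A minimal sketch of the suggested fix, assuming the surrounding master-config structure (the HuggingFace identifier is taken from the PR description above):

```yaml
# Before: local scratch path, not reproducible outside this cluster
# model: /scratch/models/DeepSeek-R1-0528

# After: public HuggingFace reference, matching the GB200 FP8 and GB300 FP4 configs
model: deepseek-ai/DeepSeek-R1-0528
```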

Validations passed:

  • ✅ perf-changelog.yaml updated alongside master config changes
  • ✅ Container image nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2 is publicly accessible on NGC
  • ✅ Launch script properly uses enroot import docker://$IMAGE for reproducibility
  • ✅ Launch script correctly extended to support dsr1-fp8 model prefix

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 force-pushed the nv/dsr1-fp8-gb300-dynamo-trt branch from 9acc0e8 to 504fd22 Compare February 4, 2026 18:43
Collaborator

@Oseltamivir Oseltamivir left a comment


LGTM


@cquil11
Collaborator

cquil11 commented Feb 5, 2026

@claude In the .github/configs/nvidia-master.yaml file, please go through all added downstream config files (the links) and ensure there are no system-specific config settings or hardcoded paths to cache artifacts, etc.

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 5, 2026

Claude finished @cquil11's task in 2m 8s


Reviewing Downstream Config Files for System-Specific Settings

  • Extract downstream config file URLs from nvidia-master.yaml changes
  • Fetch and review each downstream config file (27 files reviewed)
  • Check for hardcoded paths, system-specific settings, or cache artifacts
  • Report findings

Summary

Reviewed all 27 downstream config files from the srt-slurm repository referenced in this PR.

✅ No Blocking Issues Found

Good news: None of the downstream config files contain hardcoded system paths like /scratch/, /home/, /mnt/, or /data/. The configurations use:

  • Relative model identifiers (dsr1-fp8) rather than absolute filesystem paths
  • Container references (dynamo-trtllm) that resolve via registry
  • Abstract resource specifications without local path dependencies
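
The path check above could be reproduced locally with something like the following; the sample directory and file contents are fabricated for illustration, and in practice the scan would point at the checked-out downstream config files:

```shell
# Hypothetical sketch of the hardcoded-path check: create two sample configs, then scan.
mkdir -p /tmp/cfgcheck
printf 'model: dsr1-fp8\nbackend: dynamo-trtllm\n' > /tmp/cfgcheck/clean.yaml
printf 'model: /scratch/models/DeepSeek-R1-0528\n' > /tmp/cfgcheck/bad.yaml

# grep exits 0 on a match, so any output here flags a hardcoded system path.
grep -rnE '(/scratch/|/home/|/mnt/|/data/)' /tmp/cfgcheck/ || echo "no hardcoded paths found"
```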

Environment-Specific Configurations (Expected/Acceptable)

The configs do contain hardware-specific settings that are expected and appropriate for this use case:

| Setting Type | Examples | Notes |
| --- | --- | --- |
| GPU Type | `gb300` | Required for GB300 deployment |
| Communication | `TRTLLM_FORCE_COMM_METHOD: NVLINK_TWO_SIDED` | NVLink topology optimization |
| NCCL Settings | `NCCL_GRAPH_MIXING_SUPPORT: 0` | Hardware-dependent tuning |
| Memory Fractions | `free_gpu_memory_fraction: 0.1/0.7/0.9` | Workload-specific tuning |
| Batch/Sequence Limits | Various `max_batch_size`, `max_seq_len` values | Config-specific parameters |

These are not problematic: they are intentional hardware-specific optimizations that define the benchmark configuration.

Files Reviewed

1k1k MTP configs (7 files): ✅ Clean

  • ctx1_gen1_dep16_batch32_eplb0_mtp3_666.yaml
  • ctx1_gen1_dep32_batch4_eplb0_mtp3_180.yaml
  • ctx1_gen4_tep8_batch1_eplb0_mtp3_8.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp3_24.yaml
  • ctx2_gen1_dep16_batch128_eplb0_mtp1_2253.yaml
  • ctx2_gen1_dep32_batch16_eplb0_mtp3_564.yaml
  • ctx3_gen2_dep8_batch512_eplb0_mtp1_8192.yaml

1k1k STP configs (7 files): ✅ Clean

  • ctx1_gen4_tep8_batch16_eplb0_mtp0_84.yaml
  • ctx1_gen4_tep8_batch1_eplb0_mtp0_4.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp0_24.yaml
  • ctx2_gen1_dep16_batch128_eplb0_mtp0_2253.yaml
  • ctx2_gen1_dep32_batch32_eplb0_mtp0_1229.yaml
  • ctx3_gen2_dep8_batch512_eplb0_mtp0_8602.yaml
  • ctx3_gen2_dep8_batch768_eplb0_mtp0_12288.yaml

8k1k MTP configs (6 files): ✅ Clean

  • ctx10_gen1_dep16_batch64_eplb0_mtp1_1229.yaml
  • ctx1_gen4_tep8_batch1_eplb0_mtp3_8.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp3_24.yaml
  • ctx6_gen1_dep32_batch8_eplb0_mtp3_333.yaml
  • ctx7_gen1_dep8_batch128_eplb0_mtp1_1229.yaml
  • ctx8_gen1_dep16_batch32_eplb0_mtp3_666.yaml

8k1k STP configs (7 files): ✅ Clean

  • ctx1_gen4_tep8_batch1_eplb0_mtp0_4.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp0_24.yaml
  • ctx1_gen4_tep8_batch8_eplb0_mtp0_36.yaml
  • ctx4_gen1_dep16_batch32_eplb0_mtp0_666.yaml
  • ctx6_gen1_dep32_batch16_eplb0_mtp0_512.yaml
  • ctx7_gen1_dep16_batch64_eplb0_mtp0_1229.yaml
  • ctx7_gen1_dep8_batch256_eplb0_mtp0_2151.yaml
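
The file names above appear to follow a `ctx{N}_gen{N}_{dep|tep}{N}_batch{N}_eplb{N}_mtp{N}_{concurrency}.yaml` convention; a small helper to decode it might look like the sketch below. The field meanings (context nodes, generation nodes, data/tensor expert parallelism, batch size, EPLB, MTP depth, concurrency) are inferred from the names and are not documented in this PR:

```python
import re

# Assumed convention: ctx{ctx}_gen{gen}_{dep|tep}{par}_batch{batch}_eplb{eplb}_mtp{mtp}_{concurrency}.yaml
PATTERN = re.compile(
    r"ctx(?P<ctx>\d+)_gen(?P<gen>\d+)_(?P<par_kind>dep|tep)(?P<par>\d+)"
    r"_batch(?P<batch>\d+)_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)_(?P<concurrency>\d+)\.yaml"
)

def decode(name: str) -> dict:
    """Decode one config file name into its fields; raise ValueError on mismatch."""
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized config name: {name}")
    # Keep the dep/tep tag as a string; convert every numeric field to int.
    return {k: (v if k == "par_kind" else int(v)) for k, v in m.groupdict().items()}

print(decode("ctx3_gen2_dep8_batch512_eplb0_mtp1_8192.yaml"))
```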

@cquil11
Collaborator

cquil11 commented Feb 5, 2026

@claude Please update the PR description in accordance with changes introduced in this PR, as well as changes stated in perf-changelog.yaml.

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 5, 2026

Claude finished @cquil11's task in 1m 0s


Updating PR Description

  • Read the changed files to understand PR changes
  • Read perf-changelog.yaml for change descriptions
  • Update the PR description with comprehensive summary

Summary

Updated the PR description with a comprehensive summary based on the changes introduced:

Key changes documented:

  • New dsr1-fp8-gb300-dynamo-trt configuration with Dynamo TRT-LLM image v0.8.1.post2
  • 1k1k sequence length: 7 MTP + 7 STP configurations (concurrencies up to 12,288)
  • 8k1k sequence length: 6 MTP + 7 STP configurations (concurrencies up to 2,151)
  • Launch script updates to support FP8 precision with correct model path selection
  • Reference to successful benchmark sweep and related PRs

Contributor

@functionstackx functionstackx left a comment


LGTM

@jthomson04 jthomson04 merged commit 032f9aa into main Feb 5, 2026
13 checks passed
@jthomson04 jthomson04 deleted the nv/dsr1-fp8-gb300-dynamo-trt branch February 5, 2026 20:14


6 participants