
[NV] DSR1 FP8 GB300 Dynamo TRT (8k1k and 1k1k)#627

Merged
jthomson04 merged 8 commits into main from nv/dsr1-fp8-gb300-dynamo-trt on Feb 5, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Feb 4, 2026

Summary

This PR adds DeepSeek R1 FP8 GB300 Dynamo TRT-LLM disaggregated multinode configurations for 8k1k and 1k1k sequence lengths.

Changes

New Configuration: dsr1-fp8-gb300-dynamo-trt

  • Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
  • Model: deepseek-ai/DeepSeek-R1-0528
  • Runner: GB300 with multinode and disaggregated inference enabled
  • Framework: Dynamo TRT-LLM

Sequence Length Configurations

1k1k (ISL=1024, OSL=1024)

  • MTP (Multi-Token Prediction): 7 configurations with concurrencies [8, 24, 180, 564, 666, 2253, 8192]
  • STP (Single-Token Prediction): 7 configurations with concurrencies [4, 24, 84, 1229, 2253, 8602, 12288]

8k1k (ISL=8192, OSL=1024)

  • MTP: 6 configurations with concurrencies [8, 24, 333, 666, 1229, 1229]
  • STP: 7 configurations with concurrencies [4, 24, 36, 512, 666, 1229, 2151]

Launch Script Updates (runners/launch_gb300-nv.sh)

  • Extended to support FP8 precision in addition to FP4
  • Added conditional model path selection based on precision:
    • FP4: /raid/shared/models/deepseek-r1-0528-fp4-v2
    • FP8: /raid/shared/models/deepseek-r1-0528
  • Added SRT_SLURM_MODEL_PREFIX variable to correctly map model aliases in srt-slurm
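
A minimal sketch of what the precision branch in `runners/launch_gb300-nv.sh` might look like; aside from `SRT_SLURM_MODEL_PREFIX` and the two model paths, the variable names and alias format here are assumptions, not the script's actual contents:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual launch script: pick the model path by precision.
# PRECISION is assumed to be supplied by the caller; default to fp8 for illustration.
PRECISION="${PRECISION:-fp8}"

case "$PRECISION" in
  fp4) MODEL_PATH=/raid/shared/models/deepseek-r1-0528-fp4-v2 ;;
  fp8) MODEL_PATH=/raid/shared/models/deepseek-r1-0528 ;;
  *)
    echo "Unsupported precision: $PRECISION" >&2
    exit 1
    ;;
esac

# Map the model alias onto the prefix srt-slurm expects (alias format is assumed).
export SRT_SLURM_MODEL_PREFIX="dsr1-${PRECISION}"
echo "$MODEL_PATH"
```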

Benchmark Verification

Related PRs

@jthomson04 jthomson04 marked this pull request as ready for review February 4, 2026 18:05
@jthomson04 jthomson04 requested a review from a team as a code owner February 4, 2026 18:05
@claude
Contributor

claude bot commented Feb 4, 2026

Claude finished @jthomson04's task in 1m 15s


PR Review Complete

  • Gather context and read changed files
  • Validate master config and perf-changelog requirements
  • Validate container image accessibility
  • Validate launch script enroot import pattern
  • Review code for bugs, logic errors, and blocking issues
  • Post review summary

Summary

This PR adds DeepSeek R1 FP8 GB300 Dynamo TRT-LLM disaggregated multinode configurations for 8k1k and 1k1k sequence lengths.

Found 1 blocking issue:

🔴 Model path uses local /scratch/ path (.github/configs/nvidia-master.yaml:3742) - The new config uses model: /scratch/models/DeepSeek-R1-0528 while other similar configs (GB200 FP8, GB300 FP4) use public HuggingFace model references. This makes the configuration non-reproducible for others. See inline comment for suggested fix.
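
A minimal sketch of the suggested fix, assuming the surrounding master-config structure (the HuggingFace identifier is taken from the PR description above):

```yaml
# Before: local scratch path, not reproducible outside this cluster
# model: /scratch/models/DeepSeek-R1-0528

# After: public HuggingFace reference, matching the GB200 FP8 and GB300 FP4 configs
model: deepseek-ai/DeepSeek-R1-0528
```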

Validations passed:

  • ✅ perf-changelog.yaml updated alongside master config changes
  • ✅ Container image nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2 is publicly accessible on NGC
  • ✅ Launch script properly uses enroot import docker://$IMAGE for reproducibility
  • ✅ Launch script correctly extended to support dsr1-fp8 model prefix

Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 jthomson04 force-pushed the nv/dsr1-fp8-gb300-dynamo-trt branch from 9acc0e8 to 504fd22 Compare February 4, 2026 18:43
Collaborator

@Oseltamivir Oseltamivir left a comment


LGTM


@cquil11
Collaborator

cquil11 commented Feb 5, 2026

@claude In the .github/configs/nvidia-master.yaml file, please go through all added downstream config files (the links) and ensure there are no system-specific config settings or hardcoded paths to cache artifacts, etc.

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 5, 2026

Claude finished @cquil11's task in 2m 8s


Reviewing Downstream Config Files for System-Specific Settings

  • Extract downstream config file URLs from nvidia-master.yaml changes
  • Fetch and review each downstream config file (27 files reviewed)
  • Check for hardcoded paths, system-specific settings, or cache artifacts
  • Report findings

Summary

Reviewed all 27 downstream config files from the srt-slurm repository referenced in this PR.

✅ No Blocking Issues Found

Good news: None of the downstream config files contain hardcoded system paths like /scratch/, /home/, /mnt/, or /data/. The configurations use:

  • Relative model identifiers (dsr1-fp8) rather than absolute filesystem paths
  • Container references (dynamo-trtllm) that resolve via registry
  • Abstract resource specifications without local path dependencies
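
The path check above could be reproduced locally with something like the following; the sample directory and file contents are fabricated for illustration, and in practice the scan would point at the checked-out downstream config files:

```shell
# Hypothetical sketch of the hardcoded-path check: create two sample configs, then scan.
mkdir -p /tmp/cfgcheck
printf 'model: dsr1-fp8\nbackend: dynamo-trtllm\n' > /tmp/cfgcheck/clean.yaml
printf 'model: /scratch/models/DeepSeek-R1-0528\n' > /tmp/cfgcheck/bad.yaml

# grep exits 0 on a match, so any output here flags a hardcoded system path.
grep -rnE '(/scratch/|/home/|/mnt/|/data/)' /tmp/cfgcheck/ || echo "no hardcoded paths found"
```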

Environment-Specific Configurations (Expected/Acceptable)

The configs do contain hardware-specific settings that are expected and appropriate for this use case:

| Setting Type | Examples | Notes |
| --- | --- | --- |
| GPU Type | `gb300` | Required for GB300 deployment |
| Communication | `TRTLLM_FORCE_COMM_METHOD: NVLINK_TWO_SIDED` | NVLink topology optimization |
| NCCL Settings | `NCCL_GRAPH_MIXING_SUPPORT: 0` | Hardware-dependent tuning |
| Memory Fractions | `free_gpu_memory_fraction: 0.1/0.7/0.9` | Workload-specific tuning |
| Batch/Sequence Limits | Various `max_batch_size`, `max_seq_len` values | Config-specific parameters |

These are not problematic: they are intentional hardware-specific optimizations that define the benchmark configuration.

Files Reviewed

1k1k MTP configs (7 files): ✅ Clean

  • ctx1_gen1_dep16_batch32_eplb0_mtp3_666.yaml
  • ctx1_gen1_dep32_batch4_eplb0_mtp3_180.yaml
  • ctx1_gen4_tep8_batch1_eplb0_mtp3_8.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp3_24.yaml
  • ctx2_gen1_dep16_batch128_eplb0_mtp1_2253.yaml
  • ctx2_gen1_dep32_batch16_eplb0_mtp3_564.yaml
  • ctx3_gen2_dep8_batch512_eplb0_mtp1_8192.yaml

1k1k STP configs (7 files): ✅ Clean

  • ctx1_gen4_tep8_batch16_eplb0_mtp0_84.yaml
  • ctx1_gen4_tep8_batch1_eplb0_mtp0_4.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp0_24.yaml
  • ctx2_gen1_dep16_batch128_eplb0_mtp0_2253.yaml
  • ctx2_gen1_dep32_batch32_eplb0_mtp0_1229.yaml
  • ctx3_gen2_dep8_batch512_eplb0_mtp0_8602.yaml
  • ctx3_gen2_dep8_batch768_eplb0_mtp0_12288.yaml

8k1k MTP configs (6 files): ✅ Clean

  • ctx10_gen1_dep16_batch64_eplb0_mtp1_1229.yaml
  • ctx1_gen4_tep8_batch1_eplb0_mtp3_8.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp3_24.yaml
  • ctx6_gen1_dep32_batch8_eplb0_mtp3_333.yaml
  • ctx7_gen1_dep8_batch128_eplb0_mtp1_1229.yaml
  • ctx8_gen1_dep16_batch32_eplb0_mtp3_666.yaml

8k1k STP configs (7 files): ✅ Clean

  • ctx1_gen4_tep8_batch1_eplb0_mtp0_4.yaml
  • ctx1_gen4_tep8_batch4_eplb0_mtp0_24.yaml
  • ctx1_gen4_tep8_batch8_eplb0_mtp0_36.yaml
  • ctx4_gen1_dep16_batch32_eplb0_mtp0_666.yaml
  • ctx6_gen1_dep32_batch16_eplb0_mtp0_512.yaml
  • ctx7_gen1_dep16_batch64_eplb0_mtp0_1229.yaml
  • ctx7_gen1_dep8_batch256_eplb0_mtp0_2151.yaml
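
The file names above appear to follow a `ctx{N}_gen{N}_{dep|tep}{N}_batch{N}_eplb{N}_mtp{N}_{concurrency}.yaml` convention; a small helper to decode it might look like the sketch below. The field meanings (context nodes, generation nodes, data/tensor expert parallelism, batch size, EPLB, MTP depth, concurrency) are inferred from the names and are not documented in this PR:

```python
import re

# Assumed convention: ctx{ctx}_gen{gen}_{dep|tep}{par}_batch{batch}_eplb{eplb}_mtp{mtp}_{concurrency}.yaml
PATTERN = re.compile(
    r"ctx(?P<ctx>\d+)_gen(?P<gen>\d+)_(?P<par_kind>dep|tep)(?P<par>\d+)"
    r"_batch(?P<batch>\d+)_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)_(?P<concurrency>\d+)\.yaml"
)

def decode(name: str) -> dict:
    """Decode one config file name into its fields; raise ValueError on mismatch."""
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized config name: {name}")
    # Keep the dep/tep tag as a string; convert every numeric field to int.
    return {k: (v if k == "par_kind" else int(v)) for k, v in m.groupdict().items()}

print(decode("ctx3_gen2_dep8_batch512_eplb0_mtp1_8192.yaml"))
```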

@cquil11
Collaborator

cquil11 commented Feb 5, 2026

@claude Please update the PR description in accordance with changes introduced in this PR, as well as changes stated in perf-changelog.yaml.

@Klaud-Cold
Collaborator

Klaud-Cold commented Feb 5, 2026

Claude finished @cquil11's task in 1m 0s


Updating PR Description

  • Read the changed files to understand PR changes
  • Read perf-changelog.yaml for change descriptions
  • Update the PR description with comprehensive summary

Summary

Updated the PR description with a comprehensive summary based on the changes introduced:

Key changes documented:

  • New dsr1-fp8-gb300-dynamo-trt configuration with Dynamo TRT-LLM image v0.8.1.post2
  • 1k1k sequence length: 7 MTP + 7 STP configurations (concurrencies up to 12,288)
  • 8k1k sequence length: 6 MTP + 7 STP configurations (concurrencies up to 2,151)
  • Launch script updates to support FP8 precision with correct model path selection
  • Reference to successful benchmark sweep and related PRs

Contributor

@functionstackx functionstackx left a comment


LGTM

@jthomson04 jthomson04 merged commit 032f9aa into main Feb 5, 2026
13 checks passed
@jthomson04 jthomson04 deleted the nv/dsr1-fp8-gb300-dynamo-trt branch February 5, 2026 20:14


6 participants