[NV] DSR1 FP8 GB300 Dynamo TRT (8k1k and 1k1k)#627
Conversation
Claude finished @jthomson04's task in 1m 15s
PR Review Complete
Summary
This PR adds DeepSeek R1 FP8 GB300 Dynamo TRT-LLM disaggregated multinode configurations for 8k1k and 1k1k sequence lengths. Found 1 blocking issue: 🔴 Model path uses local
Validations passed
Signed-off-by: jthomson04 <jothomson@nvidia.com>
Force-pushed from 9acc0e8 to 504fd22
Successful sweep - https://github.com/InferenceMAX/InferenceMAX/actions/runs/21697417250
@claude In the .github/configs/nvidia-master.yaml file, please go through all added downstream config files (the links) and ensure there are no system-specific config settings or hardcoded paths to cache artifacts, etc.
Claude finished @cquil11's task in 2m 8s
Reviewing Downstream Config Files for System-Specific Settings
Summary
Reviewed all 27 downstream config files from the
✅ No Blocking Issues Found
Good news: none of the downstream config files contain hardcoded system paths like
Environment-Specific Configurations (Expected/Acceptable)
The configs do contain hardware-specific settings that are expected and appropriate for this use case. These are not problematic: they are intentional hardware-specific optimizations that define the benchmark configuration.
Files Reviewed
1k1k MTP configs (7 files): ✅ Clean
1k1k STP configs (7 files): ✅ Clean
8k1k MTP configs (6 files): ✅ Clean
8k1k STP configs (7 files): ✅ Clean
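The audit above boils down to scanning each config file for absolute paths that only exist on one machine. A minimal sketch of such a check is below; the `check_configs` function name and the set of flagged path prefixes (`/raid`, `/home`, `/mnt`, `/scratch`) are illustrative assumptions, not taken from this repository.

```shell
# Hypothetical config audit: flag hardcoded system-specific absolute
# paths in benchmark config files. Prefixes checked are assumptions.
check_configs() {
  local found=0
  for f in "$@"; do
    # grep -nE prints matching lines with line numbers; extended regex
    if grep -nE '(/raid|/home|/mnt|/scratch)/' "$f"; then
      echo "WARN: hardcoded path in $f" >&2
      found=1
    fi
  done
  return $found   # nonzero exit if any file contains a flagged path
}
```

Usage: `check_configs configs/*.yaml` exits nonzero when any config hardcodes a system path, which makes it easy to wire into CI.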
@claude Please update the PR description in accordance with the changes introduced in this PR, as well as the changes stated in perf-changelog.yaml.
Claude finished @cquil11's task in 1m 0s
Updating PR Description
Summary
Updated the PR description with a comprehensive summary based on the changes introduced. Key changes documented:
Summary
This PR adds DeepSeek R1 FP8 GB300 Dynamo TRT-LLM disaggregated multinode configurations for 8k1k and 1k1k sequence lengths.
Changes
New Configuration:
- Config name: dsr1-fp8-gb300-dynamo-trt
- Container image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
- Model: deepseek-ai/DeepSeek-R1-0528
Sequence Length Configurations
- 1k1k (ISL=1024, OSL=1024)
- 8k1k (ISL=8192, OSL=1024)
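To make the two sequence-length variants concrete, a config entry for one of them might look roughly like the fragment below. The field names (`model`, `image`, `isl`, `osl`) are illustrative guesses, not the actual schema used by the InferenceMAX configs; only the values come from this PR.

```yaml
# Hypothetical shape of a sequence-length benchmark entry.
# Field names are illustrative; values are from the PR description.
dsr1-fp8-gb300-dynamo-trt-1k1k:
  model: deepseek-ai/DeepSeek-R1-0528
  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
  isl: 1024   # input sequence length
  osl: 1024   # output sequence length
```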
Launch Script Updates (runners/launch_gb300-nv.sh)
- /raid/shared/models/deepseek-r1-0528-fp4-v2
- /raid/shared/models/deepseek-r1-0528
- SRT_SLURM_MODEL_PREFIX variable to correctly map model aliases in srt-slurm
Benchmark Verification
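A prefix variable like SRT_SLURM_MODEL_PREFIX typically lets the launch script resolve a model alias to a shared on-disk checkpoint directory without hardcoding the path. The sketch below shows one plausible way to do that; the `resolve_model_path` function, the default prefix, and the case arms are assumptions about launch_gb300-nv.sh, not its actual contents.

```shell
# Hedged sketch: map a model alias to a local checkpoint path using a
# configurable prefix. Only SRT_SLURM_MODEL_PREFIX comes from the PR;
# the mapping logic here is an assumption for illustration.
resolve_model_path() {
  local alias="$1"
  # Fall back to a shared default when the variable is unset
  local prefix="${SRT_SLURM_MODEL_PREFIX:-/raid/shared/models}"
  case "$alias" in
    deepseek-ai/DeepSeek-R1-0528) echo "$prefix/deepseek-r1-0528" ;;
    *) echo "$prefix/$alias" ;;
  esac
}
```

With this shape, pointing a different cluster at its own model store is a one-variable change (`SRT_SLURM_MODEL_PREFIX=/opt/models`), which is exactly the kind of decoupling the config audit above was checking for.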
Related PRs