[NV] Update DSR1 GB200 FP4 Disagg Submission by jthomson04 · Pull Request #510 · SemiAnalysisAI/InferenceX

jthomson04 · 2026-01-21T17:24:14Z

This MR updates our dsr1-fp4-gb200-dynamo-trt submission. As a part of this MR, we also introduce a new way to launch Dynamo slurm jobs through srt-slurm. The new workflow for launching jobs is:

1. Clone and install srtslurm.
2. Set cluster-specific information (partition, account, etc.).
3. Use the config file specified in the CONFIG_FILE env var from nvidia-master.yaml.
4. Run srtctl apply -f $CONFIG_FILE.
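
As a rough illustration of the last two steps, the sketch below builds the `srtctl apply -f $CONFIG_FILE` invocation from an environment mapping. Only the command shape and the CONFIG_FILE variable come from the description above; the helper name `build_launch_command` and the example recipe path are hypothetical, and the real launcher script may do this quite differently.

```python
import shlex


def build_launch_command(env):
    """Build the argv for `srtctl apply -f $CONFIG_FILE`.

    `env` is any mapping that behaves like os.environ. The function only
    assembles the command; it does not run it.
    """
    config_file = env.get("CONFIG_FILE")
    if not config_file:
        # Fail early instead of passing an empty path to srtctl.
        raise ValueError("CONFIG_FILE is not set")
    return ["srtctl", "apply", "-f", config_file]


if __name__ == "__main__":
    # Hypothetical recipe path, used purely for illustration.
    command = build_launch_command({"CONFIG_FILE": "recipes/example.yaml"})
    print(shlex.join(command))
```

In a real job the mapping would be `os.environ`, and the returned list could be handed to `subprocess.run(command, check=True)` once the cluster settings from step 2 are in place.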

Signed-off-by: jthomson04 <jothomson@nvidia.com>

claude · 2026-01-21T17:54:25Z

Claude finished @jthomson04's task in 1m 59s

PR Review Complete

Analyze PR context and changes
Check for master config changes and perf-changelog requirement
Review for bugs, logic errors, and blocking issues
Post summary

Summary: This PR updates the DSR1 GB200 FP4 Disagg submission and introduces a new workflow for launching Dynamo slurm jobs through srt-slurm. The changes look reasonable overall, but there are two issues to address:

🔴 BLOCKING: Master config file .github/configs/nvidia-master.yaml was modified but perf-changelog.yaml was not updated. When changing master config files, you must add a corresponding entry to perf-changelog.yaml documenting the changes.

🟡 WARNING: Hardcoded fp8 in srtctl tags at runners/launch_gb200-nv.sh:100 - should use ${PRECISION} variable instead since this is an FP4 submission.
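
The blocking rule in this review (a master config change must come with a perf-changelog.yaml update) can be sketched as a check over the list of changed files. This is only an illustration: the `.github/configs/` prefix and the `-master.yaml` suffix are assumptions generalized from the single file named here, and `missing_changelog` is a hypothetical helper, not part of the repository.

```python
# Assumption: master configs live under .github/configs/ and end in -master.yaml.
MASTER_CONFIG_PREFIX = ".github/configs/"
MASTER_CONFIG_SUFFIX = "-master.yaml"
CHANGELOG = "perf-changelog.yaml"


def missing_changelog(changed_files):
    """Return True when a master config changed but the changelog did not."""
    touched_master = any(
        path.startswith(MASTER_CONFIG_PREFIX) and path.endswith(MASTER_CONFIG_SUFFIX)
        for path in changed_files
    )
    return touched_master and CHANGELOG not in changed_files


if __name__ == "__main__":
    changed = [".github/configs/nvidia-master.yaml", "runners/launch_gb200-nv.sh"]
    print("changelog entry required:", missing_changelog(changed))
```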

functionstackx · 2026-01-21T22:08:52Z

@claude what is the progress on https://github.com/InferenceMAX/InferenceMAX/actions/runs/21219580245

how many are complete, any failures so far? can u monitor every 5 minutes till all the jobs in 21219580245 are done?

claude-code-infmax · 2026-01-21T22:09:35Z

🔄 Workflow Monitoring - Status Update #12

Workflow Run: 21219580245
Time: 2026-01-21 ~23:15 UTC

📊 Current Progress

| Status | Count | Percentage |
| --- | --- | --- |
| ✅ Completed | 18 | 72% |
| 🔄 In Progress | 1 | 4% |
| ⏳ Queued | 6 | 24% |
| ❌ Failed | 0 | 0% |

Progress Bar: ████████████████░░░░░░ 72%
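
The percentages in this update follow from the job counts (18 + 1 + 6 + 0 = 25 jobs in total). A minimal sketch of that arithmetic is below; the `summarize` helper, the dictionary keys, and the 22-character bar width are assumptions made for illustration and are not taken from the monitoring bot itself.

```python
def summarize(counts, bar_width=22):
    """Return (percentages, bar) for a mapping of status name to job count."""
    total = sum(counts.values())
    if total == 0:
        # No jobs at all: report zeros and an empty bar.
        return {name: 0 for name in counts}, "░" * bar_width
    percentages = {name: round(100 * n / total) for name, n in counts.items()}
    # The filled part of the bar tracks the share of completed jobs.
    filled = round(bar_width * counts.get("completed", 0) / total)
    bar = "█" * filled + "░" * (bar_width - filled)
    return percentages, bar


if __name__ == "__main__":
    counts = {"completed": 18, "in_progress": 1, "queued": 6, "failed": 0}
    percentages, bar = summarize(counts)
    print(percentages)
    print(f"Progress Bar: {bar} {percentages['completed']}%")
```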

🔄 Currently Running

| Job | Started | Duration |
| --- | --- | --- |
| dsr1_8k1k gb200 dynamo-trt fp4 specdecod-mtp | 22:43 UTC | ~32 min |

⏳ Queued Jobs (6 remaining)

dsr1_8k1k gb200 dynamo-trt fp4 specdecod-none (x3)
dsr1_1k1k gb200 dynamo-trt fp4 specdecod-none (x2)
dsr1_1k1k gb200 dynamo-trt fp4 specdecod-mtp (x1)

✅ Recently Completed

dsr1_1k1k gb200 dynamo-trt fp4 specdecod-none (22:28-22:42 UTC) ✅
dsr1_1k1k gb200 dynamo-trt fp4 specdecod-mtp (22:08-22:28 UTC) ✅
dsr1_8k1k gb200 dynamo-trt fp4 specdecod-none (21:23-22:08 UTC) ✅

Monitoring every 5 minutes until all jobs complete. Next check in 5 minutes.
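
The polling behaviour described here (check, wait five minutes, repeat until nothing is running or queued) could look roughly like the sketch below. How the job counts are actually fetched is not shown in this thread, so `fetch_counts` is a caller-supplied placeholder, and the `monitor` function and its parameters are hypothetical.

```python
import time


def monitor(fetch_counts, interval_s=300, sleep=time.sleep, max_checks=None):
    """Poll until no jobs are in progress or queued.

    `fetch_counts` is a placeholder callable returning a mapping such as
    {"completed": 18, "in_progress": 1, "queued": 6, "failed": 0}.
    Returns the last snapshot and the number of checks performed.
    """
    checks = 0
    while True:
        counts = fetch_counts()
        checks += 1
        pending = counts.get("in_progress", 0) + counts.get("queued", 0)
        if pending == 0:
            return counts, checks
        if max_checks is not None and checks >= max_checks:
            # Stop early so the loop cannot run forever on a stuck workflow.
            return counts, checks
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop testable without real five-minute waits, and `max_checks` gives a simple upper bound when a workflow stalls.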

jthomson04 · 2026-01-22T17:24:44Z

We've incorporated some updates to our SGL 1k/8k submission into this MR as well. See https://github.com/InferenceMAX/InferenceMAX/pull/513.

functionstackx · 2026-01-22T18:38:57Z

@pr-claude please review. are they missing perf-changelog.yaml?

claude · 2026-01-22T18:39:22Z

Claude finished @functionstackx's task in 48s

PR Review

Check if perf-changelog.yaml was updated
Review changed files for issues
Post summary

Answer: Yes, perf-changelog.yaml is missing.

.github/configs/nvidia-master.yaml was modified in this PR but perf-changelog.yaml was not updated. This was flagged as a 🔴 BLOCKING issue in my previous review.

Files changed in this PR:

.github/configs/nvidia-master.yaml (+170/-228)
benchmarks/dsr1_fp4_gb200_dynamo-trt_slurm.sh (deleted)
runners/launch_gb200-nv.sh (+122/-96)

Status of previous issues:

✅ Fixed: The hardcoded fp8 in the srtctl tags has been replaced with the ${PRECISION} variable
❌ Still open: a perf-changelog.yaml entry documenting the config changes still needs to be added

Required action:

Add an entry to perf-changelog.yaml documenting:

Updates to DSR1 GB200 FP4 Disagg submission configs
New srt-slurm workflow for launching Dynamo jobs
Any SGL 1k/8k submission updates (per the comment about PR #513, "[nv] add 1k8k gb200 fp4 sglang configs")

functionstackx · 2026-01-22T20:59:47Z

@pr-claude can u draft an inline diff within the github comment suggestion on what the perf-changelog.yaml change should be?

claude · 2026-01-22T21:04:15Z

Claude finished @functionstackx's task in 39s

Drafting perf-changelog.yaml Entry

Read current perf-changelog.yaml to understand format
Read nvidia-master.yaml changes to understand what was modified
Draft suggested perf-changelog.yaml entry

Suggested `perf-changelog.yaml` Entry

Based on the changes in this PR, here's the entry you should add to perf-changelog.yaml:

- config-keys:
    - dsr1-fp4-gb200-dynamo-trt
    - dsr1-fp4-gb200-dynamo-sglang
  description:
    - "Update Dynamo TRT image from 0.5.1-rc0.pre3 to 0.8.0"
    - "Update Dynamo SGLang image from v0.5.5.post2 to dev-cu13"
    - "Refactor configurations to use CONFIG_FILE-based recipes instead of inline parameter settings"
    - "Add new 1k8k sequence length configurations for SGLang"
    - "Introduce srt-slurm workflow for launching Dynamo jobs"
  pr-link: https://github.com/InferenceMAX/InferenceMAX/pull/510

Key changes documented:

Image updates: TRT image 0.5.1-rc0.pre3 → 0.8.0, SGLang image v0.5.5.post2 → dev-cu13
Config refactor: Moved from inline PREFILL_MAX_NUM_TOKENS, DECODE_MAX_BATCH_SIZE, etc. to external recipe YAML files via CONFIG_FILE
New configs: Added 1k8k (1024x8192) sequence length configurations for dsr1-fp4-gb200-dynamo-sglang
New workflow: srt-slurm integration for job launching (mentioned in PR description)
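
Before committing an entry like the one suggested above, a quick structural check can catch a missing field. The sketch below assumes the three fields visible in the suggestion (config-keys, description, pr-link) are all required and that the first two are lists; the real schema for perf-changelog.yaml is not shown in this thread, and `entry_problems` is a hypothetical helper. The entry is written as a plain Python dict to avoid depending on a YAML parser.

```python
# Assumption: these three fields, taken from the suggested entry, are all required.
REQUIRED_FIELDS = ("config-keys", "description", "pr-link")
LIST_FIELDS = ("config-keys", "description")


def entry_problems(entry):
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing or empty field: {field}")
    for field in LIST_FIELDS:
        value = entry.get(field)
        if value and not isinstance(value, list):
            problems.append(f"{field} should be a list")
    return problems


if __name__ == "__main__":
    entry = {
        "config-keys": ["dsr1-fp4-gb200-dynamo-trt", "dsr1-fp4-gb200-dynamo-sglang"],
        "description": ["Introduce srt-slurm workflow for launching Dynamo jobs"],
        "pr-link": "https://github.com/InferenceMAX/InferenceMAX/pull/510",
    }
    print(entry_problems(entry) or "entry looks structurally complete")
```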

jthomson04 · 2026-01-28T17:26:19Z

Successful SGLang run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21421141115

functionstackx · 2026-01-29T01:38:19Z

hi @jthomson04

generally the trtllm parts of this PR are fine; the sglang parts still need a bit of work, which @kedarpotdar-nv and we were discussing.

if u wanna merge this PR without delay, i suggest splitting the trtllm parts out from the sglang parts

+viz @kedarpotdar-nv @cquil11

cquil11

ok. lgtm to me now. thank you

cquil11 · 2026-01-29T19:19:25Z

@jthomson04 please feel free to merge at your convenience

kedarpotdar-nv · 2026-01-30T16:37:16Z

Successful sweep here: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21492255292

jthomson04 · 2026-01-30T16:50:59Z

There were some missing 1k8k configs. Not ready to merge yet; kicked off a new pipeline

functionstackx · 2026-01-30T18:24:02Z

@jthomson04 1k8k takes too long. wanna split that out to another follow up PR & just merge 1k1k 8k1k first?

cquil11 · 2026-01-30T19:07:11Z

@jthomson04 you can also just comment out the other sequence lengths in the master config to test. might be easier that way

jthomson04 · 2026-01-30T22:12:36Z

It's halfway done now. Will wait for that to complete before merge. https://github.com/InferenceMAX/InferenceMAX/actions/runs/21523871013

functionstackx

LGTM. yolo! feel free to merge

jthomson04 and others added 3 commits January 21, 2026 09:19

Configs

fd9a39b

Signed-off-by: jthomson04 <jothomson@nvidia.com>

gb200 srt slurm

16a3764

fix sglang breaking changes

7d8c2cd

github-project-automation bot added this to InferenceMAX Board Jan 21, 2026

jthomson04 changed the title from "Dsr1 fp4 gb200 dynamo trt 260121" to "[NV] Update DSR1 GB200 FP4 Disagg Submission" Jan 21, 2026

update uv

9cfacda

Signed-off-by: jthomson04 <jothomson@nvidia.com>

jthomson04 force-pushed the dsr1-fp4-gb200-dynamo-trt-260121 branch from a365a91 to 9cfacda January 21, 2026 17:37

jthomson04 marked this pull request as ready for review January 21, 2026 17:53

jthomson04 requested a review from a team as a code owner January 21, 2026 17:53

claude bot reviewed Jan 21, 2026

runners/launch_gb200-nv.sh (outdated, resolved)

Update runners/launch_gb200-nv.sh

6bf442f

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>

ishandhanani and others added 4 commits January 21, 2026 14:09

init

76f29cb

Merge branch 'main' into dsr1-fp4-gb200-dynamo-trt-260121

b50b884

unify trtllm and sglang for testing

3b889d8

Merge remote-tracking branch 'ishan/nv/add-1k8k-gb200-fp4' into dsr1-…

467a32e

…fp4-gb200-dynamo-260121

jthomson04 mentioned this pull request Jan 22, 2026

[nv] add 1k8k gb200 fp4 sglang configs #513

Merged

ishandhanani and others added 3 commits January 21, 2026 17:30

tmp

9a67d6c

go

ea035c8

Merge remote-tracking branch 'ishan/nv/add-1k8k-gb200-fp4' into dsr1-…

6327ef5

…fp4-gb200-dynamo-260121

jthomson04 and others added 3 commits January 22, 2026 14:16

Merge branch 'main' into dsr1-fp4-gb200-dynamo-trt-260121

35489f4

perf changelog

ea465f4

Signed-off-by: jthomson04 <jothomson@nvidia.com>

default time limit to 6 hours

0643281

Signed-off-by: jthomson04 <jothomson@nvidia.com>

This was referenced Jan 28, 2026

Add DSR1 FP8 H200 Dynamo TRT-LLM configurations #570

Merged

[NV] DSR1 FP4 B200 Dynamo TRT #588

Merged

jthomson04 added 4 commits January 28, 2026 10:52

add 1k8k gb200 fp4 trtllm

0772772

comments on configs

f38831a

Signed-off-by: jthomson04 <jothomson@nvidia.com>

Remove dsr1 gb200 dynamo trt launcher

ca2cac0

Signed-off-by: jthomson04 <jothomson@nvidia.com>

comment on launch script format

70f1425

Signed-off-by: jthomson04 <jothomson@nvidia.com>

jthomson04 and others added 3 commits January 28, 2026 20:35

nuke sglang

ef7b1a6

Signed-off-by: jthomson04 <jothomson@nvidia.com>

Merge remote-tracking branch 'public/main' into dsr1-fp4-gb200-dynamo…

e89e5e3

…-trt-260121

Merge branch 'main' into dsr1-fp4-gb200-dynamo-trt-260121

b7f22da

cquil11 reviewed Jan 29, 2026

.github/configs/nvidia-master.yaml (outdated, resolved)

cquil11 and others added 4 commits January 29, 2026 16:11

Update model path for dsr1-fp4-gb200-dynamo-sglang

f716ca7

remove hardcoded model paths in top level master yaml

4f3d2bd

restore prior sgl submissions

55670a0

Signed-off-by: jthomson04 <jothomson@nvidia.com>

restore sgl

c876d90

Signed-off-by: jthomson04 <jothomson@nvidia.com>

cquil11 approved these changes Jan 29, 2026

Oseltamivir approved these changes Jan 29, 2026

perf-changelog.yaml (outdated, resolved)

0.8.0 -> 0.8.2

34f0c0f

Signed-off-by: jthomson04 <jothomson@nvidia.com>

Add missing 1k8k configs

7b4dfb8

Signed-off-by: jthomson04 <jothomson@nvidia.com>

functionstackx approved these changes Jan 31, 2026

jthomson04 merged commit 4e9a376 into main Jan 31, 2026
28 of 66 checks passed

jthomson04 deleted the dsr1-fp4-gb200-dynamo-trt-260121 branch January 31, 2026 04:24

github-project-automation bot moved this to Done in InferenceMAX Board Jan 31, 2026


7 participants
