[NV] Update DSR1 GB200 FP4 Disagg Submission #510
Force-pushed from a365a91 to 9cfacda
|
Claude finished @jthomson04's task in 1m 59s. PR Review Complete
Summary: This PR updates the DSR1 GB200 FP4 Disagg submission and introduces a new workflow for launching Dynamo slurm jobs through srt-slurm. The changes look reasonable overall, but there are two issues to address:
🔴 BLOCKING: Master config file
🟡 WARNING: Hardcoded |
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
|
@claude what is the progress on https://github.com/InferenceMAX/InferenceMAX/actions/runs/21219580245 how many is complete, any failure so far? can u monitor each 5 minutes till all the jobs in |
🔄 Workflow Monitoring - Status Update #12
Workflow Run: 21219580245
📊 Current Progress
Progress Bar:
🔄 Currently Running
⏳ Queued Jobs (6 remaining)
✅ Recently Completed
Monitoring every 5 minutes until all jobs complete. Next check in 5 minutes. |
|
We've incorporated some updates to our SGL 1k/8k submission into this MR as well. See https://github.com/InferenceMAX/InferenceMAX/pull/513. |
|
@pr-claude please review. r they missing perf-changelog.yaml |
|
Claude finished @functionstackx's task in 48s. PR Review
Answer: Yes,
Files changed in this PR:
Status of previous issues:
Required action: Add an entry to
|
|
@pr-claude can u draft an inline diff within the github comment suggestion on what the perf-changelog.yaml change should be? |
|
Claude finished @functionstackx's task in 39s. Drafting perf-changelog.yaml Entry
Suggested
|
Signed-off-by: jthomson04 <jothomson@nvidia.com>
|
Successful SGLang run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21421141115 |
Signed-off-by: jthomson04 <jothomson@nvidia.com>
|
hi @jthomson04 generally the trtllm parts of this PR are fine, but the sglang parts still need a bit of work, which @kedarpotdar-nv and we were discussing. if u wanna merge this PR without delay, i suggest splitting the trtllm parts out from the sglang parts +viz @kedarpotdar-nv @cquil11 |
Signed-off-by: jthomson04 <jothomson@nvidia.com>
cquil11 left a comment
ok. lgtm to me now. thank you
|
@jthomson04 please feel free to merge at your convenience |
Signed-off-by: jthomson04 <jothomson@nvidia.com>
|
Successful sweep here: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21492255292 |
Signed-off-by: jthomson04 <jothomson@nvidia.com>
|
There were some missing 1k8k configs. Not ready to merge yet; kicked off a new pipeline |
|
@jthomson04 1k8k takes too long. wanna split that out to another follow up PR & just merge 1k1k 8k1k first? |
|
@jthomson04 you can also just comment out the other sequence lengths in the master config to test. might be easier that way |
|
It's halfway done now. Will wait for that to complete before merge. https://github.com/InferenceMAX/InferenceMAX/actions/runs/21523871013 |
functionstackx left a comment
LGTM. yolo! feel free to merge
This MR updates our dsr1-fp4-gb200-dynamo-trt submission. As a part of this MR, we also introduce a new way to launch Dynamo slurm jobs through srt-slurm. The new workflow for launching jobs is:
1. CONFIG_FILE env var from nvidia-master.yaml
2. srtctl apply -f $CONFIG_FILE
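For readers following along, the launch step in the description can be sketched roughly as below. This is only an illustration, not the code in this PR: it assumes CONFIG_FILE holds a path to a job config, and it only builds the `srtctl apply -f` command line rather than executing it. How nvidia-master.yaml supplies the value is not shown in this thread, so the helper names and the example path are hypothetical.

```python
import os
import shlex


def build_launch_command(config_file: str) -> list[str]:
    """Return the argv for `srtctl apply -f <config_file>` without running it."""
    if not config_file:
        # Fail early rather than invoking srtctl with an empty path.
        raise ValueError("CONFIG_FILE is empty or unset")
    return ["srtctl", "apply", "-f", config_file]


def launch_command_from_env(env=None) -> list[str]:
    """Hypothetical helper: read CONFIG_FILE from an environment mapping."""
    env = os.environ if env is None else env
    return build_launch_command(env.get("CONFIG_FILE", ""))


if __name__ == "__main__":
    # Assumption: this path is a placeholder, not a file from the repository.
    demo_env = {"CONFIG_FILE": "configs/example.yaml"}
    print(shlex.join(launch_command_from_env(demo_env)))
```

Returning the argv as a list keeps paths with spaces intact, and the result could be handed to `subprocess.run` on a machine where srtctl is installed.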