Conversation
Signed-off-by: jthomson04 <jothomson@nvidia.com>
@claude please fix this PR, seems like a typo in perf-changelog.yaml
Claude finished @functionstackx's task in 49s
Fixing YAML Syntax Error in perf-changelog.yaml
Summary: Fixed the YAML syntax error at line 329 of perf-changelog.yaml.
Commit: Add missing space after dash in list item at line 329 (-config-keys: -> - config-keys:)
Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
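For context, the fix matters because YAML only treats a dash as a sequence indicator when it is followed by a space; without it, the dash becomes part of a scalar key. A minimal sketch (only config-keys comes from the actual file, the surrounding lines are illustrative):

```yaml
# Broken: "-config-keys:" parses as a single mapping key named "-config-keys"
-config-keys:

# Fixed: the space after the dash makes this a sequence item
- config-keys:
```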
@claude in .github/configs/nvidia-master.yaml, there are scenarios with corresponding links to external configs that contain more detailed information. Can you please fetch the configs from the links and double-check that the fields (such as prefill/decode ep/tp, num workers, etc.) that are also present in this repo's config match?
Claude finished @cquil11's task in 3m 25s
Verifying External Config Links
Results Summary
I verified all 40 GB300-FP4 external configs against the local nvidia-master.yaml configuration. Found 1 mismatch:
Location: The external config specifies decode TP/EP of 4 (as indicated by tep4 in the recipe filename), but the local config (lines 3098-3111) uses 8 - INCORRECT:
- conc-list: [5, 15, 30]
  prefill:
    num-worker: 1
    tp: 2
    ep: 2
    dp-attn: true
  additional-settings:
    # https://github.com/ishandhanani/srt-slurm/blob/sa-submission-q1-2026/recipes/trtllm/gb300-fp4/8k1k/stp/ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml
    - "CONFIG_FILE=recipes/trtllm/gb300-fp4/8k1k/stp/ctx1_gen5_tep4_batch4_eplb0_mtp0.yaml"
  decode:
    num-worker: 5
    tp: 8 # Should be 4
    ep: 8 # Should be 4
    dp-attn: false
All Other Configs Match ✅
The remaining 39 GB300-FP4 configs all have matching values.
Would you like me to fix this mismatch? I can update the decode TP/EP from 8 to 4 at lines 3109-3110.
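If applied, the corrected decode block at lines 3109-3110 would read as follows (a sketch of the proposed edit, not a committed change):

```yaml
decode:
  num-worker: 5
  tp: 4 # was 8; matches tep4 in the external recipe filename
  ep: 4 # was 8
  dp-attn: false
```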
@claude please make the perf changelog description and PR description slightly more detailed. Don't list every single thing per se, but give a high-level overview of the scenarios added. Be more detailed in the PR description (make a table of scenarios, something like the https://github.com/InferenceMAX/InferenceMAX/pull/582 description).
Claude finished @cquil11's task in 1m 13s
Updating PR and Changelog Descriptions
Summary: Updated perf-changelog.yaml with a more detailed description covering the image version, sequence length configs (1k1k, 1k8k, 8k1k), and srt-slurm integration. Also updated the PR description with a comprehensive table of all scenarios showing ISL/OSL, mode (MTP/STP), prefill/decode worker counts, TP/EP settings, and concurrency levels.
Commit: Add more detailed description including image version, sequence length coverage (1k1k, 1k8k, 8k1k), and srt-slurm integration details.
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
cquil11 left a comment
lgtm after this run completes successfully https://github.com/InferenceMAX/InferenceMAX/actions/runs/21611528513
Summary
Add DeepSeek-R1 FP4 GB300 Dynamo TRT disaggregated multinode benchmark configurations, sourced from srt-slurm recipes.
Changes
- Add dsr1-fp4-gb300-dynamo-trt config to nvidia-master.yaml
- Add gb300-nv_0 runner to runners.yaml
- Add launch_gb300-nv.sh script for srt-slurm integration
Config Details
Notes:
- Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
- Model: nvidia/DeepSeek-R1-0528-NVFP4-v2
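Assembled from the details above, a scenario entry of the shape discussed in this thread might look like the sketch below. The exact key names and nesting are assumptions; only the individual field values (image, model, runner, and the prefill/decode settings) are quoted in the conversation:

```yaml
# Hypothetical shape of the nvidia-master.yaml entry; field values from the thread
dsr1-fp4-gb300-dynamo-trt:
  image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post2
  model: nvidia/DeepSeek-R1-0528-NVFP4-v2
  runner: gb300-nv_0
  scenarios:
    - conc-list: [5, 15, 30]
      prefill:
        num-worker: 1
        tp: 2
        ep: 2
        dp-attn: true
      decode:
        num-worker: 5
        tp: 4
        ep: 4
        dp-attn: false
```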