Signed-off-by: jthomson04 <jothomson@nvidia.com>
@jthomson04 @csahithi, can one of y'all please re-approve this? We had to revert the original merge and re-run it because the directory names on the clusters changed, which made the runs fail. Edit: there also seem to be some other, unrelated errors related to image downloads (flakes?): https://github.com/InferenceMAX/InferenceMAX/actions/runs/21412751024/job/61653699072
runners/launch_b300-nv.sh
git clone https://github.com/ishandhanani/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
git checkout b4abe4643a7009f3539b36bdc508408874a4c930
This pinned checkout may also be causing errors indirectly, since new commits on the main branch may have introduced breaking changes?
https://github.com/InferenceMAX/InferenceMAX/actions/runs/21412751024/job/61653699014#step:4:893
Ah, we recently did a big rebase and updated our configs. I'll push a fix shortly.
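One possible failure mode with a pinned `git checkout` like the one in `runners/launch_b300-nv.sh` is that, after an upstream rebase, the pinned commit may no longer be reachable from any branch and therefore may be missing from a fresh clone. The following is a minimal sketch, assuming bash and git, of a pinned clone that fails fast with a clear message in that case. The `clone_pinned` function and its arguments are hypothetical and are not part of the actual launch script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the real launch script): clone a repo and
# check out a pinned commit, failing fast if that commit is not present in
# the cloned history (e.g. it was dropped by an upstream rebase).
set -euo pipefail

clone_pinned() {
  local repo_url="$1" dest_dir="$2" commit="$3"

  # Clone only if the destination does not already contain a checkout.
  if [ ! -d "$dest_dir/.git" ]; then
    git clone --quiet "$repo_url" "$dest_dir"
  fi

  # Verify the pinned object exists and resolves to a commit.
  if ! git -C "$dest_dir" cat-file -e "${commit}^{commit}" 2>/dev/null; then
    echo "error: pinned commit $commit not found in $repo_url" >&2
    return 1
  fi

  # Detached-HEAD checkout of the exact pinned revision.
  git -C "$dest_dir" checkout --quiet "$commit"
}
```

Usage, mirroring the diff above: `clone_pinned https://github.com/ishandhanani/srt-slurm.git "$SRT_REPO_DIR" b4abe4643a7009f3539b36bdc508408874a4c930`. Checking the commit before `git checkout` turns a confusing downstream failure into an explicit error at clone time.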
Branch force-pushed from 6b7f7d9 to d4bc5f0.
Re-do of https://github.com/InferenceMAX/InferenceMAX/pull/532