[AMD]: fix AITER flags for vllm v0.14.0 docker image #535
functionstackx merged 5 commits into SemiAnalysisAI:main from
Conversation
thanks @Rohan138! can u add this to https://github.com/InferenceMAX/InferenceMAX/blob/main/perf-changelog.yaml, and once ur PR is ready enable the

additionally, can u document the recommended flags in https://github.com/vllm-project/recipes/tree/main
Thanks @functionstackx, can you add the
hmm @cquil11 why did the sweeper bug out?
@Rohan138 since this is ur ROCm fork, i can't rebase it myself. can u rebase again to include this commit? InferenceMAX/InferenceMAX@38546bb
@cquil11 the github action still doesn't work on forks? should we just manually import this to run the github action?
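If the action really can't run from a fork, one way to trigger a workflow manually is the GitHub CLI's `workflow run` command. A hedged sketch follows; the workflow filename (`sweep.yml`) and the input names are hypothetical placeholders, not taken from this repo:

```shell
# Hedged sketch: manually dispatch a workflow_dispatch-enabled workflow
# with the GitHub CLI. "sweep.yml" and the -f input names are assumed,
# not confirmed against the InferenceMAX repo.
gh workflow run sweep.yml \
  --repo InferenceMAX/InferenceMAX \
  --ref main \
  -f config-files=.github/configs/amd-master.yaml \
  -f runner-config=.github/configs/runners.yaml
```

This only works if the target workflow declares a `workflow_dispatch` trigger with matching inputs, and the caller has write access to the repo.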
/sweep test-config --config-keys gptoss-fp4-mi300x-vllm gptoss-fp4-mi325x-vllm --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml
@cquil11 Kicking off a sweep. Run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21367875017
@functionstackx @Rohan138 the above command accomplishes the same sweep test
validated, passing, lgtm: https://github.com/InferenceMAX/InferenceMAX/actions/runs/21367875017 @Rohan138 please update
* fix AITER flags for v0.14.0 release
* drop mi325 triton gemm env var
* Add changes to perf changelog
#496 updated the MI300/MI325 docker image to the recent upstream v0.14.0 release without updating the corresponding AITER environment variables. As a result, the runs in #496 used the default vLLM Triton attention kernel instead of the recommended AITER unified attention backend for gpt-oss.
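For illustration, the kind of AITER environment variables at issue might look like the following. This is a hedged sketch: the variable names are assumptions modeled on vLLM's ROCm env flags, and the authoritative list is this PR's diff plus `vllm/envs.py` for v0.14.0:

```shell
# Hedged sketch: enable AITER kernels for vLLM on ROCm (MI300X/MI325X).
# Variable names are assumed, not confirmed against the v0.14.0 image;
# check this PR's diff and vllm/envs.py for the actual flags and defaults.
export VLLM_ROCM_USE_AITER=1                    # master switch for AITER kernels
export VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION=1  # prefer AITER unified attention over the default Triton kernel
```

Because flag names and defaults shift between vLLM releases, a docker image bump like #496 can silently fall back to the Triton path unless these exports are revalidated against the new release.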