[Fix] Align local-eval deps with ext-prod; default skip-reference by yuanzhang-us · Pull Request #9 · NVIDIA/SOL-ExecBench

yuanzhang-us · 2026-05-22T20:30:40Z

Make local Docker eval match the dependency stack used by the production `sol_execbench_external` image, so that externally authored solutions compile and run identically.

- `pyproject.toml`: pin `torch==2.9.0` (was `>=2.10.0`); drop the unused `torchvision` dependency, which blocked the torch downgrade. `nvidia-cutlass-dsl` stays at 4.4.1.
- `uv.lock`: regenerated; torch 2.10 -> 2.9, triton 3.6 -> 3.5, and the `nvidia-*` libraries aligned with torch 2.9.
- `docker/Dockerfile`: add `UV_HTTP_TIMEOUT=600` so cold builds don't fail while downloading the large `nvidia-cutlass-dsl-libs-cu13` wheel under the default 30 s timeout.
- `cli/main.py`: rename `PYTORCH_ALLOC_CONF` -> `PYTORCH_CUDA_ALLOC_CONF` in the eval subprocess env. torch 2.9 only honors the _CUDA_ form, so the intended `expandable_segments` allocator setting never took effect before this fix.
- `core/bench/config/benchmark_config.py`: flip the `benchmark_reference` default from `True` to `False`. Reference implementations can take more than 1 h for some problems and dominate evaluation time; users who need a speedup factor can re-enable it via `--config`.
- `bench_config.example.json`: ship a template containing every field at its default, so users have a copy-and-edit starting point.
- `README.md`: document the `BenchmarkConfig` fields and the new example template.
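
For illustration, the `cli/main.py` change amounts to setting the CUDA-specific variable in the environment handed to the eval subprocess. This is a minimal sketch, not the actual code in `cli/main.py`: the helper name `build_eval_env` is hypothetical, and the exact value `expandable_segments:True` is an assumption (it is the documented PyTorch syntax for enabling expandable segments).

```python
import os
import subprocess
import sys


def build_eval_env(base=None):
    """Hypothetical helper: copy the parent environment (or `base`) and
    configure the CUDA caching allocator for the eval subprocess."""
    env = dict(os.environ if base is None else base)
    # Assumption: the PR states torch 2.9 ignores this name, so drop it.
    env.pop("PYTORCH_ALLOC_CONF", None)
    # Assumed value; the PR only says the expandable_segments allocator is intended.
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env


if __name__ == "__main__":
    # A real eval would launch the benchmark here; this child just echoes the setting.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['PYTORCH_CUDA_ALLOC_CONF'])"],
        env=build_eval_env(),
        check=True,
    )
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the child process.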
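
The new `benchmark_reference` default and the example template can be sketched with a dataclass: defaults live on the class, a JSON document overrides only the keys it names, and the template is simply every field dumped at its default. This is a hypothetical sketch, not the real `BenchmarkConfig`; the other fields of the real class are omitted, and `load_config` / `example_template` are illustrative names, not functions from the repository.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class BenchmarkConfig:
    # Per this PR, reference timing is skipped unless explicitly enabled.
    benchmark_reference: bool = False
    # The real BenchmarkConfig has further fields; they are omitted here.


def load_config(text=None):
    """Hypothetical loader: start from defaults, then apply JSON overrides."""
    cfg = BenchmarkConfig()
    if text:
        known = {f.name for f in fields(BenchmarkConfig)}
        for key, value in json.loads(text).items():
            if key not in known:
                raise KeyError(f"unknown config field: {key}")
            setattr(cfg, key, value)  # no type validation in this sketch
    return cfg


def example_template():
    """Every field at its default, in the spirit of bench_config.example.json."""
    return json.dumps(asdict(BenchmarkConfig()), indent=2)
```

Under these assumptions, a config file containing `{"benchmark_reference": true}` re-enables reference timing, while omitting the key keeps the new `False` default.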

Signed-off-by: Yuan Zhang <yuazhang@nvidia.com>

yuanzhang-us requested a review from samodi-nv May 22, 2026 20:30

yuanzhang-us self-assigned this May 22, 2026

yuanzhang-us wants to merge 1 commit into `main` from `fix-local-eval`