bench: Migrate Python cuda.compute benchmarks to nvbench#7341
bench: Migrate Python cuda.compute benchmarks to nvbench#7341gevtushenko merged 20 commits intomainfrom
Conversation
|
@shwina I migrated one benchmark to nvbench here and added C++ equivalent. If you can review and see if this is what you were thinking i can continue with the rest. I added some instructions and doing the C++ part with cmake but let me know if you have other ideas on how to make that part better/easier. On my dev GPU (4090) I got: |
|
Thanks @danielfrg - I think this is a really great start! With regards to comparing C++ and Python benchmarks, I wonder if we can't use the existing |
NaderAlAwar
left a comment
There was a problem hiding this comment.
Thanks for working on this @danielfrg! The transform benchmark looks great. Two points:
- As @shwina pointed out, the comparison script (and most of the other analysis scripts) already exist in nvbench, so we should not copy them. I am working on releasing a pynvbench wheel, and will figure out a way to expose that functionality properly, Once the wheels are released, we should make pynvbench an optional dependency that you can add when you pip install cuda-cccl.
- We should not add the cpp version of the benchmark. The CUB benchmarks already exist under
cub/benchmarks/bench/, so in my opinion, we should try to mimic at least one benchmark from each CUB algorithm so we can compare the two properly.
I'm curious about python_vs_cpp_summary.py. Can you show me what the output would look like? Doesn't have to be from a real run, just a visualization
python/cuda_cccl/benchmarks/compute/nvbench/nvbench_transform.py
Outdated
Show resolved
Hide resolved
python/cuda_cccl/benchmarks/compute/nvbench/nvbench_transform.py
Outdated
Show resolved
Hide resolved
|
|
I see now, yeah the python vs cpp summary script should stay. I think it would be more valuable to convert the existing C++ ones, and yes |
|
Also, take a look at https://github.com/NVIDIA/cccl/blob/main/CONTRIBUTING.md#code-formatting-pre-commit-hooks to fix the pre-commit ci issue |
|
I updated the one benchmark here to be based on I had claude to analyze the APIs and give me a mapping for the transform benchmarks:
They look good to me on a quick look. Let me know what you guys prefer in terms of PRs, do we want to keep them all in a single PR? If the structure looks good now I will start with more transform benchmarks. |
|
Changed the C++ build script to use the one in the ci folder and added a single script to run the benchmarks. I added two extra benchmarks to show how that worked. The Any ideas on why this might be happening? My guess would be mostly on the numba kernels or compilation of those? |
NaderAlAwar
left a comment
There was a problem hiding this comment.
This looks great! Thanks Daniel!
| - cuda-bench[cu12]>=0.2.0 | ||
| - cuda-cccl[cu12] | ||
| - cupy-cuda12x |
There was a problem hiding this comment.
Question: is there a way to not hardcode this to cu12?
There was a problem hiding this comment.
Not 100% (i think) my plan was to have another environment.yml for cuda 13.
|
pre-commit.ci autofix |
These results are interesting, we should definitely look into them. I would assume it is something in numba, but I would have to look at the SASS to say for sure. In any case, this should not block the PR. Can you create an issue noting this? Lets maybe have an EPIC issue comparing cuda.compute performance to CUB, and have this specific instance be a subissue |
|
Agree it should not block this one. I will create an epic issue to analyse the results. |
|
It's up to you. We can merge this or you can add the other benchmarks first |
|
Ok, then I will add a couple more here! |
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
|
I think i addressed all the previous comments |
NaderAlAwar
left a comment
There was a problem hiding this comment.
Benchmarks look great now, left a few comments to clean things up. It woud be good to run the benchmarks now and compare them to the C++ benchmarks to see where we stand at this point
python/cuda_cccl/benchmarks/compute/.opencode/commands/migration-status.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
|
All done except one question i left about the --quick, i would like to keep it but we can probably refactor into a --config that allows multiple config for the benchmarks for diff scenarios? |
|
Great work @danielfrg! Could you also open up issues for |
|
/ok to test eca064b |
🥳 CI Workflow Results🟩 Finished in 1h 26m: Pass: 100%/48 | Total: 15h 13m | Max: 57m 59sSee results here. |

Description
closes #7317
Migrating the Python
cuda.computebenchmarks from pytest to nvbench.