GPU implementation improvements #718

iomaganaris · 2021-12-16T10:24:56Z

Set nwarp to a very large number for optimal parallelization and slightly improve the grid configuration of the CUDA solve_interleaved2

Description

After profiling various configurations of the channel-benchmark, we found that the optimal value of nwarp is the number of cells. This gives the best parallelization and more coalesced memory access in the solve_interleaved2 kernel and in the current updates.
Improved (?) the way the CUDA implementation of solve_interleaved2 is launched.
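The two points above can be illustrated with a small host-side sketch. This is not the code from coreneuron/permute/cellorder.cu: every name below (`effective_nwarp`, `launch_config`, `LaunchConfig`, `kWarpSize`, `kMaxBlocksPerGrid`) is hypothetical, and the choice of one warp per block and the cap on the number of blocks are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical constants, assumed for this sketch only.
constexpr int kWarpSize = 32;             // threads per warp on NVIDIA GPUs
constexpr int kMaxBlocksPerGrid = 65535;  // conservative cap chosen for the sketch

struct LaunchConfig {
    int blocks_per_grid;
    int threads_per_block;
};

// A "very large" requested nwarp is clamped to the number of cells,
// so that at most one warp is assigned per cell.
int effective_nwarp(int requested_nwarp, int ncell) {
    assert(ncell > 0);
    return std::max(1, std::min(requested_nwarp, ncell));
}

// Derive a 1D launch configuration for nwarp warps.
// Assumption: one warp per block. If the block count is capped, the kernel
// must loop over the remaining warps itself (for example with a grid-stride loop).
LaunchConfig launch_config(int nwarp) {
    const int threads_per_block = kWarpSize;
    const long long nthreads = static_cast<long long>(nwarp) * kWarpSize;
    // Round up so that every thread is covered by some block.
    long long blocks = (nthreads + threads_per_block - 1) / threads_per_block;
    blocks = std::max(1LL, std::min(blocks, static_cast<long long>(kMaxBlocksPerGrid)));
    return {static_cast<int>(blocks), threads_per_block};
}
```

With 1440 cells and a very large requested nwarp, `effective_nwarp` yields 1440 and `launch_config(1440)` yields 1440 blocks of 32 threads. A real kernel launch would pass these two values as the grid and block dimensions.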

coreneuron/permute/cellorder.cu

bbpbuildbot · 2021-12-16T11:09:09Z

Logfiles from GitLab pipeline #29516 (:white_check_mark:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2021-12-16T13:30:28Z

Logfiles from GitLab pipeline #29532 (:white_check_mark:) have been uploaded here!

Status and direct links:

Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (#693, #704, #705, #707, #708, #716, #719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (#702, #703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (#698, #717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>

Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (BlueBrain/CoreNeuron#693, BlueBrain/CoreNeuron#704, BlueBrain/CoreNeuron#705, BlueBrain/CoreNeuron#707, BlueBrain/CoreNeuron#708, BlueBrain/CoreNeuron#716, BlueBrain/CoreNeuron#719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (BlueBrain/CoreNeuron#700, BlueBrain/CoreNeuron#710, BlueBrain/CoreNeuron#718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (BlueBrain/CoreNeuron#702, BlueBrain/CoreNeuron#703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (BlueBrain/CoreNeuron#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (BlueBrain/CoreNeuron#698, BlueBrain/CoreNeuron#717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Cleanup unused code. (BlueBrain/CoreNeuron#711)

Co-authored-by: Pramod Kumbhar <[email protected]>
Co-authored-by: Ioannis Magkanaris <[email protected]>
Co-authored-by: Christos Kotsalos <[email protected]>
Co-authored-by: Nicolas Cornu <[email protected]>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c

Set nwarp to very big number for optimal parallelization and improve …

37e0836

…a bit grid config of CUDA solve_interleaved2

iomaganaris requested review from kotsaloscv and olupton December 16, 2021 10:24

kotsaloscv approved these changes Dec 16, 2021

coreneuron/permute/cellorder.cu (review comment: outdated, resolved)

Corrected comment about threadsPerBlock

06e7105

iomaganaris merged commit d03c45f into hackathon_main Dec 17, 2021

iomaganaris deleted the magkanar/nwarp branch December 17, 2021 10:13
