Add memory pool for Random123 streams. #702

olupton · 2021-12-01T13:17:07Z

This speeds up initialisation when running on GPU if Boost is available.

Previously many small Random123 stream objects were allocated separately using (ultimately) cudaMallocManaged in GPU builds. This is very slow, and makes setup on GPU much slower than on CPU.

This change places a pool allocator "in front of" cudaMallocManaged, which both makes allocation faster and (hopefully) reduces the number of unified memory page faults during simulation.
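To make the idea concrete, here is a minimal sketch of a fixed-size pool sitting in front of a slow upstream allocator. It is not the code from this PR, which uses a Boost pool. `FixedPool`, `upstream_alloc`, `upstream_free` and `demo_upstream_calls` are hypothetical names, the 48-byte block size is an arbitrary stand-in for the size of a stream object, and `std::malloc` stands in for `cudaMallocManaged` so that the sketch builds without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Stand-in for cudaMallocManaged/cudaFree (assumption: std::malloc is used
// here only so the sketch builds without CUDA). upstream_calls counts how
// often the expensive upstream path is taken.
static std::size_t upstream_calls = 0;
static void* upstream_alloc(std::size_t bytes) {
    ++upstream_calls;
    return std::malloc(bytes);
}
static void upstream_free(void* p) {
    std::free(p);
}

// Hypothetical fixed-size pool: requests large chunks upstream and hands out
// small blocks from a free list threaded through the unused blocks.
class FixedPool {
  public:
    FixedPool(std::size_t block_size, std::size_t blocks_per_chunk)
        : block_size_(round_up(block_size))
        , blocks_per_chunk_(blocks_per_chunk) {
        assert(blocks_per_chunk_ > 0);
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;
    ~FixedPool() {
        for (void* chunk: chunks_) {
            upstream_free(chunk);
        }
    }

    // Pop a block from the free list, growing the pool only when it is empty.
    void* allocate() {
        if (free_list_ == nullptr) {
            grow();
        }
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // Push a block back onto the free list; nothing is returned upstream.
    void deallocate(void* p) {
        free_list_ = new (p) Node{free_list_};
    }

  private:
    struct Node {
        Node* next;
    };

    // Each block must be able to hold a free-list node and stay aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n < sizeof(Node)) {
            n = sizeof(Node);
        }
        return (n + align - 1) / align * align;
    }

    // One upstream allocation provides blocks_per_chunk_ new blocks.
    void grow() {
        char* chunk = static_cast<char*>(upstream_alloc(block_size_ * blocks_per_chunk_));
        if (chunk == nullptr) {
            throw std::bad_alloc{};
        }
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocks_per_chunk_; ++i) {
            deallocate(chunk + i * block_size_);
        }
    }

    std::size_t block_size_;
    std::size_t blocks_per_chunk_;
    Node* free_list_ = nullptr;
    std::vector<void*> chunks_;
};

// Allocate n_streams blocks of 48 bytes (arbitrary size) from a pool with
// 1024 blocks per chunk and report how many upstream allocations were needed.
static std::size_t demo_upstream_calls(std::size_t n_streams) {
    upstream_calls = 0;
    FixedPool pool(48, 1024);
    std::vector<void*> blocks;
    for (std::size_t i = 0; i < n_streams; ++i) {
        blocks.push_back(pool.allocate());
    }
    for (void* block: blocks) {
        pool.deallocate(block);
    }
    return upstream_calls;
}
```

With these parameters, 3000 small allocations reach the upstream allocator only three times, which is the effect the pool is meant to have on `cudaMallocManaged` traffic. Neighbouring blocks also share memory pages, which is the reason fewer unified memory page faults might be expected.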

In a small channel-benchmark-based test this makes model setup 3x faster.

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=master,


coreneuron/utils/randoms/nrnran123.cu

pramodk

👎 boost

bbpbuildbot · 2021-12-01T13:38:07Z

Logfiles from GitLab pipeline #27488 (:no_entry:) have been uploaded here!


bbpbuildbot · 2021-12-01T17:39:58Z

Logfiles from GitLab pipeline #27530 (:white_check_mark:) have been uploaded here!


pramodk

LGTM

pramodk · 2021-12-02T00:15:49Z

As reported on the hackathon Slack, this fails with:

[ 87%] Building CXX object coreneuron/CMakeFiles/coreneuron.dir/mpi/core/nrnmpidec.cpp.o
/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/randoms/nrnran123.cu(72): error: identifier "nrnran123_State" is undefined

/autofs/nccsopen-svm1_sw/ascent/gcc/6.4.0/include/c++/6.4.0/bits/unique_ptr.h(171): error: no instance of constructor "std::tuple<_T1, _T2>::tuple [with _T1=<error-type>, _T2=coreneuron::alloc_deleter<<unnamed>::random123_allocator>]" matches the argument list
          detected during instantiation of "std::unique_ptr<_Tp, _Dp>::unique_ptr(std::unique_ptr<_Tp, _Dp>::pointer) [with _Tp=coreneuron::nrnran123_State, _Dp=coreneuron::alloc_deleter<<unnamed>::random123_allocator>]"
/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/randoms/nrnran123.cu(292): here

/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/memory.h(90): error: no instance of function template "std::allocator_traits<_Alloc>::destroy [with _Alloc=<unnamed>::random123_allocator]" matches the argument list
            argument types are: (<unnamed>::random123_allocator, <error-type>)
          detected during:
            instantiation of "void coreneuron::alloc_deleter<Alloc>::operator()(coreneuron::alloc_deleter<Alloc>::pointer) const [with Alloc=<unnamed>::random123_allocator]"
/autofs/nccsopen-svm1_sw/ascent/gcc/6.4.0/include/c++/6.4.0/bits/unique_ptr.h(239): here
            instantiation of "std::unique_ptr<_Tp, _Dp>::~unique_ptr() [with _Tp=coreneuron::nrnran123_State, _Dp=coreneuron::alloc_deleter<<unnamed>::random123_allocator>]"
/ccsopen/home/PCARRIER/NEURON/CoreNeuron/coreneuron/utils/randoms/nrnran123.cu(292): here

3 errors detected in the compilation of "/tmp/tmpxft_0000e191_00000000-6_nrnran123.cpp1.ii".
gmake[2]: *** [coreneuron/CMakeFiles/coreneuron.dir/utils/randoms/nrnran123.cu.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
[ 88%] Linking CXX static library ../lib/libscopmath.a
[ 88%] Built target scopmath
gmake[1]: *** [coreneuron/CMakeFiles/coreneuron.dir/all] Error 2
gmake: *** [all] Error 2
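The template names in the log (`alloc_deleter`, `std::unique_ptr`, `std::allocator_traits<...>::destroy`) point to an allocator-aware deleter. Because `nrnran123_State` was undeclared at that point, the element type became `<error-type>` and every instantiation that depends on it failed. The following is a sketch of that general pattern under stated assumptions, not the contents of `coreneuron/utils/memory.h`: `alloc_deleter_sketch`, `allocate_unique_sketch`, `StreamState` and `demo_live_after_scope` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical payload standing in for nrnran123_State; it counts live
// instances so that construction and destruction can be observed.
struct StreamState {
    static int live;
    unsigned id;
    explicit StreamState(unsigned i)
        : id(i) {
        ++live;
    }
    ~StreamState() {
        --live;
    }
};
int StreamState::live = 0;

// Deleter that destroys and frees an object through the allocator that
// created it, so std::unique_ptr can own allocator-provided memory.
template <typename Alloc>
struct alloc_deleter_sketch {
    using traits = std::allocator_traits<Alloc>;
    using pointer = typename traits::pointer;

    explicit alloc_deleter_sketch(const Alloc& a = Alloc{})
        : alloc(a) {}

    void operator()(pointer p) const {
        Alloc a{alloc};  // destroy/deallocate take a non-const allocator
        traits::destroy(a, std::addressof(*p));
        traits::deallocate(a, p, 1);
    }

    Alloc alloc;
};

// Allocate and construct one T with the given allocator, returning an
// owning pointer whose deleter hands the memory back to that allocator.
template <typename T, typename Alloc, typename... Args>
std::unique_ptr<T, alloc_deleter_sketch<Alloc>> allocate_unique_sketch(const Alloc& alloc,
                                                                       Args&&... args) {
    using traits = std::allocator_traits<Alloc>;
    static_assert(std::is_same<typename traits::value_type, T>::value,
                  "allocator must allocate objects of type T");
    Alloc a{alloc};
    auto p = traits::allocate(a, 1);
    try {
        traits::construct(a, std::addressof(*p), std::forward<Args>(args)...);
    } catch (...) {
        traits::deallocate(a, p, 1);
        throw;
    }
    return std::unique_ptr<T, alloc_deleter_sketch<Alloc>>(p, alloc_deleter_sketch<Alloc>(a));
}

// Create one object in an inner scope and report how many are still alive
// once the owning pointer has gone out of scope.
static int demo_live_after_scope() {
    {
        auto p = allocate_unique_sketch<StreamState>(std::allocator<StreamState>{}, 7u);
        assert(p->id == 7u);
        assert(StreamState::live == 1);
    }
    return StreamState::live;
}
```

The sketch only compiles when the element type is fully declared before the `std::unique_ptr` is instantiated. The log shows that condition being violated for `nrnran123_State` in this build.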

This was a silly bug in #702.

Summary of changes:
- Support OpenMP target offload when NMODL and GPU support are enabled. (#693, #704, #705, #707, #708, #716, #719)
- Use sensible defaults for the --nwarp parameter, improving the performance of the Hines solver with --cell-permute=2 on GPU. (#700, #710, #718)
- Use a Boost memory pool, if Boost is available, to reduce the number of independent CUDA unified memory allocations used for Random123 stream objects. This speeds up initialisation of models using Random123, and also makes it feasible to use NSight Compute on models using Random123 and for NSight Systems to profile initialisation. (#702, #703)
- Use -cuda when compiling with NVHPC and OpenACC or OpenMP, as recommended on the NVIDIA forums. (#721)
- Do not compile for compute capability 6.0 by default, as this is not supported by NVHPC with OpenMP target offload.
- Add new GitLab CI tests so we test CoreNEURON + NMODL with both OpenACC and OpenMP. (#698, #717)
- Add CUDA runtime header search path explicitly, so we don't rely on it being implicit in our NVHPC localrc.
- Clean up unused code. (#711)

Co-authored-by: Pramod Kumbhar
Co-authored-by: Ioannis Magkanaris
Co-authored-by: Christos Kotsalos
Co-authored-by: Nicolas Cornu

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@423ae6c

Add memory pool for Random123 streams.

3490bd0

This speeds up initialisation when running on GPU.

olupton mentioned this pull request Dec 1, 2021

Efficient setup of random123 streams on GPU #587

Open

pramodk reviewed Dec 1, 2021


coreneuron/utils/randoms/nrnran123.cu (outdated)

pramodk suggested changes Dec 1, 2021


Make Boost optional.

8688e8a
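One common way to make such a dependency optional is to pick the allocator type at compile time. The sketch below assumes a hypothetical macro `EXAMPLE_USE_BOOST_POOL` defined by the build system when Boost is found. The macro and the names `upstream_user_alloc`, `stream_allocator`, `StreamState` and `demo_roundtrip` are illustrative and not taken from this commit, and `std::malloc` again stands in for `cudaMallocManaged`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

#ifdef EXAMPLE_USE_BOOST_POOL  // hypothetical macro, not CoreNEURON's own
#include <boost/pool/pool_alloc.hpp>

// Boost.Pool UserAllocator: tells the pool where its large chunks come from.
// std::malloc/std::free are placeholders for the unified memory calls.
struct upstream_user_alloc {
    using size_type = std::size_t;
    using difference_type = std::ptrdiff_t;
    static char* malloc(const size_type bytes) {
        return static_cast<char*>(std::malloc(bytes));
    }
    static void free(char* const block) {
        std::free(block);
    }
};

// Pooled path: many small objects share a few upstream allocations.
template <typename T>
using stream_allocator = boost::fast_pool_allocator<T, upstream_user_alloc>;
#else
// Fallback path without Boost: every object is allocated separately.
template <typename T>
using stream_allocator = std::allocator<T>;
#endif

// Hypothetical payload standing in for a Random123 stream object.
struct StreamState {
    unsigned key[2];
};

// Allocate, construct, use and release one object through whichever
// allocator was selected above; the calling code is identical on both paths.
static bool demo_roundtrip() {
    using alloc_type = stream_allocator<StreamState>;
    using traits = std::allocator_traits<alloc_type>;
    alloc_type alloc;
    StreamState* p = traits::allocate(alloc, 1);
    assert(p != nullptr);
    traits::construct(alloc, p);
    p->key[0] = 1u;
    p->key[1] = 2u;
    const bool ok = (p->key[0] + p->key[1] == 3u);
    traits::destroy(alloc, p);
    traits::deallocate(alloc, p, 1);
    return ok;
}
```

Keeping the selection behind a single type alias means the rest of the code does not need to know whether the pool is present, and a build without Boost simply loses the speed-up.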

olupton requested a review from pramodk December 1, 2021 17:11

pramodk approved these changes Dec 1, 2021


pramodk merged commit a8bb716 into hackathon_main Dec 1, 2021

pramodk deleted the olupton/faster-random123-gpu-initialisation branch December 1, 2021 20:05

olupton mentioned this pull request Dec 2, 2021

Fix Boost-free compilation. #703

Merged

olupton added a commit that referenced this pull request Dec 2, 2021

Fix Boost-free compilation. (#703)

9649814

This was a silly bug in #702.
