Do not use __managed__ in GPU builds. Add utils. #606
Conversation
Logfiles from GitLab pipeline #12242 (:no_entry:) have been uploaded here! Status and direct links:
Logfiles from GitLab pipeline #12246 (:no_entry:) have been uploaded here! Status and direct links:
This is blocked by #607.
Force-pushed from ee3feb6 to f439456.
This fixes CPU execution of GPU builds on machines that do not have GPUs, which were previously segfaulting due to the use of the __managed__ keyword. Now the handling of Random123 state is more explicit for host/device. Also add a pair of helper functions coreneuron::[de]allocate_unified() that wrap cudaMallocManaged in GPU builds if --gpu was passed at runtime and fall back to new/delete otherwise, and a method coreneuron::unified_memory_enabled() that queries whether this condition is met. Additionally add a C++ allocator template coreneuron::unified_allocator<T> that wraps these functions, a templated coreneuron::alloc_deleter<T> for use with std::unique_ptr<T, D>, and a helper coreneuron::allocate_unique(...). Cleanup Random123 code by dropping an unused nrnran123_mutconstruct method. Tweak compilation/CMake scripts to remove libcudacoreneuron.a and instead build CUDA sources inside libcoreneuron.a. This sidesteps circular dependency issues that would otherwise be introduced by this commit. Modify CMake so `clang-format` target formats CUDA (.cu) files too.
Force-pushed from f439456 to 43b595a.
Logfiles from GitLab pipeline #13543 (:white_check_mark:) have been uploaded here! Status and direct links:
ferdonline
left a comment
LGTM but I'm not the best person to review this code. Maybe let's wait for a review from @iomaganaris
kotsaloscv
left a comment
LGTM
    nrnran123_setseq(s, 0, 0);
    {
    // TODO: can I assert something useful about the instance count going
    // back to zero anywhere? Or that it is zero when some operations happen?
Answered in previous comment.
@nrnhines: how should the Random123 streams created with nrnran123_newstream3 inside bbcore_read() be allocated? Any thoughts?
Let's keep this as a separate issue. BlueBrain/nmodl#383 would make it easy to implement.
Logfiles from GitLab pipeline #13648 (:white_check_mark:) have been uploaded here! Status and direct links:
Force-pushed from a421fa5 to 899fffa.
Logfiles from GitLab pipeline #13680 (:white_check_mark:) have been uploaded here! Status and direct links:
pramodk
left a comment
LGTM
CoreNEURON Repo SHA: BlueBrain/CoreNeuron@ac2fa3b
Description
This fixes CPU execution of GPU builds on machines that do not have GPUs, which were previously segfaulting due to the use of the `__managed__` keyword. Now the handling of Random123 state is more explicit for host/device. This was only added in #595, but the issue wasn't noticed at the time because the GPU-enabled CI only executes tests on machines with GPUs.

Also add a pair of helper functions `coreneuron::[de]allocate_unified()` that wrap `cudaMallocManaged()` in GPU builds if `--gpu` was passed at runtime and fall back to `new`/`delete` otherwise, and a method `coreneuron::unified_memory_enabled()` that queries whether this condition is met.

Additionally add a C++ allocator template `coreneuron::unified_allocator<T>` that wraps these functions, and a templated `coreneuron::alloc_deleter<T>` for use with `std::unique_ptr<T, D>`. Also add a `coreneuron::allocate_unique` helper from Stack Overflow.

Clean up Random123 code by dropping an unused `nrnran123_mutconstruct()` method.

Tweak compilation scripts to allow for circular dependencies between `libcoreneuron.a` and `libcudacoreneuron.a`. In future these should just be merged into a single library.
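To make the allocation scheme concrete, here is a minimal CPU-only sketch of the helpers and allocator described above. The names (`allocate_unified`, `unified_allocator`, `alloc_deleter`, `allocate_unique`) mirror the PR, but the bodies are illustrative assumptions: `unified_memory_enabled()` is hard-wired to `false` so the sketch builds without CUDA, and the real code would call `cudaMallocManaged`/`cudaFree` on the GPU path.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

namespace coreneuron {

// In a GPU build this would reflect whether --gpu was passed at runtime;
// hard-wired to false here so the sketch compiles without CUDA.
inline bool unified_memory_enabled() {
    return false;
}

// Allocate unified (managed) memory when enabled, otherwise host memory.
inline void* allocate_unified(std::size_t num_bytes) {
    // GPU build with --gpu would call cudaMallocManaged(&ptr, num_bytes) here.
    return ::operator new(num_bytes);
}

inline void deallocate_unified(void* ptr, std::size_t /*num_bytes*/) {
    // GPU build with --gpu would call cudaFree(ptr) here.
    ::operator delete(ptr);
}

// Minimal C++ allocator wrapping the helpers above.
template <typename T>
struct unified_allocator {
    using value_type = T;
    unified_allocator() = default;
    template <typename U>
    unified_allocator(unified_allocator<U> const&) noexcept {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(allocate_unified(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        deallocate_unified(p, n * sizeof(T));
    }
};

// Deleter so std::unique_ptr destroys and deallocates through an allocator.
template <typename Alloc>
struct alloc_deleter {
    Alloc alloc;
    void operator()(typename std::allocator_traits<Alloc>::pointer p) {
        std::allocator_traits<Alloc>::destroy(alloc, p);
        std::allocator_traits<Alloc>::deallocate(alloc, p, 1);
    }
};

// allocate_unique: like std::make_unique, but through the custom allocator
// (the Stack Overflow pattern the PR description refers to).
template <typename T, typename... Args>
std::unique_ptr<T, alloc_deleter<unified_allocator<T>>> allocate_unique(Args&&... args) {
    unified_allocator<T> alloc;
    T* p = alloc.allocate(1);
    std::allocator_traits<unified_allocator<T>>::construct(alloc, p,
                                                           std::forward<Args>(args)...);
    return {p, alloc_deleter<unified_allocator<T>>{alloc}};
}

}  // namespace coreneuron
```

With this shape, containers can also opt in, e.g. `std::vector<double, coreneuron::unified_allocator<double>>`, which is what makes the same data reachable from host and device code in a real GPU build.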
This addresses #599 (comment). #599 should stay open because various OpenACC calls are still not conditional on `--gpu`.

How to test this?
Try running a GPU-built `special-core` without `--gpu` on a machine that does not have an NVIDIA GPU.

Test System
Use certain branches for the SimulationStack CI
CI_BRANCHES: NEURON_BRANCH=master,