This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@olupton
Contributor

@olupton olupton commented Jul 29, 2021

Description
This fixes CPU execution of GPU builds on machines that do not have GPUs; such runs previously segfaulted due to the use of the __managed__ keyword. The handling of Random123 state is now explicit about host vs. device. The offending code was only added in #595, and the issue went unnoticed at the time because the GPU-enabled CI only runs tests on machines that have GPUs.

Also add a pair of helper functions coreneuron::[de]allocate_unified() that wrap cudaMallocManaged() in GPU builds if --gpu was passed at runtime and fall back to new/delete otherwise, and a method coreneuron::unified_memory_enabled() that queries whether this condition is met.
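As a rough sketch of the behaviour described above (not the actual CoreNEURON implementation: the `coreneuron_sketch` namespace and the `CORENRN_ENABLE_GPU` macro here are assumptions, and the CUDA calls are only indicated in comments so the sketch runs on any host):

```cpp
#include <cstddef>
#include <new>

// Sketch of the described helpers. In a real GPU build the GPU branch
// would call cudaMallocManaged()/cudaFree(); plain new/delete stands in
// here so this compiles and runs without CUDA.
namespace coreneuron_sketch {

bool unified_memory_enabled() {
#ifdef CORENRN_ENABLE_GPU  // assumed build flag; real code also checks --gpu at runtime
    return true;
#else
    return false;
#endif
}

void* allocate_unified(std::size_t num_bytes) {
    if (unified_memory_enabled()) {
        // GPU path: cudaMallocManaged(&ptr, num_bytes) and check the status.
    }
    return ::operator new(num_bytes);  // CPU fallback
}

void deallocate_unified(void* ptr, std::size_t /*num_bytes*/) {
    // GPU path: cudaFree(ptr); CPU fallback:
    ::operator delete(ptr);
}

}  // namespace coreneuron_sketch
```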

Additionally, add a C++ allocator template coreneuron::unified_allocator<T> that wraps these functions, and a templated coreneuron::alloc_deleter<T> for use with std::unique_ptr<T, D>. Also add a coreneuron::allocate_unique helper adapted from Stack Overflow.
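The allocator/deleter/allocate_unique trio can be sketched like this (illustrative only, not the CoreNEURON code: the new/delete bodies stand in for the unified-memory helpers, and allocate_unique follows the well-known Stack Overflow make_unique-with-allocator pattern):

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Minimal C++ allocator; the real version would route allocate/deallocate
// through the unified-memory helpers instead of plain new/delete.
template <typename T>
struct unified_allocator {
    using value_type = T;
    unified_allocator() = default;
    template <typename U>
    unified_allocator(unified_allocator<U> const&) noexcept {}
    T* allocate(std::size_t n) {
        // real code: static_cast<T*>(allocate_unified(n * sizeof(T)))
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        // real code: deallocate_unified(p, ...)
        ::operator delete(p);
    }
};

// Deleter for std::unique_ptr: destroys the object, then returns the
// storage through the allocator.
template <typename Alloc>
struct alloc_deleter {
    Alloc alloc;
    using pointer = typename std::allocator_traits<Alloc>::pointer;
    void operator()(pointer p) {
        std::allocator_traits<Alloc>::destroy(alloc, std::addressof(*p));
        std::allocator_traits<Alloc>::deallocate(alloc, p, 1);
    }
};

// make_unique analogue that takes an allocator (the Stack Overflow pattern).
template <typename Alloc, typename... Args>
auto allocate_unique(Alloc alloc, Args&&... args) {
    using Traits = std::allocator_traits<Alloc>;
    using T = typename Traits::value_type;
    T* p = Traits::allocate(alloc, 1);
    try {
        Traits::construct(alloc, p, std::forward<Args>(args)...);
    } catch (...) {
        Traits::deallocate(alloc, p, 1);
        throw;
    }
    return std::unique_ptr<T, alloc_deleter<Alloc>>{p, alloc_deleter<Alloc>{alloc}};
}
```

With this, `auto v = allocate_unique(unified_allocator<int>{}, 42);` yields a std::unique_ptr whose deleter releases the storage back through the allocator rather than calling plain delete.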

Clean up the Random123 code by dropping the unused nrnran123_mutconstruct() method.

Tweak the compilation scripts to allow for circular dependencies between libcoreneuron.a and libcudacoreneuron.a.
In the future, these should be merged into a single library.

This addresses #599 (comment). #599 should stay open because various OpenACC calls are still not conditional on --gpu.

How to test this?
Try running a GPU-built special-core without --gpu on a machine that does not have an NVIDIA GPU.

Test System

  • OS: BB5
  • Compiler: NVHPC 21.7 / CUDA 11.0 / GCC 9.3
  • Version: master
  • Backend: GPU/CPU

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=master,

@olupton olupton requested a review from iomaganaris July 29, 2021 14:20
@olupton
Contributor Author

olupton commented Jul 30, 2021

This is blocked by #607.

@olupton olupton marked this pull request as draft July 30, 2021 14:21
@olupton olupton force-pushed the olupton/gpu-without-gpu branch from ee3feb6 to f439456 Compare August 13, 2021 12:13
This fixes CPU execution of GPU builds on machines that do not have
GPUs, which were previously segfaulting due to the use of the
__managed__ keyword. Now the handling of Random123 state is more
explicit for host/device.

Also add a pair of helper functions coreneuron::[de]allocate_unified()
that wrap cudaMallocManaged in GPU builds if --gpu was passed at runtime
and fall back to new/delete otherwise, and a method
coreneuron::unified_memory_enabled() that queries whether this condition
is met.

Additionally add a C++ allocator template coreneuron::unified_allocator<T>
that wraps these functions, a templated coreneuron::alloc_deleter<T> for use
with std::unique_ptr<T, D>, and a helper coreneuron::allocate_unique(...).

Cleanup Random123 code by dropping an unused nrnran123_mutconstruct
method.

Tweak compilation/CMake scripts to remove libcudacoreneuron.a and
instead build CUDA sources inside libcoreneuron.a. This sidesteps
circular dependency issues that would otherwise be introduced by this
commit.

Modify CMake so `clang-format` target formats CUDA (.cu) files too.
@olupton olupton force-pushed the olupton/gpu-without-gpu branch from f439456 to 43b595a Compare August 13, 2021 12:51
@olupton olupton marked this pull request as ready for review August 13, 2021 12:53
Contributor

@ferdonline ferdonline left a comment

LGTM, but I'm not the best person to review this code. Maybe let's wait for a review from @iomaganaris.

Contributor

@kotsaloscv kotsaloscv left a comment


LGTM

nrnran123_setseq(s, 0, 0);
{
// TODO: can I assert something useful about the instance count going
// back to zero anywhere? Or that it is zero when some operations happen?
Collaborator


Answered in previous comment.

@nrnhines: how should the Random123 streams created with nrnran123_newstream3 inside bbcore_read() be allocated? Any thoughts?

Collaborator


Let's keep this as a separate issue. BlueBrain/nmodl#383 would help to implement this easily.

@olupton olupton force-pushed the olupton/gpu-without-gpu branch from a421fa5 to 899fffa Compare August 16, 2021 12:53
Collaborator

@pramodk pramodk left a comment


LGTM

@olupton olupton merged commit ac2fa3b into master Aug 16, 2021
@olupton olupton deleted the olupton/gpu-without-gpu branch August 16, 2021 14:42
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
…n#606)

(Commit message as above.)

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@ac2fa3b