Implement fast_imem calculation on GPU. #574
Conversation
Force-pushed from da0f33d to 30845dd.
Retest this please Jenkins.
Updates mod2c/nmodl submodule commits to include relevant fixes, BlueBrain/mod2c#64 and BlueBrain/nmodl#681. Closes #197.
Force-pushed from 30845dd to 3dd6780.
pramodk left a comment
LGTM
```cpp
 * so here we transform so it only has membrane current contribution
 */
double* p = _nt->nrn_fast_imem->nrn_sav_d;
#pragma acc parallel loop present(p, vec_d) if (_nt->compute_gpu) async(_nt->stream_id)
```
This is launched via async, but I believe explicit synchronisation via acc wait is not immediately required here (there should already be a top-level acc wait). cc: @iomaganaris
That was my conclusion too; let's wait for @iomaganaris's review though.
IIUC stream_id should always be 0 unless we use openmp threading, which in the normal case we don't.
More details here:
CoreNeuron/coreneuron/io/phase2.cpp, line 843 in f61ec32:
    nt.stream_id = 0;
CoreNeuron/coreneuron/io/phase2.cpp, line 854 in f61ec32:
    nt.stream_id = omp_get_thread_num();
However, since there is still this case, I think the async(stream_id) is still needed.
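To make this concrete, here is a minimal, hypothetical sketch (not the actual CoreNEURON code) of the pattern under discussion: in the OpenMP case each host thread gets its own stream_id, kernels are enqueued asynchronously on that queue, and a single top-level acc wait synchronises everything afterwards. All names below are illustrative, and the buffers are assumed to already be present on the device.

```cpp
#include <omp.h>
#include <vector>

// Illustrative stand-in for the per-thread state CoreNEURON keeps in NrnThread.
struct ThreadData {
    int stream_id = 0;
    std::vector<double> rhs;
};

// Assumes each rhs buffer has already been copied to the device.
void zero_buffers(std::vector<ThreadData>& threads, bool compute_gpu) {
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < (int) threads.size(); ++i) {
        ThreadData& t = threads[i];
        // Mirrors the phase2.cpp lines quoted above: 0 without OpenMP
        // threading, omp_get_thread_num() with it.
        t.stream_id = omp_get_num_threads() > 1 ? omp_get_thread_num() : 0;
        double* p = t.rhs.data();
        int n = (int) t.rhs.size();
        // Each host thread enqueues work on its own async queue.
        #pragma acc parallel loop present(p[0:n]) if (compute_gpu) async(t.stream_id)
        for (int j = 0; j < n; ++j) {
            p[j] = 0.0;
        }
    }
    // One top-level wait synchronises all async queues.
    #pragma acc wait
}
```

With a single queue (stream_id always 0) the async clause is harmless; in the OpenMP case it lets work from different threads overlap instead of serialising on one queue.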
I just checked with NMODL + GPU + the current version of this branch, and I get the behaviour expected from BlueBrain/nmodl#675 and BlueBrain/nmodl#678. Because the tests are not included in the CI yet, I think it's fine to merge this without waiting for fixes for those issues.
thanks! good to merge!
iomaganaris left a comment
Thanks a lot for taking care of this! The PR LGTM as well 👍
```diff
  acc_memcpy_to_device(&(d_nrb->_displ), &d_displ, sizeof(int*));

- d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * (nrb->_size + 1));
+ d_nrb_index = (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * nrb->_size);
```
Can you briefly explain what is behind this change?
The change is to make it match the allocation in
CoreNeuron/coreneuron/io/phase2.cpp, line 461 in 2c51992:
    nrb->_nrb_index = (int*) ecalloc_align(nrb->_size, sizeof(int));
Without the change, acc_copyin reads 4 undefined bytes at the end of the buffer (Valgrind complained).
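For context, a minimal sketch (hypothetical helper names, using the OpenACC runtime calls already present in this file) of the sizing rule being applied here: the byte count passed to acc_copyin must match the host allocation exactly, otherwise the copy reads past the end of the buffer, which is what Valgrind flagged.

```cpp
#include <openacc.h>
#include <cstdlib>

// Trimmed-down stand-in for CoreNEURON's net-receive buffer.
struct NetReceiveBuffer {
    int* _nrb_index;
    int _size;
};

int* copyin_nrb_index(NetReceiveBuffer* nrb) {
    // Host allocation: exactly nrb->_size ints (compare phase2.cpp line 461).
    nrb->_nrb_index = (int*) calloc(nrb->_size, sizeof(int));
    // The device copy must use the same element count; an extra "+ 1" here
    // would read sizeof(int) undefined bytes beyond the host buffer.
    return (int*) acc_copyin(nrb->_nrb_index, sizeof(int) * nrb->_size);
}
```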
This commit updates the CoreNEURON submodule commit to include BlueBrain/CoreNeuron#574, which fixes BlueBrain/CoreNeuron#197 by adding support for fast_imem computation on GPU. This means that various workarounds can be removed from the NEURON test configuration.
* Avoids crashing without --mpi in an MPI build.
* Fixes off-by-one error in _nrb_index size.
* Consistently pads the size of the `pdata` block.
* Updates mod2c/nmodl submodule commits to include relevant fixes.

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@2c51992
Description
This PR adds support for copying the fast_imem data structures to/from a compute device/GPU and adds OpenACC #pragmas to offload computations on those data. This closes #197. It will allow us to improve the coverage of the NEURON tests, where some checks are disabled because of this issue. It is also necessary to get the test suite running with CoreNEURON+NMODL+GPU, because NMODL's generated code crashes if the data structures are not copied to the GPU.
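As an illustration of what copying the fast_imem data structures to the GPU involves, here is a hedged sketch using the OpenACC runtime calls that already appear in this PR (acc_copyin, acc_memcpy_to_device). The struct layout and the nrn_sav_rhs field are assumptions based on the nrn_fast_imem->nrn_sav_d usage quoted in the review thread above; the actual CoreNEURON code may differ.

```cpp
#include <openacc.h>

// Assumed layout: two per-node arrays hanging off the NrnThread.
struct NrnFastImem {
    double* nrn_sav_rhs;
    double* nrn_sav_d;
};

NrnFastImem* copyin_fast_imem(NrnFastImem* fi, int n_nodes, bool compute_gpu) {
    if (!compute_gpu || fi == nullptr) {
        return fi;
    }
    // Shallow-copy the struct, then deep-copy both arrays and patch the
    // device-side pointers so GPU kernels can dereference them directly.
    NrnFastImem* d_fi = (NrnFastImem*) acc_copyin(fi, sizeof(NrnFastImem));
    double* d_rhs = (double*) acc_copyin(fi->nrn_sav_rhs, n_nodes * sizeof(double));
    double* d_d = (double*) acc_copyin(fi->nrn_sav_d, n_nodes * sizeof(double));
    acc_memcpy_to_device(&(d_fi->nrn_sav_rhs), &d_rhs, sizeof(double*));
    acc_memcpy_to_device(&(d_fi->nrn_sav_d), &d_d, sizeof(double*));
    return d_fi;
}
```

Once the arrays are present on the device, loops like the #pragma acc parallel loop shown above can run on the GPU, and the results would be copied back to the host (e.g. with acc_update_self) whenever NEURON needs to read them.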
Some fixes were also needed to the .mod file translation to make the NEURON tests pass, so this changes the submodule commits to include BlueBrain/mod2c#64 and BlueBrain/nmodl#681. This PR also includes fixes for two memory errors found with Valgrind/memcheck, and a workaround so that not passing --mpi to an MPI build does not cause a crash.
How to test this?
Build NEURON with CoreNEURON, NMODL and GPU support enabled and run the tests:
Without this PR, some tests will fail (with an NVIDIA GPU); with this PR the only failures should be in the testcorenrn_gf and testcorenrn_watch tests, which will be fixed separately.
Test System
Use certain branches for the SimulationStack CI
CI_BRANCHES:NEURON_BRANCH=olupton/gpu-fast-imem,