This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@olupton
Contributor

@olupton olupton commented Aug 12, 2021

Description
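This changeset changes the CMake configuration of GPU builds to:

  • link device code only once, rather than linking explicit CUDA code earlier and OpenACC device code later;
  • use CMake's first-class CUDA language support instead of the deprecated find_package(CUDA) and cuda_add_library;
  • use the standard CMAKE_CUDA_ARCHITECTURES variable instead of CORENRN_GPU_CUDA_COMPUTE_CAPABILITY.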

These changes force compilation to proceed slightly differently to a standard mixed CUDA/C++ project. Instead of allowing CMake to generate an explicit device linker step, i.e.

nvcc -dc -o foo.cu.o foo.cu ...
nvcc -dlink -o dlink.o *.cu.o ...
ar qc libcudastuff.a *.cu.o dlink.o

before compiling OpenACC/GPU-enabled C++ code (which emits additional device code):

nvc++ -acc -gpu=cuda11.0,cc70 -o main main.cpp -lcudastuff ...

we instead let nvc++ do the device code linking itself, i.e. something more like

nvc++ -acc -gpu=cuda11.0,cc70 -o main.cpp.o -c main.cpp
nvc++ -acc -gpu=cuda11.0,cc70 -cuda -o main main.cpp.o *.cu.o # -cuda but no dlink.o!

which seems to give correct results. This is a bit tortuous in CMake presumably because the same pattern would fail with a GPU-unaware C++ compiler, e.g. GCC or Clang. The hypothesis is that the old way of doing things fell foul of

It is possible to do multiple device links within a single host executable, as long as each device link is independent of the other. This requirement of independence means that they cannot share code across device executables, nor can they share addresses (e.g., a device function address can be passed from host to device for a callback only if the device link sees both the caller and potential callback callee; you cannot pass an address from one device executable to another, as those are separate address spaces).

from the CUDA documentation, while the new way ensures there is a single device link step including both the code generated from .cu files and that from OpenACC regions.
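
As a rough illustration of the single device link described above, a minimal CMake sketch might look like the following. It is not the configuration from this changeset: the target and file names (cudastuff, foo.cu, main.cpp) are hypothetical, and it assumes that nvc++ is the C++ compiler and nvcc the CUDA compiler. The CUDA_SEPARABLE_COMPILATION and CUDA_RESOLVE_DEVICE_SYMBOLS properties are standard CMake target properties.

```cmake
cmake_minimum_required(VERSION 3.15)
# Assumes CMAKE_CXX_COMPILER=nvc++ and CMAKE_CUDA_COMPILER=nvcc.
project(single_device_link_sketch LANGUAGES CXX CUDA)

# Hypothetical library holding the explicit CUDA code.
add_library(cudastuff STATIC foo.cu)
set_target_properties(cudastuff PROPERTIES
  # Compile relocatable device code (the `nvcc -dc` step) ...
  CUDA_SEPARABLE_COMPILATION ON
  # ... but ask CMake not to add an `nvcc -dlink` step for this target.
  CUDA_RESOLVE_DEVICE_SYMBOLS OFF)

# Hypothetical OpenACC-enabled executable.
add_executable(main main.cpp)
# Also suppress CMake's own device link step for the executable.
set_target_properties(main PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS OFF)
target_compile_options(main PRIVATE -acc -gpu=cuda11.0,cc70)
target_link_libraries(main PRIVATE cudastuff)
# -cuda is passed at link time only, so nvc++ does the one device link itself.
target_link_options(main PRIVATE -acc -gpu=cuda11.0,cc70 -cuda)
```

The intent is that the only device link is the one nvc++ performs when producing main, so device code from .cu files and from OpenACC regions ends up in the same device executable.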

We now also prefer to dynamically link the CUDA runtime, libcudart.so. nvc++ -cuda seems to prefer this and only allows it to be steered by the -static-nvidia option, which would also statically link the OpenACC runtime (which has always been dynamically linked). Setting CMAKE_CUDA_RUNTIME_LIBRARY=Shared stops CMake from emitting -lcudart_static, which causes segfaults at teardown in combination with -cuda's dynamic linking.
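
A minimal sketch of the runtime-library setting, assuming CMake 3.17 or newer (where CMAKE_CUDA_RUNTIME_LIBRARY and the CUDA_RUNTIME_LIBRARY target property are available). The target name main is hypothetical.

```cmake
# Initialises the CUDA_RUNTIME_LIBRARY property of targets created afterwards,
# so it should be set before those targets are defined.
set(CMAKE_CUDA_RUNTIME_LIBRARY Shared)

# Equivalent per-target form, for a hypothetical target called `main`:
# set_target_properties(main PROPERTIES CUDA_RUNTIME_LIBRARY Shared)
```

The same value can be given on the command line as -DCMAKE_CUDA_RUNTIME_LIBRARY=Shared, which is the form quoted above.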

cc: @kotsaloscv

Closes #520. Closes #607.

How to test this?
Follow instructions in #607 to test the link/global state issue.

Test System

  • OS: BB5
  • Compiler: NVHPC 21.7 / CUDA 11.0
  • Version: master
  • Backend: GPU

Use certain branches for the SimulationStack CI

CI_BRANCHES:NEURON_BRANCH=master,

Make sure device code linking only happens once, rather than linking
explicit CUDA code earlier and linking OpenACC device code later.

CUDA has first class language support in all recent CMake versions, so
remove find_package(CUDA) and retire the deprecated `cuda_add_library`
function. Retire the CORENRN_GPU_CUDA_COMPUTE_CAPABILITY CMake variable
and use the standard CMAKE_CUDA_ARCHITECTURES one instead.

Only pass -cuda when linking.

This avoids __CUDACC__ and CUDA features being enabled when compiling
.cpp files, which breaks assumptions (but might be fine in the long
run). Also prefix some preprocessor macro names with CORENEURON_.
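
For readers unfamiliar with the two styles, the move away from FindCUDA can be sketched roughly as follows. This is an illustrative fragment rather than the code in this changeset: the project, target and file names are hypothetical, and the architecture value 70 is taken from the cc70 flag in the description above.

```cmake
# Deprecated FindCUDA style (shown as comments for comparison):
#   find_package(CUDA REQUIRED)
#   cuda_add_library(cudastuff STATIC foo.cu)

# First-class CUDA language support.
# CMAKE_CUDA_ARCHITECTURES requires CMake >= 3.18.
cmake_minimum_required(VERSION 3.18)
set(CMAKE_CUDA_ARCHITECTURES 70)
project(modern_cuda_sketch LANGUAGES CXX CUDA)

# .cu files are compiled by the CUDA compiler with an ordinary add_library().
add_library(cudastuff STATIC foo.cu)
```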
@olupton olupton force-pushed the olupton/modernise-cuda branch from 505b7cd to 8e6ff6b Compare August 12, 2021 12:22
@olupton olupton closed this Aug 12, 2021
@olupton olupton reopened this Aug 12, 2021
Drop other -D argument that is now injected via the `localrc` file of
BB5's PGI/NVHPC compiler installations.
@olupton olupton marked this pull request as ready for review August 13, 2021 07:46
@olupton
Contributor Author

olupton commented Aug 13, 2021

We should merge BlueBrain/spack#1249 immediately before merging this.

@alexsavulescu
Contributor

alexsavulescu commented Aug 13, 2021

Please retest

pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
* clang-format-12 support in hpc-coding-conventions.

* Fix CUDA/GPU linking and avoid deprecated CMake.

Make sure device code linking only happens once, rather than linking
explicit CUDA code earlier and linking OpenACC device code later.

CUDA has first class language support in all recent CMake versions, so
remove find_package(CUDA) and retire the deprecated `cuda_add_library`
function. Retire the CORENRN_GPU_CUDA_COMPUTE_CAPABILITY CMake variable
and use the standard CMAKE_CUDA_ARCHITECTURES one instead.

* Only pass -cuda when linking.

This avoids __CUDACC__ and CUDA features being enabled when compiling
.cpp files, which breaks assumptions (but might be fine in the long
run). Also prefix some preprocessor macro names with CORENEURON_.

* Consistent libcudart linkage.

* Fix silly error for -DNDEBUG.

* Update README to suggest CMAKE_CUDA_COMPILER=nvcc.

* CMake minimum v3.15 for GPU builds.

* Tweaks for older CMake versions.

* Set CMAKE_CUDA_COMPILER=nvcc.

Drop other -D argument that is now injected via the `localrc` file of BB5's PGI/NVHPC compiler installations.

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@170a0bb

Development

Successfully merging this pull request may close these issues.

  • Random123 global state is not propagated correctly
  • CoreNEURON uses deprecated FindCUDA

4 participants