This repository was archived by the owner on Mar 20, 2023. It is now read-only.
Issue while running VecPlay test on GPU #501
Closed
Description
When we run the simple VecPlay test on GPU, we get the following error:
.....
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 221.0977, Min 221.0977, Avg 221.0977
MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).
Process ID: 11846, Host: r2i3n2, Program: /gpfs/bbp.cscs.ch/home/kumbhar/tmp/testcorenrn/x86_64/special-core
MPT Version: HPE HMPT 2.22 03/31/20 16:17:35
To Reproduce
Build the latest master with GPU enabled:
module load unstable nvhpc cuda/11.1.0 hpe-mpi cmake python-dev
cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/install -DCORENRN_ENABLE_GPU=ON
make -j install
Then run the vecplay test from the testcorenrn repo:
kumbhar@r2i3n2:~/tmp/testcorenrn$ ./x86_64/special-core -d coredat --mpi --gpu -e 1
num_mpi=1
num_omp_thread=1
Info : 4 GPUs shared by 1 ranks per node
....
Start time (t) = 0
Memory (MBs) : After mk_spikevec_buffer : Max 221.0977, Min 221.0977, Avg 221.0977
MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).
Process ID: 11846, Host: r2i3n2, Program: /gpfs/bbp.cscs.ch/home/kumbhar/tmp/testcorenrn/x86_64/special-core
MPT Version: HPE HMPT 2.22 03/31/20 16:17:35
MPT: --------stack traceback-------
MPT: Attaching to program: /proc/11846/exe, process 11846
MPT: [New LWP 11857]
MPT: [New LWP 11856]
MPT: [Thread debugging using libthread_db enabled]
MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
MPT: 0x00007fffed6d6179 in waitpid () from /lib64/libpthread.so.0
System (please complete the following information)
- BB5
- Compiler: nvhpc/20.9
- Version: master
- Backend: GPU
Additional context
I believe the error was introduced in #283. In that PR I said that I had manually run the tests, but I believe the PR was changed afterwards and broke something (?).
(#308 is important to have in place!)