If a netcon_srcgid is negative, need to determine the thread. #390

nrnhines · 2020-09-08T22:45:28Z

Only for direct transfer mode is it allowed that a negative srcgid is not
in the same thread as the NetCon target.

To enable thread determination an std::vector involving the name
netcon_negsrcgid_tid is associated with netcon_srcgid in that when a negative
gid appears in netcon_srcgid, the tid is the value of the element in
netcon_negsrcgid_tid.

This pr is assciated with neuronsimulator/nrn#735

Only for direct transfer mode is it allowed that a negative srcgid is not in the same thread as the NetCon target. To enable thread determination an std::vector<int> involving the name netcon_negsrcgid_tid is associated with netcon_srcgid in that when a negative gid appears in netcon_srcgid, the tid is the value of the element in netcon_negsrcgid_tid.

coreneuron/io/nrn_setup.cpp

coreneuron/io/phase1.cpp

alexsavulescu · 2020-09-11T13:38:08Z

A stack trace from the BBP pipeline:

MPT: (gdb) #0  0x00002aaaab44f12c in waitpid () from /lib64/libpthread.so.0
MPT: #1  0x00002aaaabdc6776 in mpi_sgi_system (
MPT: #2  MPI_SGI_stacktraceback (
MPT:     header=header@entry=0x7fffffffa3c0 "MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).\n\tProcess ID: 164164, Host: r2i0n25, Program: /jenkins/07/workspace/oss.coreneuron.pipeline/build_AoS/bin/x86_64/special-core\n\tMPT Version: HPE MPT 2"...) at sig.c:340
MPT: #3  0x00002aaaabdc6972 in first_arriver_handler (signo=signo@entry=11, 
MPT:     stack_trace_sem=stack_trace_sem@entry=0x2aaab0da0080) at sig.c:489
MPT: #4  0x00002aaaabdc6d0b in slave_sig_handler (signo=11, 
MPT:     siginfo=<optimized out>, extra=<optimized out>) at sig.c:565
MPT: #5  <signal handler called>
MPT: #6  0x00002aaaab0e9408 in std::vector<int, std::allocator<int> >::begin (
MPT:     this=0x0)
MPT:     at /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/gcc-8.3.0-4hd4baobq2/include/c++/8.3.0/bits/stl_vector.h:708
MPT: #7  0x00002aaaab0e9598 in std::vector<int, std::allocator<int> >::empty (
MPT:     this=0x0)
MPT:     at /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/gcc-8.3.0-4hd4baobq2/include/c++/8.3.0/bits/stl_vector.h:895
MPT: #8  0x00002aaaab0e574e in coreneuron::determine_inputpresyn ()
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/io/nrn_setup.cpp:318
MPT: #9  0x00002aaaab0e662d in coreneuron::nrn_setup (
MPT:     filesdat=0x1759f00 "/jenkins/07/workspace/oss.coreneuron.pipeline/tests/integration/ring/files.dat", is_mapping_needed=false, run_setup_cleanup=true, 
MPT:     datpath=0x6fd360 "/jenkins/07/workspace/oss.coreneuron.pipeline/tests/integration/ring", restore_path=0x7fffffffba50 "", 
MPT:     mindelay=0x2aaaab43a0f0 <coreneuron::corenrn_param+112>)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/io/nrn_setup.cpp:514
MPT: #10 0x00002aaaab0cb477 in coreneuron::nrn_init_and_load_data (argc=11, 
MPT:     argv=0x7fffffffbe48, is_mapping_needed=false, run_setup_cleanup=true)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/apps/main1.cpp:252
MPT: #11 0x00002aaaab0c8d52 in run_solve_core (argc=11, argv=0x7fffffffbe48)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/apps/main1.cpp:488
MPT: #12 0x00002aaaaacd0bfe in solve_core (argc=11, argv=0x7fffffffbe48)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/build_AoS/share/coreneuron/enginemech.cpp:42
MPT: #13 0x000000000040179f in main (argc=11, argv=0x7fffffffbe48)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/build_AoS/share/coreneuron/coreneuron.cpp:34

nrnhines · 2020-09-12T00:02:43Z

coreneuron::determine_inputpresyn () ... coreneuron/io/nrn_setup.cpp:318
                if (!negsrcgid_tid.empty()) {

It puzzles me how this can be NULL since line 291

        std::vector<int>& negsrcgid_tid = netcon_negsrcgid_tid[ith];

and netcon_negsrcgid_tid was fully allocated in line 503

        netcon_negsrcgid_tid.resize(nrn_nthread);

prior to Phase1 processing. Shouldn't that have made a nrn_nthread size vector of empty vectors?

alexsavulescu · 2020-09-12T12:48:49Z

prior to Phase1 processing. Shouldn't that have made a nrn_nthread size vector of empty vectors?

Absolutely. My guess is that it doesn't get called. I can have a quick look, I have CoreNeuron set up.

alexsavulescu · 2020-09-12T13:41:22Z

determine_inputpresyn() was using netcon_negsrcgid_tid as it if were resized regardless if we run embedded or not (and it was resized only for the embedded control path). The logic must be adjusted, I will propose something.

alexsavulescu · 2020-09-12T13:48:54Z

@nrnhines what do you think about c870bf3?

nrnhines · 2020-09-12T14:19:57Z

c870bf3 should work but I would prefer avoiding the additional clutter by just creating an empty vector of vectors for netcon_negsrcgid_tid for the file mode transfer. ie. in line 503 of io/nrn_setup.cpp just move

$ git diff
diff --git a/coreneuron/io/nrn_setup.cpp b/coreneuron/io/nrn_setup.cpp
index 7135274..c232326 100644
--- a/coreneuron/io/nrn_setup.cpp
+++ b/coreneuron/io/nrn_setup.cpp
@@ -497,10 +497,10 @@ void nrn_setup(const char* filesdat,
         nrn_partrans::gap_mpi_setup(userParams.ngroup);
     }
 
+    netcon_negsrcgid_tid.resize(nrn_nthread);
     if (!corenrn_embedded) {
         coreneuron::phase_wrapper<coreneuron::phase::one>(userParams);
     } else {
-        netcon_negsrcgid_tid.resize(nrn_nthread);
         nrn_multithread_job([](NrnThread* n) {
             Phase1 p1; 
             p1.read_direct(n->id);

nrnhines · 2020-09-12T14:31:56Z

I did not notice that 7af3e93 was the same as my last suggestion. What went wrong with that that kept you going with c870bf3

alexsavulescu · 2020-09-12T19:42:31Z

I did not notice that 7af3e93 was the same as my last suggestion. What went wrong with that that kept you going with c870bf3

I know it's more verbose, just wanted to be closer to your original logic of resizing only for direct mode. Then I realized it makes things more obvious if we linked netcon_negsrcgid_tid to corenrn_embedded instead of relying on it being empty or not.

We can revert back to 7af3e93 if you prefer, however in that case we would need to keep the following (because empty() will not be a valid check, since netcon_negsrcgid_tid is always resized) :

diff --git a/coreneuron/io/phase1.cpp b/coreneuron/io/phase1.cpp
index fd5d3287..eff0dc4c 100644
--- a/coreneuron/io/phase1.cpp
+++ b/coreneuron/io/phase1.cpp
@@ -58,6 +58,10 @@ void Phase1::read_direct(int thread_id) {
     delete[] netcon_srcgid;
 }

+extern "C" {
+extern bool corenrn_embedded;
+}
+
 void Phase1::populate(NrnThread& nt, OMP_Mutex& mut) {
     nt.n_presyn = this->output_gids.size();
     nt.n_netcon = this->netcon_srcgids.size();
@@ -66,7 +70,7 @@ void Phase1::populate(NrnThread& nt, OMP_Mutex& mut) {
     std::copy(this->netcon_srcgids.begin(), this->netcon_srcgids.end(),
               netcon_srcgid[nt.id]);

-    if (!coreneuron::netcon_negsrcgid_tid.empty()) { // multiple threads and direct mode.
+    if (corenrn_embedded) { // multiple threads and direct mode.
         coreneuron::netcon_negsrcgid_tid[nt.id] = this->netcon_negsrcgid_tid;
     }

pramodk · 2020-09-13T20:00:22Z

@alexsavulescu @nrnhines : changes look good to me. But I stumble on this:

        std::vector<int> dummy;
        std::vector<int>& negsrcgid_tid = corenrn_embedded ? nrnthreads_netcon_negsrcgid_tid[ith] : dummy;

...
        if (!negsrcgid_tid.empty()) {
            tid = negsrcgid_tid[i_tid++];
         }

if anyway we have to protect access to negsrcgid_tid with if(!negsrcgid_tid.empty()), then may be more clear to remove usage of dummy and use :

        if (corenrn_embedded) {
            tid = nrnthreads_netcon_negsrcgid_tid[ith][i_tid++];
         }

which was this comment #390 (comment) I think. (reference to empty dummy vector and usage pattern in the code looked bit confusing)

nrnhines · 2020-09-13T20:09:32Z

I like that suggestion. Good balance between @alexsavulescu point about "closer to your original logic of resizing only for direct mode." and just creatiing a vector of empty vectors. But I guess we still need to create the vector of empty vectors for the file mode a la 7af3e93, otherwise how can one safely

std::vector<int>& negsrcgid_tid =nrnthreads_netcon_negsrcgid_tid[ith];

as opposed to

std::vector<int>& negsrcgid_tid = corenrn_embedded ? nrnthreads_netcon_negsrcgid_tid[ith] : ?????;

Nevermind, I didn't notice your suggesting factoring in tid = nrnthreads_netcon_negsrcgid_tid[ith] into the loop over n_netcon.

Darn... There is a subtle bug when replacing

                if (!negsrcgid_tid.empty()) {
                    tid = negsrcgid_tid[i_tid++];
                }

with

        if (corenrn_embedded) {
            tid = nrnthreads_netcon_negsrcgid_tid[ith][i_tid++];
         }

and that is the fact that nrnthreads_netcon_negsrcgid_tid is not set up on the NEURON side when nrn_nthread=1.
I guess I vote for the original 7af3e93 fix. I.e. when nrnthreads_netcon_negsrcgid_tid is not to be used it is still a nrn_nthread size vector of empty vectors (so that it is the emptyness that skips over tid = negsrcgid_tid[i_tid++] instead of whether it is direct or file mode transfer/.

Comments indicate that nrnthreads_netcon_negsrc_gid_tid subvectors are definitely empty when single thread or file transfer. (will be empty anyway if there are no negative gids) This reverts commit c870bf3.

alexsavulescu · 2020-09-14T19:51:38Z

Given that nrnthreads_netcon_negsrc_gid_tid is not populated for single thread or file transfer, 7af3e93 is definitely a win.

…390 (#392) * Implement support for version from last git commit - coreneuron version was printed as "version id unimplemented" - add cmake module GitRevision.cmake to find git commit information - this commit adds support to print project version along with git commit and it's date. e.g. Version : 0.21.0 bd747a8 (14-09-2020 16:03) * Added --version CLI flag - Use git show instead of git show to be compatible with older git (e.g. on BB5) * Avoid nrn_abort when --help is called * Bug fix for merged pull request #390 : tid and ith args were mixed * For BB5 Jenkins, exclude problematic agent bb5-07 (has strange PGI compiler issue) Co-authored-by: Michael Hines <[email protected]>

…ain/CoreNeuron#390) * If a netcon_srcgid is negative, need to determine the thread. Only for direct transfer mode is it allowed that a negative srcgid is not in the same thread as the NetCon target. To enable thread determination an std::vector<int> involving the name netcon_negsrcgid_tid is associated with netcon_srcgid in that when a negative gid appears in netcon_srcgid, the tid is the value of the element in netcon_negsrcgid_tid. * always resize netcon_negsrcgid_tid * link netcon_negsrcgid_tid usage to corenrn_embedded * nrnthreads_netcon_srcgid is more meaninfgul name. * Revert "link netcon_negsrcgid_tid usage to corenrn_embedded" Comments indicate that nrnthreads_netcon_negsrc_gid_tid subvectors are definitely empty when single thread or file transfer. (will be empty anyway if there are no negative gids) This reverts commit c870bf32b03e936654394789a48e9a0bdb749a13. Co-authored-by: Alexandru Savulescu <[email protected]> CoreNEURON Repo SHA: BlueBrain/CoreNeuron@bd747a8

…lueBrain/CoreNeuron#390 (BlueBrain/CoreNeuron#392) * Implement support for version from last git commit - coreneuron version was printed as "version id unimplemented" - add cmake module GitRevision.cmake to find git commit information - this commit adds support to print project version along with git commit and it's date. e.g. Version : 0.21.0 bd747a8e (14-09-2020 16:03) * Added --version CLI flag - Use git show instead of git show to be compatible with older git (e.g. on BB5) * Avoid nrn_abort when --help is called * Bug fix for merged pull request BlueBrain/CoreNeuron#390 : tid and ith args were mixed * For BB5 Jenkins, exclude problematic agent bb5-07 (has strange PGI compiler issue) Co-authored-by: Michael Hines <[email protected]> CoreNEURON Repo SHA: BlueBrain/CoreNeuron@e0a2774

nrnhines requested review from alexsavulescu and pramodk September 8, 2020 22:45

nrnhines mentioned this pull request Sep 8, 2020

If a netcon_srcgid is negative, need to determine the thread. neuronsimulator/nrn#735

Merged

alexsavulescu reviewed Sep 11, 2020

View reviewed changes

coreneuron/io/nrn_setup.cpp Outdated Show resolved Hide resolved

coreneuron/io/nrn_setup.cpp Outdated Show resolved Hide resolved

coreneuron/io/phase1.cpp Outdated Show resolved Hide resolved

always resize netcon_negsrcgid_tid

7af3e93

link netcon_negsrcgid_tid usage to corenrn_embedded

c870bf3

nrnthreads_netcon_srcgid is more meaninfgul name.

6958a33

alexsavulescu approved these changes Sep 13, 2020

View reviewed changes

Revert "link netcon_negsrcgid_tid usage to corenrn_embedded"

9dc7079

Comments indicate that nrnthreads_netcon_negsrc_gid_tid subvectors are definitely empty when single thread or file transfer. (will be empty anyway if there are no negative gids) This reverts commit c870bf3.

alexsavulescu merged commit bd747a8 into master Sep 14, 2020

nrnhines added a commit that referenced this pull request Sep 20, 2020

Bug fix for merged pull request #390

c22bf8b

pramodk pushed a commit that referenced this pull request Sep 21, 2020

Bug fix for merged pull request #390

bb4df8f

pramodk pushed a commit that referenced this pull request Sep 21, 2020

Bug fix for merged pull request #390

04b6164

pramodk deleted the acell-thread branch December 4, 2020 18:14

If a netcon_srcgid is negative, need to determine the thread. #390

If a netcon_srcgid is negative, need to determine the thread. #390

Uh oh!

Conversation

nrnhines commented Sep 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexsavulescu commented Sep 11, 2020

Uh oh!

nrnhines commented Sep 12, 2020

Uh oh!

alexsavulescu commented Sep 12, 2020

Uh oh!

alexsavulescu commented Sep 12, 2020

Uh oh!

alexsavulescu commented Sep 12, 2020

Uh oh!

nrnhines commented Sep 12, 2020

Uh oh!

nrnhines commented Sep 12, 2020

Uh oh!

alexsavulescu commented Sep 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pramodk commented Sep 13, 2020

Uh oh!

nrnhines commented Sep 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexsavulescu commented Sep 14, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

nrnhines commented Sep 8, 2020 •

edited

Loading

alexsavulescu commented Sep 12, 2020 •

edited

Loading

nrnhines commented Sep 13, 2020 •

edited

Loading