Skip to content
This repository was archived by the owner on Mar 20, 2023. It is now read-only.

Conversation

@nrnhines
Copy link
Collaborator

@nrnhines nrnhines commented Sep 8, 2020

Only for direct transfer mode is it allowed that a negative srcgid is not
in the same thread as the NetCon target.

To enable thread determination an std::vector involving the name
netcon_negsrcgid_tid is associated with netcon_srcgid in that when a negative
gid appears in netcon_srcgid, the tid is the value of the element in
netcon_negsrcgid_tid.

This pr is assciated with neuronsimulator/nrn#735

Only for direct transfer mode is it allowed that a negative srcgid is not
in the same thread as the NetCon target.

To enable thread determination an std::vector<int> involving the name
netcon_negsrcgid_tid is associated with netcon_srcgid in that when a negative
gid appears in netcon_srcgid, the tid is the value of the element in
netcon_negsrcgid_tid.
@alexsavulescu
Copy link
Contributor

A stack trace from the BBP pipeline:

MPT: (gdb) #0  0x00002aaaab44f12c in waitpid () from /lib64/libpthread.so.0
MPT: #1  0x00002aaaabdc6776 in mpi_sgi_system (
MPT: #2  MPI_SGI_stacktraceback (
MPT:     header=header@entry=0x7fffffffa3c0 "MPT ERROR: Rank 0(g:0) received signal SIGSEGV(11).\n\tProcess ID: 164164, Host: r2i0n25, Program: /jenkins/07/workspace/oss.coreneuron.pipeline/build_AoS/bin/x86_64/special-core\n\tMPT Version: HPE MPT 2"...) at sig.c:340
MPT: #3  0x00002aaaabdc6972 in first_arriver_handler (signo=signo@entry=11, 
MPT:     stack_trace_sem=stack_trace_sem@entry=0x2aaab0da0080) at sig.c:489
MPT: #4  0x00002aaaabdc6d0b in slave_sig_handler (signo=11, 
MPT:     siginfo=<optimized out>, extra=<optimized out>) at sig.c:565
MPT: #5  <signal handler called>
MPT: #6  0x00002aaaab0e9408 in std::vector<int, std::allocator<int> >::begin (
MPT:     this=0x0)
MPT:     at /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/gcc-8.3.0-4hd4baobq2/include/c++/8.3.0/bits/stl_vector.h:708
MPT: #7  0x00002aaaab0e9598 in std::vector<int, std::allocator<int> >::empty (
MPT:     this=0x0)
MPT:     at /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/gcc-8.3.0-4hd4baobq2/include/c++/8.3.0/bits/stl_vector.h:895
MPT: #8  0x00002aaaab0e574e in coreneuron::determine_inputpresyn ()
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/io/nrn_setup.cpp:318
MPT: #9  0x00002aaaab0e662d in coreneuron::nrn_setup (
MPT:     filesdat=0x1759f00 "/jenkins/07/workspace/oss.coreneuron.pipeline/tests/integration/ring/files.dat", is_mapping_needed=false, run_setup_cleanup=true, 
MPT:     datpath=0x6fd360 "/jenkins/07/workspace/oss.coreneuron.pipeline/tests/integration/ring", restore_path=0x7fffffffba50 "", 
MPT:     mindelay=0x2aaaab43a0f0 <coreneuron::corenrn_param+112>)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/io/nrn_setup.cpp:514
MPT: #10 0x00002aaaab0cb477 in coreneuron::nrn_init_and_load_data (argc=11, 
MPT:     argv=0x7fffffffbe48, is_mapping_needed=false, run_setup_cleanup=true)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/apps/main1.cpp:252
MPT: #11 0x00002aaaab0c8d52 in run_solve_core (argc=11, argv=0x7fffffffbe48)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/coreneuron/apps/main1.cpp:488
MPT: #12 0x00002aaaaacd0bfe in solve_core (argc=11, argv=0x7fffffffbe48)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/build_AoS/share/coreneuron/enginemech.cpp:42
MPT: #13 0x000000000040179f in main (argc=11, argv=0x7fffffffbe48)
MPT:     at /jenkins/07/workspace/oss.coreneuron.pipeline/build_AoS/share/coreneuron/coreneuron.cpp:34

@nrnhines
Copy link
Collaborator Author

coreneuron::determine_inputpresyn () ... coreneuron/io/nrn_setup.cpp:318
                if (!negsrcgid_tid.empty()) {

It puzzles me how this can be NULL since line 291

        std::vector<int>& negsrcgid_tid = netcon_negsrcgid_tid[ith];

and netcon_negsrcgid_tid was fully allocated in line 503

        netcon_negsrcgid_tid.resize(nrn_nthread);

prior to Phase1 processing. Shouldn't that have made a nrn_nthread size vector of empty vectors?

@alexsavulescu
Copy link
Contributor

prior to Phase1 processing. Shouldn't that have made a nrn_nthread size vector of empty vectors?

Absolutely. My guess is that it doesn't get called. I can have a quick look, I have CoreNeuron set up.

@alexsavulescu
Copy link
Contributor

determine_inputpresyn() was using netcon_negsrcgid_tid as it if were resized regardless if we run embedded or not (and it was resized only for the embedded control path). The logic must be adjusted, I will propose something.

@alexsavulescu
Copy link
Contributor

@nrnhines what do you think about c870bf3?

@nrnhines
Copy link
Collaborator Author

c870bf3 should work but I would prefer avoiding the additional clutter by just creating an empty vector of vectors for netcon_negsrcgid_tid for the file mode transfer. ie. in line 503 of io/nrn_setup.cpp just move

$ git diff
diff --git a/coreneuron/io/nrn_setup.cpp b/coreneuron/io/nrn_setup.cpp
index 7135274..c232326 100644
--- a/coreneuron/io/nrn_setup.cpp
+++ b/coreneuron/io/nrn_setup.cpp
@@ -497,10 +497,10 @@ void nrn_setup(const char* filesdat,
         nrn_partrans::gap_mpi_setup(userParams.ngroup);
     }
 
+    netcon_negsrcgid_tid.resize(nrn_nthread);
     if (!corenrn_embedded) {
         coreneuron::phase_wrapper<coreneuron::phase::one>(userParams);
     } else {
-        netcon_negsrcgid_tid.resize(nrn_nthread);
         nrn_multithread_job([](NrnThread* n) {
             Phase1 p1; 
             p1.read_direct(n->id);

@nrnhines
Copy link
Collaborator Author

I did not notice that 7af3e93 was the same as my last suggestion. What went wrong with that that kept you going with c870bf3

@alexsavulescu
Copy link
Contributor

alexsavulescu commented Sep 12, 2020

I did not notice that 7af3e93 was the same as my last suggestion. What went wrong with that that kept you going with c870bf3

I know it's more verbose, just wanted to be closer to your original logic of resizing only for direct mode. Then I realized it makes things more obvious if we linked netcon_negsrcgid_tid to corenrn_embedded instead of relying on it being empty or not.

We can revert back to 7af3e93 if you prefer, however in that case we would need to keep the following (because empty() will not be a valid check, since netcon_negsrcgid_tid is always resized) :

diff --git a/coreneuron/io/phase1.cpp b/coreneuron/io/phase1.cpp
index fd5d3287..eff0dc4c 100644
--- a/coreneuron/io/phase1.cpp
+++ b/coreneuron/io/phase1.cpp
@@ -58,6 +58,10 @@ void Phase1::read_direct(int thread_id) {
     delete[] netcon_srcgid;
 }

+extern "C" {
+extern bool corenrn_embedded;
+}
+
 void Phase1::populate(NrnThread& nt, OMP_Mutex& mut) {
     nt.n_presyn = this->output_gids.size();
     nt.n_netcon = this->netcon_srcgids.size();
@@ -66,7 +70,7 @@ void Phase1::populate(NrnThread& nt, OMP_Mutex& mut) {
     std::copy(this->netcon_srcgids.begin(), this->netcon_srcgids.end(),
               netcon_srcgid[nt.id]);

-    if (!coreneuron::netcon_negsrcgid_tid.empty()) { // multiple threads and direct mode.
+    if (corenrn_embedded) { // multiple threads and direct mode.
         coreneuron::netcon_negsrcgid_tid[nt.id] = this->netcon_negsrcgid_tid;
     }

@pramodk
Copy link
Collaborator

pramodk commented Sep 13, 2020

@alexsavulescu @nrnhines : changes look good to me. But I stumble on this:

        std::vector<int> dummy;
        std::vector<int>& negsrcgid_tid = corenrn_embedded ? nrnthreads_netcon_negsrcgid_tid[ith] : dummy;

...
        if (!negsrcgid_tid.empty()) {
            tid = negsrcgid_tid[i_tid++];
         }

if anyway we have to protect access to negsrcgid_tid with if(!negsrcgid_tid.empty()), then may be more clear to remove usage of dummy and use :

        if (corenrn_embedded) {
            tid = nrnthreads_netcon_negsrcgid_tid[ith][i_tid++];
         }

which was this comment #390 (comment) I think. (reference to empty dummy vector and usage pattern in the code looked bit confusing)

@nrnhines
Copy link
Collaborator Author

nrnhines commented Sep 13, 2020

I like that suggestion. Good balance between @alexsavulescu point about "closer to your original logic of resizing only for direct mode." and just creatiing a vector of empty vectors. But I guess we still need to create the vector of empty vectors for the file mode a la 7af3e93, otherwise how can one safely

std::vector<int>& negsrcgid_tid =nrnthreads_netcon_negsrcgid_tid[ith];

as opposed to

std::vector<int>& negsrcgid_tid = corenrn_embedded ? nrnthreads_netcon_negsrcgid_tid[ith] : ?????;

Nevermind, I didn't notice your suggesting factoring in tid = nrnthreads_netcon_negsrcgid_tid[ith] into the loop over n_netcon.

Darn... There is a subtle bug when replacing

                if (!negsrcgid_tid.empty()) {
                    tid = negsrcgid_tid[i_tid++];
                }

with

        if (corenrn_embedded) {
            tid = nrnthreads_netcon_negsrcgid_tid[ith][i_tid++];
         }

and that is the fact that nrnthreads_netcon_negsrcgid_tid is not set up on the NEURON side when nrn_nthread=1.
I guess I vote for the original 7af3e93 fix. I.e. when nrnthreads_netcon_negsrcgid_tid is not to be used it is still a nrn_nthread size vector of empty vectors (so that it is the emptyness that skips over tid = negsrcgid_tid[i_tid++] instead of whether it is direct or file mode transfer/.

Comments indicate that nrnthreads_netcon_negsrc_gid_tid subvectors are
definitely empty when single thread or file transfer. (will be empty
anyway if there are no negative gids)

This reverts commit c870bf3.
@alexsavulescu
Copy link
Contributor

Given that nrnthreads_netcon_negsrc_gid_tid is not populated for single thread or file transfer, 7af3e93 is definitely a win.

@alexsavulescu alexsavulescu merged commit bd747a8 into master Sep 14, 2020
nrnhines added a commit that referenced this pull request Sep 20, 2020
pramodk pushed a commit that referenced this pull request Sep 21, 2020
pramodk pushed a commit that referenced this pull request Sep 21, 2020
pramodk added a commit that referenced this pull request Sep 21, 2020
…390 (#392)

* Implement support for version from last git commit
  - coreneuron version was printed as "version id unimplemented"
  - add cmake module GitRevision.cmake to find git commit information
  - this commit adds support to print project version along with
    git commit and it's date. e.g.
      Version : 0.21.0 bd747a8 (14-09-2020 16:03)
* Added --version CLI flag
  - Use git show instead of git show to be compatible with older git (e.g. on BB5)
* Avoid nrn_abort when --help is called
* Bug fix for merged pull request #390 : tid and ith args were mixed
* For BB5 Jenkins, exclude problematic agent bb5-07 (has strange PGI compiler issue)

Co-authored-by: Michael Hines <[email protected]>
@pramodk pramodk deleted the acell-thread branch December 4, 2020 18:14
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
…ain/CoreNeuron#390)

* If a netcon_srcgid is negative, need to determine the thread.

Only for direct transfer mode is it allowed that a negative srcgid is not
in the same thread as the NetCon target.

To enable thread determination an std::vector<int> involving the name
netcon_negsrcgid_tid is associated with netcon_srcgid in that when a negative
gid appears in netcon_srcgid, the tid is the value of the element in
netcon_negsrcgid_tid.

* always resize netcon_negsrcgid_tid

* link netcon_negsrcgid_tid usage to corenrn_embedded

* nrnthreads_netcon_srcgid is more meaninfgul name.

* Revert "link netcon_negsrcgid_tid usage to corenrn_embedded"

Comments indicate that nrnthreads_netcon_negsrc_gid_tid subvectors are
definitely empty when single thread or file transfer. (will be empty
anyway if there are no negative gids)

This reverts commit c870bf32b03e936654394789a48e9a0bdb749a13.

Co-authored-by: Alexandru Savulescu <[email protected]>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@bd747a8
pramodk added a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
…lueBrain/CoreNeuron#390 (BlueBrain/CoreNeuron#392)

* Implement support for version from last git commit
  - coreneuron version was printed as "version id unimplemented"
  - add cmake module GitRevision.cmake to find git commit information
  - this commit adds support to print project version along with
    git commit and it's date. e.g.
      Version : 0.21.0 bd747a8e (14-09-2020 16:03)
* Added --version CLI flag
  - Use git show instead of git show to be compatible with older git (e.g. on BB5)
* Avoid nrn_abort when --help is called
* Bug fix for merged pull request BlueBrain/CoreNeuron#390 : tid and ith args were mixed
* For BB5 Jenkins, exclude problematic agent bb5-07 (has strange PGI compiler issue)

Co-authored-by: Michael Hines <[email protected]>

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@e0a2774
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants