
Conversation

@nrnhines (Member) commented Sep 8, 2020

A negative srcgid is allowed to be in a different thread from its NetCon target only in direct transfer mode; otherwise an error is raised.

The multi-thread case of test/coreneuron/test_datareturn.py is now enabled.

This PR is associated with BlueBrain/CoreNeuron#390.
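
For illustration, a minimal sketch of the kind of model this check concerns, assuming NEURON's Python API with two fixed-step threads; the section names and the actual thread assignment are illustrative, not taken from the test:

from neuron import h

pc = h.ParallelContext()
pc.nthread(2)  # two fixed-step threads

pre = h.Section(name="pre")
post = h.Section(name="post")
syn = h.ExpSyn(post(0.5))

# A NetCon whose source is a membrane voltage with no gid assigned ends up
# with a negative internal srcgid. If pre and post land in different
# threads, the connection crosses threads; with this change that is only
# accepted in direct transfer mode, otherwise an error is raised.
nc = h.NetCon(pre(0.5)._ref_v, syn, sec=pre)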

@nrnhines (Member, Author) commented Sep 9, 2020

It is a puzzle to me why Travis job 1378.5 is failing at

3591 10: max diff permuted with 2 threads = 1.57986
3612 10: AssertionError
3613 10/10 Test #10: coreneuron_datareturn_py .........***Failed    0.49 sec

Since on my desktop with

cmake .. -DCMAKE_INSTALL_PREFIX=install -DNRN_ENABLE_CORENEURON=ON -DNRN_ENABLE_TESTS=ON -DIV_DIR=$HOME/neuron/ivcmake/build/install

I get

$ make test
...
11/12 Test #11: coreneuron_datareturn_py .........   Passed    0.80 sec

and in test/coreneuron

python test_datareturn.py
...
max diff unpermuted = 0
max diff permuted = 1.84386e-12
max diff permuted with 2 threads = 1.84386e-12

Note that on Travis the test is number 10 instead of number 11 because rxd_mpi_tests are skipped there. Anyway, I know that the git versions match on Travis, since

Submodule path 'external/coreneuron': checked out '44e075bb376a536ee94ca3a3b2d78fd798d799bf'

Retrying on my desktop with -DNRN_ENABLE_MPI=OFF to be more consistent with this Travis job, I still get

10/10 Test #10: coreneuron_datareturn_py .........   Passed    0.80 sec

My last stab at matching the Travis environment on the desktop was to build with gcc-5:

cmake .. -DCMAKE_INSTALL_PREFIX=install -DNRN_ENABLE_CORENEURON=ON -DNRN_ENABLE_TESTS=ON -DIV_DIR=$HOME/neuron/ivcmake/build/install -DNRN_ENABLE_MPI=OFF -DCMAKE_C_COMPILER=gcc-5 -DCMAKE_CXX_COMPILER=g++-5

and still

10/10 Test #10: coreneuron_datareturn_py .........   Passed    0.81 sec

@alexsavulescu (Member) left a comment

LGTM

@alexsavulescu (Member)

I've re-launched Travis to see if it's deterministic.

@alexsavulescu (Member)

So the issue appears to be non-deterministic. Job 5 was successful on the 1st run and the others succeeded after re-launching them.

@pramodk (Member) commented Sep 13, 2020

I will get time to look at this more closely tomorrow morning. (#728 has a different Travis issue.)

@nrnhines (Member, Author)

Still puzzled by the Travis failures. For the one Linux case, restarting the job succeeded, but for the Mac 1392.9 case restarting fails. I don't have a problem on my Mac running Catalina. Here again,

11: max diff permuted with 2 threads = 1.57986
...
11:     assert(max_permuted_thread < 1e-10)
11: AssertionError

but it seems we are using the correct CoreNEURON:

Submodule path 'external/coreneuron': checked out 'bd747a8e94117bbbe7c8646b8f1a83c0b214b179'
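
For reference, the failing line compares data returned from the 2-thread permuted CoreNEURON run against the standard NEURON run. A minimal sketch of that comparison (only the name max_permuted_thread and the 1e-10 tolerance come from the traceback above; the helper and the values are illustrative):

def max_abs_diff(returned, standard):
    # largest element-wise deviation between two recorded trajectories
    return max(abs(a - b) for a, b in zip(returned, standard))

max_permuted_thread = max_abs_diff([-65.0, -64.9], [-65.0, -64.9])  # dummy data
assert max_permuted_thread < 1e-10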

@nrnhines (Member, Author)

I'm starting to think there is a thread race condition in the code. Strangely, I am successful with python test_datareturn.py but valgrind /home/hines/.pyenv/versions/3.7.6/bin/python test_datareturn.py fails (though with no relevant valgrind errors). So at least I finally have something to investigate on my desktop. I built with

cmake .. -DCMAKE_INSTALL_PREFIX=install -DNRN_ENABLE_BINARY_SPECIAL=ON -DNRN_ENABLE_CORENEURON=ON -DNRN_ENABLE_TESTS=ON

@alexsavulescu (Member)

> I'm starting to think there is a thread race condition in the code. Strangely, I am successful with
> python test_datareturn.py but valgrind /home/hines/.pyenv/versions/3.7.6/bin/python test_datareturn.py fails (though no relevant valgrind errors).

What if you run python test_datareturn.py in a loop (say 100x)? valgrind usually slows down the application, so maybe you are right about a thread race.
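
A minimal sketch of such a loop, assuming it is run from test/coreneuron and that the AssertionError makes the script exit non-zero (an unhandled exception does):

import subprocess
import sys

# run the test repeatedly and stop at the first failing iteration
for i in range(100):
    result = subprocess.run([sys.executable, "test_datareturn.py"])
    if result.returncode != 0:
        print("failed on iteration", i)
        break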

@pramodk (Member) commented Sep 16, 2020

> I'm starting to think there is a thread race condition in the code. Strangely,

I "think" I have seen OpenMP related race condition during setup of CoreNEURON. Let me push a small change disabling OpenMP on CoreNEURON side and we will find it out.

olupton added a commit that referenced this pull request Dec 7, 2022
Also use ninja and ccache. Add better failure of the version number
detection from git history in shallow clones. Slim down apt/brew install
commands. Test on newer Ubuntu and macOS images in addition.
