Re-implement parallel sync using the NBX algorithm#1965
Conversation
|
Spraying this thing with tests, but at this point if Civet decides it's happy then so am I. Not going to merge until @friedmud agrees too, of course. |
|
@roystgnr sorry I dropped out on this - I've had my head buried in my dissertation for the last couple of weeks. Let me review this for a bit and get back to you... |
Also adds nonblocking_barrier() and possibly_receive()
This is based on Derek's algorithm, but retains backwards compatibility with manually selected tags.
The libmesh_call_mpi wrapper() should hide the actual MPI call, and then initializing the int to 0 should make the --disable-mpi behaviour "possibly_receive always returns false", which sounds accurate enough. Now I just need to test to see if this breaks horribly in the context of the NBX parallel_sync.h when MPI is disabled.
Otherwise the linker gets confused when it can't find an instantiation.
This first test just covers M->M push syncs.
6812333 to
6f3d7cc
Compare
This sort of error is a bit easier to debug with libMesh testing than with my MPI stack's.
And refactor out a bit of repetitiveness.
And refactor a bit more to reduce the size of this rapidly growing test collection
We assume the same round-robin rule here that we do with CheckpointIO, so hopefully we can standardize the way we deal with N->M mesh splitting
Oversized pushes work now
This looks like it'll be the best way to support N->M pull_parallel, but it turns out to be easy and cheap and might be useful in its own right.
6f3d7cc to
f988717
Compare
|
I'm guessing that Bison |
|
So, N->M pushes and pulls are now working properly in unit tests, which means there's a chance that this PR also fixes the underlying bug(s) behind #1950 and partially obviates #1815... but now that I'm moving on to try something more complicated than a unit test, I find I can't replicate the problem in #1950. At least not on less than a hundred thousand elements; I'm trying out half a million next. In hindsight this shouldn't have surprised me, since the trigger for the bug must have been a hell of a corner case for us to not hit it sooner. @friedmud, can you send me a test case (MOOSE input file + mesh file(s)?) that goes nuts in either fashion? Something I can fit on 24 cores and 64GB would be ideal; preferably still something that triggers within 640 cores and 1280GB otherwise. Even if we don't get that testing done right away or this turns out to be an insufficient fix, I'm increasingly confident that it's a solid improvement, but since it's also a major change IMHO we should still wait until after 1.4.0 branches before merging so our git users can shake it down before our release users get it. |
|
@jwpeterson recently got a new release out with the older more tested code, and @bboutkov recently reported some solid performance improvement results for this on libmesh-devel, and I recently wasted some time after accidentally triggering a bug that this branch fixes, so even if this branch isn't a fix-all I do think it's enough of an improvement to be merged and it's the ideal time to merge it. I'll do so shortly unless anyone screams. |
This takes the algorithm from @friedmud in #1826, but fixes the MPI_TAG_UB bug, retains the ability to generate a MessageTag with a specified value, and adds a dbg/devel mode verification of automatic tag values when those are included.
This still needs a couple unit tests and I'm just getting started with my own private testing but I think it's in shape to throw at Civet and I'm done for today.
Also, we need to look at the tag choices in System::write_serialized_blocked_dof_objects - is it just me, or is that going to risk failure once num_blks exceeds 100, with only a few hundred million elements?