Re-implement parallel sync using the NBX algorithm by roystgnr · Pull Request #1965 · libMesh/libmesh

roystgnr · 2018-12-04T23:03:18Z

This takes the algorithm from @friedmud in #1826, but fixes the MPI_TAG_UB bug, retains the ability to generate a MessageTag with a specified value, and adds a dbg/devel mode verification of automatic tag values when those are included.

This still needs a couple of unit tests, and I'm just getting started with my own private testing, but I think it's in shape to throw at Civet, and I'm done for today.

Also, we need to look at the tag choices in System::write_serialized_blocked_dof_objects - is it just me, or is that going to risk failure once num_blks exceeds 100, with only a few hundred million elements?

roystgnr · 2018-12-06T18:22:49Z

Spraying this thing with tests, but at this point if Civet decides it's happy then so am I. Not going to merge until @friedmud agrees too, of course.

friedmud · 2018-12-20T20:33:07Z

@roystgnr sorry I dropped out on this - I've had my head buried in my dissertation for the last couple of weeks. Let me review this for a bit and get back to you...

The libmesh_call_mpi() wrapper should hide the actual MPI call, and initializing the int to 0 should then make the --disable-mpi behaviour "possibly_receive always returns false", which sounds accurate enough. Now I just need to test whether this breaks horribly in the context of the NBX parallel_sync.h when MPI is disabled.
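A minimal sketch of the idea in that commit message, not the actual libMesh code: the macro `SKETCH_CALL_MPI`, the switch `SKETCH_HAVE_MPI`, and the function `sketch_probe_any()` are hypothetical names made up for illustration. When MPI is disabled the wrapper expands to nothing, so the probe call vanishes and the zero-initialized flag is what gets returned.

```cpp
#include <cstdlib>

#ifdef SKETCH_HAVE_MPI
#  include <mpi.h>
// With MPI: run the call and abort on any error code.
#  define SKETCH_CALL_MPI(call) \
     do { if ((call) != MPI_SUCCESS) std::abort(); } while (0)
#else
// Without MPI: the call is discarded by the preprocessor,
// so no MPI symbols are needed to compile.
#  define SKETCH_CALL_MPI(call) do {} while (0)
#endif

// Hypothetical stand-in for a "is there a message waiting?" query.
bool sketch_probe_any()
{
  int flag = 0; // stays 0 when the wrapped call is compiled out
#ifdef SKETCH_HAVE_MPI
  MPI_Status status;
#endif
  SKETCH_CALL_MPI(MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &flag, &status));
  return flag != 0;
}
```

Built without `SKETCH_HAVE_MPI`, the function always returns false, which matches the "always returns false" behaviour described above.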

roystgnr · 2019-01-29T18:06:17Z

I'm guessing that the Bison failure "Node #427 specified in 'top_disp_r_fuel' not found in the mesh!" in serial wasn't likely to be due to a change in parallel codes that pass all the other tests. Hopefully it'll have been fixed by this next push.

roystgnr · 2019-02-01T17:07:56Z

So, N->M pushes and pulls are now working properly in unit tests, which means there's a chance that this PR also fixes the underlying bug(s) behind #1950 and partially obviates #1815... but now that I'm moving on to try something more complicated than a unit test, I find I can't replicate the problem in #1950. At least not on less than a hundred thousand elements; I'm trying out half a million next.

In hindsight this shouldn't have surprised me, since the trigger for the bug must have been a hell of a corner case for us to not hit it sooner. @friedmud, can you send me a test case (MOOSE input file + mesh file(s)?) that goes nuts in either fashion? Something I can fit on 24 cores and 64GB would be ideal; preferably still something that triggers within 640 cores and 1280GB otherwise.

Even if we don't get that testing done right away, or this turns out to be an insufficient fix, I'm increasingly confident that it's a solid improvement. But since it's also a major change, IMHO we should still wait until after 1.4.0 branches before merging, so our git users can shake it down before our release users get it.

roystgnr · 2019-03-25T19:57:02Z

@jwpeterson recently got a new release out with the older, more tested code; @bboutkov recently reported some solid performance improvement results for this on libmesh-devel; and I recently wasted some time after accidentally triggering a bug that this branch fixes. So even if this branch isn't a fix-all, I do think it's enough of an improvement to be merged, and it's the ideal time to merge it. I'll do so shortly unless anyone screams.

roystgnr mentioned this pull request Dec 20, 2018

Skip partitioning in copy_nodes_and_elems if we should #1815

Closed

friedmud and others added 11 commits January 24, 2019 16:51

Re-implement parallel sync using the NBX algorithm.

6f9b850

Also adds nonblocking_barrier() and possibly_receive()
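For readers unfamiliar with NBX, here is a toy single-process model of the general shape of the algorithm, not the libMesh implementation. No real MPI is used: `unmatched_sends` stands in for outstanding synchronous sends (MPI_Issend), popping from `inbox` stands in for probe-and-receive, and `in_barrier` plus `barrier_count` stand in for a nonblocking barrier (MPI_Ibarrier). All names here (`simulate_nbx`, `SimRank`, `Message`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <utility>
#include <vector>

struct Message { int source; std::vector<int> data; };

struct SimRank {
  std::deque<Message> inbox;   // posted to this rank, not yet received
  int unmatched_sends = 0;     // this rank's sends nobody has received yet
  bool in_barrier = false;     // has this rank entered the barrier?
  bool done = false;
  std::map<int, std::vector<int>> received; // source rank -> payload
};

// to_send[p] maps destination rank -> payload for simulated rank p.
std::vector<std::map<int, std::vector<int>>>
simulate_nbx(const std::vector<std::map<int, std::vector<int>>> & to_send)
{
  const std::size_t n = to_send.size();
  std::vector<SimRank> ranks(n);

  // Every rank posts its synchronous sends up front.
  for (std::size_t p = 0; p != n; ++p)
    for (const auto & kv : to_send[p]) {
      assert(kv.first >= 0 && std::size_t(kv.first) < n);
      ranks[kv.first].inbox.push_back({int(p), kv.second});
      ++ranks[p].unmatched_sends;
    }

  std::size_t barrier_count = 0, finished = 0;
  while (finished != n)
    for (std::size_t p = 0; p != n; ++p) {
      SimRank & r = ranks[p];
      if (r.done) continue;
      // Receive one waiting message, if any; this completes the
      // sender's matching synchronous send.
      if (!r.inbox.empty()) {
        Message m = std::move(r.inbox.front());
        r.inbox.pop_front();
        --ranks[m.source].unmatched_sends;
        r.received[m.source] = std::move(m.data);
      }
      if (!r.in_barrier) {
        // Enter the barrier only once all of our own sends are matched.
        if (r.unmatched_sends == 0) { r.in_barrier = true; ++barrier_count; }
      }
      else if (barrier_count == n) { r.done = true; ++finished; }
    }

  std::vector<std::map<int, std::vector<int>>> out(n);
  for (std::size_t p = 0; p != n; ++p) out[p] = std::move(ranks[p].received);
  return out;
}
```

The point of the structure is that no rank needs to know in advance how many messages it will receive: once every rank has entered the barrier, every send has been matched, so nothing can still be in flight.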

Add get_unique_tag autoselection mechanism

54b6b9a

This is based on Derek's algorithm, but retains backwards compatibility with manually selected tags.
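As a rough illustration of how automatic selection can coexist with manually chosen tag values, here is a small sketch. It is an assumption-laden stand-in, not the libMesh MessageTag code: the class `TagPool` and its methods are hypothetical, and the upper bound passed to the constructor plays the role of the MPI_TAG_UB limit mentioned in the PR description.

```cpp
#include <map>
#include <stdexcept>

// Hypothetical tag registry: reference-counts tags in use so that
// automatically chosen tags never collide with manually requested ones.
class TagPool {
public:
  explicit TagPool(int tag_ub) : _tag_ub(tag_ub), _next(0) {}

  // Manual selection: the caller insists on a specific value.
  int acquire(int tag) {
    if (tag < 0 || tag > _tag_ub)
      throw std::out_of_range("tag outside [0, tag_ub]");
    ++_in_use[tag];
    return tag;
  }

  // Automatic selection: next free value, wrapping at the upper bound.
  int acquire_unique() {
    for (long long tries = 0; tries <= _tag_ub; ++tries) {
      const int candidate = _next;
      _next = (_next == _tag_ub) ? 0 : _next + 1;
      if (!_in_use.count(candidate)) {
        _in_use[candidate] = 1;
        return candidate;
      }
    }
    throw std::runtime_error("no free tags");
  }

  // Drop one reference; the tag becomes reusable when none remain.
  void release(int tag) {
    auto it = _in_use.find(tag);
    if (it != _in_use.end() && --it->second == 0)
      _in_use.erase(it);
  }

private:
  int _tag_ub;
  int _next;
  std::map<int, int> _in_use; // tag -> reference count
};
```

In a real parallel run every rank would have to arrive at the same automatic value, which is presumably what the dbg/devel mode verification mentioned in the PR description checks; this sketch does not model that.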

Finish implementation of NBX and use it for both push and pull

b2d3819

Use automatic get_unique_tag where it makes sense

806c539

Verify and comment on automatic get_unique_tag()

e569581

Add header to get inline Comm::max(int)

046cbb3

Otherwise the linker gets confused when it can't find an instantiation.

Unit tests for MessageTag get_unique_tag()

df1cccb

Run MessageTag unit tests

8dc93c9

Start adding unit tests for parallel_sync.h

6453fe1

This first test just covers M->M push syncs.

Build parallel_sync_test.C

cdb0744

roystgnr force-pushed the nbx_push_vectors branch from 6812333 to 6f3d7cc Compare January 24, 2019 23:53

roystgnr added 11 commits January 29, 2019 08:42

Re-bootstrap

877da9b

Assert processor ids are in-range

a1f0524

This sort of error is a bit easier to debug with libMesh testing than with my MPI stack's.

Add unit test for M->M pull syncs

37ac473

Add push-vectors-of-vectors unit test

f13e778

And refactor out a bit of repetitiveness.

Add pull-vector-of-vectors parallel_sync test

666a099

And refactor a bit more to reduce the size of this rapidly growing test collection

Allow parallel_sync push to handle M > N

2a96106

We assume the same round-robin rule here that we do with CheckpointIO, so hopefully we can standardize the way we deal with N->M mesh splitting
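A minimal sketch of what such a round-robin rule could look like, assuming it means "logical processor id modulo the number of MPI ranks"; the commit message does not spell the rule out, so that reading is an assumption, and the names `round_robin_rank` and `group_by_rank` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using Payload = std::vector<int>;

// Assumed rule: logical processor id pid is handled by rank pid % n_ranks.
inline unsigned int round_robin_rank(unsigned int pid, unsigned int n_ranks)
{
  assert(n_ranks > 0);
  return pid % n_ranks;
}

// Regroup data keyed by logical processor id (possibly >= n_ranks) into
// per-rank batches, keeping the original id alongside each payload.
std::map<unsigned int, std::vector<std::pair<unsigned int, Payload>>>
group_by_rank(const std::multimap<unsigned int, Payload> & data,
              unsigned int n_ranks)
{
  std::map<unsigned int, std::vector<std::pair<unsigned int, Payload>>> grouped;
  for (const auto & kv : data)
    grouped[round_robin_rank(kv.first, n_ranks)].emplace_back(kv.first, kv.second);
  return grouped;
}
```

With 4 ranks, data addressed to logical processors 0 and 4 would both land on rank 0 under this rule, and data for 1 and 5 on rank 1.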

parallel_sync unit tests for oversized data

d94412d

Oversized pushes work now

Support for push_parallel(multimap<vector>)

ce8d5a4

This looks like it'll be the best way to support N->M pull_parallel, but it turns out to be easy and cheap and might be useful in its own right.

Unit tests for push_parallel(multimap<vector>)

dece6fb

Unit tests for push(multimap<vec<vec>>)

fc87a15

multimap use/support in pull_parallel_foo

f988717

roystgnr force-pushed the nbx_push_vectors branch from 6f3d7cc to f988717 Compare January 29, 2019 18:06

roystgnr added 2 commits January 31, 2019 11:42

Oversized parallel vec<scalar> pulls work now

3569dff

Fix PullVecVec test, remove unused variables

2593b79

Enable PullVecVec test

b895125

roystgnr mentioned this pull request Mar 5, 2019

Don't use mesh.partition(1) in split_mesh #2058

Closed

roystgnr merged commit a43f79a into libMesh:master Mar 26, 2019

roystgnr mentioned this pull request Apr 12, 2019

GenericProjector rewrite #1938

Merged

roystgnr deleted the nbx_push_vectors branch November 12, 2019 20:25

roystgnr mentioned this pull request Nov 12, 2019

Re-implement parallel sync using the NBX algorithm. #1826

Closed

roystgnr merged 25 commits into libMesh:master from roystgnr:nbx_push_vectors
