Delay new node partitioning by roystgnr · Pull Request #1660 · libMesh/libmesh

roystgnr · 2018-04-19T14:04:19Z

The second half of the work in #1659; this delays partitioning nodes newly created by refinement, allowing us to partition them with any heuristic but without accidentally repartitioning old nodes if the user has disallowed that.

This lets us identify new nodes when sync'ing up a distributed mesh later. *That* lets us truly respect skip_partitioning requests, because we can assign processor ids to new nodes without risking inadvertently reassigning the processor id of an existing node.
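The idea of assigning owners only to nodes that have never been partitioned can be sketched in standalone C++. This is a toy model and not libMesh code: `Node`, `Elem`, `kInvalidProcId`, and `partition_new_nodes` are hypothetical names, and the "smallest processor id among touching elements" rule is just one possible heuristic chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not libMesh types.
using ProcId = unsigned int;
constexpr ProcId kInvalidProcId = std::numeric_limits<ProcId>::max();

struct Node {
  ProcId processor_id = kInvalidProcId;  // invalid means "not yet partitioned"
};

struct Elem {
  ProcId processor_id;
  std::vector<std::size_t> node_indices;
};

// Assign an owner to every node that still has an invalid processor id,
// leaving nodes that already have an owner untouched.
// Returns the number of nodes that received an id.
std::size_t partition_new_nodes(std::vector<Node>& nodes,
                                const std::vector<Elem>& elems)
{
  // Assumed heuristic: candidate owner is the smallest processor id
  // among the elements touching the node.
  std::vector<ProcId> candidate(nodes.size(), kInvalidProcId);
  for (const Elem& e : elems)
    for (std::size_t n : e.node_indices) {
      assert(n < nodes.size());
      candidate[n] = std::min(candidate[n], e.processor_id);
    }

  std::size_t assigned = 0;
  for (std::size_t n = 0; n < nodes.size(); ++n)
    if (nodes[n].processor_id == kInvalidProcId &&
        candidate[n] != kInvalidProcId) {
      nodes[n].processor_id = candidate[n];
      ++assigned;
    }
  return assigned;
}
```

Because only nodes still carrying the invalid id are written, a node that was partitioned earlier keeps its owner even when the heuristic would now pick a different one.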

roystgnr · 2018-04-19T14:25:18Z

Still passing Rattlesnake? In that case I'll merge once the rest of the CI checkboxes are happy.

roystgnr · 2018-04-19T14:26:01Z

Hmm... or maybe use this excuse to add some more expensive optional MOOSE tests, since GRINS-dbg might take a while.

roystgnr · 2018-04-19T18:58:24Z

DistributedMesh recover exodiff failures at 3 processors with variables/fe_hier.test_hier_2_1d, at 10 with mesh/named_entities.test_periodic_names, at 12 with restart/restart.test_nodal_var_2, and at 16 with executioners/executioner.test_steady, misc/exception.parallel_exception_jacobian_transient_non_zero_rank, and misc/exception.parallel_exception_residual_transient_non_zero_rank...

And I can't replicate a single one of those failures.

We still have those tests marked "failed but allowed" (for good reason; there was one long-standing failure there that I never managed to replicate), and this PR may not be what broke them. When's the last time we did a distributed recover pass on Civet? The https://civet.inl.gov/recipe_events/23169/ log doesn't show any of the previous runs. So I'm going to merge anyway, but I despair of figuring out how to bisect test failures I can't reproduce.

roystgnr added 4 commits April 19, 2018 09:01

Parallel::sync_node_data_by_element_id_once

b9abade

In some cases we know we won't need more than one iteration of sync, so we can cut our communication in half by skipping the second "check if everything's fine" iteration.
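A serial toy model may help show where the saving comes from. It is not the implementation of `Parallel::sync_node_data_by_element_id_once`; the names `GhostCopy`, `sync_pass`, `sync_until_converged`, and `sync_once` are hypothetical, and each call to `sync_pass` stands in for one round of parallel communication.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each ghost copy mirrors owned[source].
struct GhostCopy {
  std::size_t source;  // index into the owner's data
  int value;           // local, possibly stale, copy
};

// One simulated communication round: every ghost pulls from its source.
// Returns true if any ghost value changed.
bool sync_pass(const std::vector<int>& owned, std::vector<GhostCopy>& ghosts)
{
  bool changed = false;
  for (GhostCopy& g : ghosts) {
    assert(g.source < owned.size());
    if (g.value != owned[g.source]) {
      g.value = owned[g.source];
      changed = true;
    }
  }
  return changed;
}

// Iterative variant: repeat until a pass changes nothing.
// The final pass only confirms that everything is already consistent.
// Returns the number of rounds used.
std::size_t sync_until_converged(const std::vector<int>& owned,
                                 std::vector<GhostCopy>& ghosts)
{
  std::size_t rounds = 0;
  bool changed = true;
  while (changed) {
    changed = sync_pass(owned, ghosts);
    ++rounds;
  }
  return rounds;
}

// "Once" variant: the caller asserts that a single pass suffices,
// so the confirming round is skipped. Returns the number of rounds used.
std::size_t sync_once(const std::vector<int>& owned,
                      std::vector<GhostCopy>& ghosts)
{
  sync_pass(owned, ghosts);
  return 1;
}
```

In this model, stale ghosts cost the iterative variant two rounds (one that does the work and one that confirms nothing is left), while the single-pass variant reaches the same state in one round. Skipping the check is only safe when the caller really does know that one pass is enough.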

Comment on partitioning behavior change

f0f2d59

Fix comment

aea965c

moosebuild added the PR: Failed but allowed label Apr 19, 2018

roystgnr merged commit 91474e1 into libMesh:master Apr 19, 2018

roystgnr deleted the delay_new_node_partitioning_2 branch April 19, 2018 18:58
