
Delay new node partitioning #1660

Merged

roystgnr merged 4 commits into libMesh:master from roystgnr:delay_new_node_partitioning_2 on Apr 19, 2018


@roystgnr
Member

This is the second half of the work in #1659. It delays the partitioning of nodes newly created by refinement. That lets us partition those nodes with any heuristic without accidentally repartitioning old nodes when the user has disallowed repartitioning.

In some cases we know we won't need more than one iteration of sync, so we can cut our communication in half by skipping the second "check if everything's fine" iteration.

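The commit above describes a sync that normally repeats until a round changes nothing. The last round exists only to confirm that nothing changed. The sketch below shows how a caller's promise that one round suffices saves that confirming exchange. It is a hypothetical illustration, not libMesh code. `sync_until_stable`, `sync_round` and `one_round_suffices` are invented names, and each call to `sync_round` stands in for one parallel message exchange.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical sketch (not the libMesh API): repeat a sync round until a
// round changes nothing. Each call to `sync_round` stands in for one exchange
// of messages between processors and returns how many values it changed
// (in a real distributed code this count would be summed across processors).
// If `one_round_suffices` is true, the caller promises that a single round
// reaches the fixed point, so the confirming "nothing changed" round is
// skipped. Returns the number of rounds (communication exchanges) performed.
std::size_t sync_until_stable(const std::function<std::size_t()> & sync_round,
                              bool one_round_suffices)
{
  std::size_t rounds = 0;
  while (true)
    {
      const std::size_t n_changed = sync_round();
      ++rounds;
      // Stop after the first round if the caller vouched for it, otherwise
      // keep going until a round reports no changes.
      if (one_round_suffices || n_changed == 0)
        break;
    }
  return rounds;
}
```

In this example the first round changes some values and the second finds nothing left to do. The default path therefore performs two exchanges, and the "one round suffices" path performs one. That halving of communication is what the commit message refers to.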
This lets us identify new nodes when sync'ing up a distributed mesh
later.

*That* lets us truly respect skip_partitioning requests, because we
can assign processor ids to new nodes without risking inadvertently
reassigning the processor id of an existing node.
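The idea in the commits above can be shown as follows. Once new nodes are distinguishable from old ones, processor ids can be assigned to the new nodes alone, and existing nodes are never touched. The sketch assumes that new nodes carry an "invalid" processor id marker. For the ownership rule it uses "lowest processor id among the elements touching the node" as one example heuristic. `Node`, `proc_id`, `invalid_proc_id` and `partition_new_nodes` are hypothetical stand-ins, not the libMesh classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins for mesh data; these are not the libMesh API.
using proc_id = std::uint16_t;
constexpr proc_id invalid_proc_id = std::numeric_limits<proc_id>::max();

struct Node
{
  // An invalid id marks a node newly created by refinement.
  proc_id processor_id = invalid_proc_id;
  // Processor ids of the elements that use this node.
  std::vector<proc_id> touching_elem_procs;
};

// Assign owners only to nodes still carrying the invalid marker, using a
// "lowest touching-element processor id" rule as an example heuristic.
// Nodes that already have an owner are skipped, so a skip_partitioning
// request for existing nodes is respected.
void partition_new_nodes(std::vector<Node> & nodes)
{
  for (Node & node : nodes)
    {
      if (node.processor_id != invalid_proc_id)
        continue; // old node: never reassign its owner
      assert(!node.touching_elem_procs.empty());
      node.processor_id = *std::min_element(node.touching_elem_procs.begin(),
                                            node.touching_elem_procs.end());
    }
}
```

Because the "is this node new?" test is just a comparison against the marker, any ownership heuristic could replace the `min_element` rule without risking reassignment of an existing node.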
@roystgnr
Member Author

Still passing Rattlesnake? If so, I'll merge once the rest of the CI checkboxes are happy.

@roystgnr
Member Author

Hmm... or maybe use this excuse to add some more expensive optional MOOSE tests, since GRINS-dbg might take a while.

@roystgnr
Member Author

DistributedMesh recover exodiff failures at 3 processors with variables/fe_hier.test_hier_2_1d, at 10 with mesh/named_entities.test_periodic_names, at 12 with restart/restart.test_nodal_var_2, and at 16 with executioners/executioner.test_steady, misc/exception.parallel_exception_jacobian_transient_non_zero_rank, and misc/exception.parallel_exception_residual_transient_non_zero_rank...

And I can't replicate a single one of those failures.

We still have those tests marked "failed but allowed" (for good reason: there was one long-standing failure there that I never managed to replicate). This PR may not be what broke them either: when was the last time we did a distributed recover pass on Civet? The https://civet.inl.gov/recipe_events/23169/ log doesn't show any of the previous runs. So I'm going to merge anyway, but I'm despairing of figuring out how to bisect test failures I can't reproduce.

@roystgnr roystgnr merged commit 91474e1 into libMesh:master Apr 19, 2018
@roystgnr roystgnr deleted the delay_new_node_partitioning_2 branch April 19, 2018 18:58