Memory Usage and time in Partitioner::set_node_processor_ids() #1950

@friedmud

I'm seeing massive memory usage and very long run times in Partitioner::set_node_processor_ids() when trying to split a mesh with 100M+ elements.

This is well before the communication portion of that routine...

Memory usage was around 50GB of RAM when the program entered that routine. It has since climbed to 155GB and is still growing (I only have 187GB on this node, so hopefully it stops soon!). It has also taken a REALLY long time (maybe an hour?).

Here is a current stack trace:

#0  0x00002aaab0a5d451 in __memmove_ssse3_back () from /lib64/libc.so.6
#1  0x00002aaaacb90112 in libMesh::Partitioner::set_node_processor_ids(libMesh::MeshBase&) () from /home/gastdr/projects/lemhi/libmesh/lib/libmesh_opt.so.0
#2  0x00002aaaacb923fa in libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/lemhi/libmesh/lib/libmesh_opt.so.0
#3  0x00002aaaac93a388 in libMesh::split_mesh(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/lemhi/libmesh/lib/libmesh_opt.so.0
#4  0x00002aaaab89c031 in SplitMeshAction::act() () from /home/gastdr/projects/lemhi/moose/framework/libmoose-opt.so.0
#5  0x00002aaaab899134 in Action::timedAct() () from /home/gastdr/projects/lemhi/moose/framework/libmoose-opt.so.0
#6  0x00002aaaab8a3c69 in ActionWarehouse::executeActionsWithAction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/gastdr/projects/lemhi/moose/framework/libmoose-opt.so.0
#7  0x00002aaaab8a5348 in ActionWarehouse::executeAllActions() () from /home/gastdr/projects/lemhi/moose/framework/libmoose-opt.so.0
#8  0x00002aaaabec90ae in MooseApp::runInputFile() () from /home/gastdr/projects/lemhi/moose/framework/libmoose-opt.so.0
#9  0x00002aaaabec7660 in MooseApp::run() () from /home/gastdr/projects/lemhi/moose/framework/libmoose-opt.so.0
#10 0x0000000000404377 in main ()

In the time it took me to write all of that, it ran out of memory and died. Crap. I was only running one MPI rank per node to try to get this to go through.

If we could get some eyeballs on this I would appreciate it!
