Intel Compiler Bug #161

Merged: sbryngelson merged 82 commits into MFlowCode:master from anandrdbz:patch-1 on Aug 1, 2023
@anandrdbz
Contributor

@anandrdbz anandrdbz commented May 25, 2023

The issue with the Intel compilers seems to stem from MPI_SENDRECV not behaving correctly in 2D/3D.

I added a NaN check on the receive buffer to narrow down where this comes from. However, adding this check seems to remove the bug entirely: all test cases pass with the Intel compiler.

The source of the bug seems rather bizarre, since adding a NaN check should not fundamentally change anything in the code. If I had to guess, MPI_SENDRECV with Intel MPI is not fully blocking as it should be, so the receive buffer is not fully populated when the call returns. The extra time required to perform the check perhaps allows the transfer to complete.
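
For readers who want to picture the kind of check described above, here is a minimal sketch in C. It is not the code from this PR (which is not shown in the thread): the helper name `first_nan_index` and the flat array of doubles standing in for the receive buffer are assumptions made for illustration only.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not from the PR): scan a receive buffer and
 * return the index of the first NaN entry, or -1 if every entry is
 * a number. In the scenario described above, this scan would run
 * right after the send/receive call returns. */
static long first_nan_index(const double *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (isnan(buf[i])) {
            return (long)i;
        }
    }
    return -1;
}
```

A caller would pass the buffer and its length right after the exchange and report or abort if the result is not -1. Note that a scan like this reads every element of the buffer, so it also changes timing and memory access patterns, which is consistent with the guess above but does not confirm it.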

Either way, the bug seems to be compiler-related rather than introduced by the code.

@sbryngelson, modular fp seems to fail in another case file with single precision (with gcc), so I'm not sure whether this is the cause of the bug there, but we can always check.

@anandrdbz anandrdbz requested a review from sbryngelson as a code owner May 25, 2023 11:39
Anand Radhakrishnan and others added 3 commits May 25, 2023 07:48
@henryleberre
Collaborator

@anandrdbz It is a very strange issue. Is there a reason for including this check on CPUs regardless of the compiler but not on GPUs, apart from the GPU kernel launch that the NaN check would require when using CUDA-aware MPI? If we only need it for the Intel compilers, we could guard it using the __INTEL_COMPILER preprocessor definition.
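
A rough sketch of what such a guard could look like, written in C for illustration. The function name `guarded_nan_count` is hypothetical and the code is not taken from this PR; only the `__INTEL_COMPILER` macro comes from the comment above, and which Intel compiler versions define it should be checked against the compiler documentation.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not from the PR): count NaN entries in a
 * receive buffer, but only when the build defines __INTEL_COMPILER.
 * With any other compiler the scan is compiled out and the function
 * returns 0 without touching the buffer. */
static size_t guarded_nan_count(const double *buf, size_t n)
{
#if defined(__INTEL_COMPILER)
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(buf[i])) {
            ++count;
        }
    }
    return count;
#else
    (void)buf; /* unused when the check is compiled out */
    (void)n;
    return 0;
#endif
}
```

Guarding at compile time keeps the extra pass over the buffer out of builds that do not show the problem, at the cost of the workaround silently disappearing if the macro is not defined by the toolchain in use.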

@anandrdbz
Contributor Author

anandrdbz commented May 25, 2023

@henryleberre It's only required with the Intel compilers (CPU), so yes, we can change the guard from !ACC to just INTEL. I'll change it in the next commit.

Also, @sbryngelson, the PR says it failed on GPUs, but I verified that it works on Phoenix myself, so most likely it's just a random CI issue. You can confirm that it runs with the Intel compilers as well.

@sbryngelson
Member

> Also, @sbryngelson, the PR says it failed on GPUs, but I verified that it works on Phoenix myself, so most likely it's just a random CI issue. You can confirm that it runs with Intel compilers as well

Perhaps, but if that's the case then we need to fix it.

Anand and others added 2 commits May 25, 2023 13:07
@sbryngelson sbryngelson requested a review from henryleberre as a code owner May 25, 2023 19:43
@sbryngelson sbryngelson removed the request for review from henryleberre May 25, 2023 19:52
@sbryngelson
Member

@anandrdbz I added Intel compilers back into the CI for this and you can see that they are failing.

@anandrdbz
Contributor Author

@sbryngelson It seems to be failing much further into the test cases (3D viscous + bubbles), so I'll try to see what's causing it.

@sbryngelson
Member

Some ideas (from @henryleberre): What happens if you disable MPI via --no-mpi? What if you also enable debug, i.e. --no-mpi --debug? This should narrow down the places where the problems could occur.

@sbryngelson sbryngelson linked an issue Aug 1, 2023 that may be closed by this pull request
@sbryngelson sbryngelson merged commit 363d584 into MFlowCode:master Aug 1, 2023

Development

Successfully merging this pull request may close these issues.

Intel compilers require some help