Intel Compiler Bug #161
|
@anandrdbz It is a very strange issue. Is there a reason for including this check on CPUs regardless of the compiler but not on GPUs, apart from the NaN check requiring a GPU kernel launch when using CUDA-aware MPI? If we only need it for Intel compilers, we could guard it using the __INTEL_COMPILER preprocessor definition. |
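A rough sketch of the guard suggested above, written in C for brevity rather than in the project's own sources. The names RECV_NAN_CHECK and recv_nan_check_enabled are hypothetical and only illustrate the idea. The sketch also tests __INTEL_LLVM_COMPILER, on the assumption that the newer LLVM-based Intel compilers should be covered too; whether they show the same behavior is not established in this thread.

```c
#include <stdbool.h>

/* Hypothetical switch: enabled only when an Intel compiler builds this file.
 * __INTEL_COMPILER is the macro named in the comment above; the second macro
 * is an assumption meant to cover the LLVM-based Intel compilers as well. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#define RECV_NAN_CHECK 1
#else
#define RECV_NAN_CHECK 0
#endif

/* Report at run time whether the workaround was compiled in. */
bool recv_nan_check_enabled(void)
{
    return RECV_NAN_CHECK != 0;
}
```

With a guard like this, builds with other compilers skip the check entirely, so GPU builds would not pay for an extra kernel launch.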
|
@henryleberre It's only required with Intel compilers (CPU), so yes, we can change the guard from !ACC to just INTEL. I'll change it in the commit. Also, @sbryngelson, the PR says it failed on GPUs, but I verified myself that it works on Phoenix, so most likely it's just a random CI issue. You can confirm that it runs with Intel compilers as well. |
Perhaps, but if that's the case then we need to fix it. |
|
@anandrdbz I added Intel compilers back into the CI for this and you can see that they are failing. |
|
@sbryngelson It seems to be failing much further into the test cases (3D viscous + bubbles), so I'll try to see what's causing it. |
|
Some ideas (from @henryleberre): what happens if you disable mpi via |
The issue with the Intel compilers seems to stem from MPI_SENDRECV not completing correctly in 2D / 3D.
I added a NaN check on the receive buffer to narrow down where this comes from. However, adding this check seems to remove the bug entirely: all test cases pass with the Intel compiler.
The source of the bug is rather bizarre, since adding a NaN check should not fundamentally change anything in the code. MPI_SENDRECV is a blocking call, so the receive buffer should be fully populated by the time it returns. If I had to guess, MPI_SENDRECV with Intel MPI is not behaving as perfectly blocking, and the receive buffer is not fully populated on return. The extra time required to perform the check perhaps allows the transfer to complete.
Either way, the bug seems to be compiler-related rather than introduced by the code.
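For reference, a minimal sketch of the kind of NaN check described above, written in C rather than the project's own language. The helper first_nan_index is hypothetical and is not the code added in this PR. It only shows the idea of scanning the receive buffer right after the exchange returns.

```c
#include <math.h>
#include <stddef.h>

/* Return the index of the first NaN in buf[0..n), or -1 if there is none.
 * Hypothetical helper: intended to be called on the receive buffer
 * immediately after the send/receive exchange has returned. */
long first_nan_index(const double *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (isnan(buf[i])) {
            return (long)i;
        }
    }
    return -1;
}
```

A caller would treat any result other than -1 as a sign that the buffer was not fully populated and report the offending index. Note that the scan reads every element of the buffer, which may be why its presence changes the observed behavior.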
@sbryngelson, modular fp seems to fail in another case file with single precision (with GCC), so I'm not sure whether this is the cause of the bug there, but we can always check.