Add version-check of MPI in configure#1822
Conversation
jwpeterson
left a comment
There was a problem hiding this comment.
Looks like a step in the right direction, although I'd prefer not to copy/paste the same test in two places and it would be better to have an implementation-agnostic MPI standard check if there is a standard way of doing such things in MPI.
Here we seem to be assuming that OMPI_MAJOR_VERSION==1 implies that only the MPI-1 standard is supported. This may be true for OpenMPI but I don't think it's true for MPI implementations in general.
| MPI_INCLUDES_PATHS="-I$MPI_INCLUDES_PATH" | ||
| AC_LANG_RESTORE | ||
| CPPFLAGS=$tmpCPPFLAGS | ||
| mpimajor=`grep "OMPI_MAJOR_VERSION" $MPI_INCLUDES_PATH/mpi/mpi.h | sed -e "s/#define OMPI_MAJOR_VERSION[ ]*//g"` |
There was a problem hiding this comment.
Is this a copy/pasted version of the same test as above? Also, if I'm reading this correctly your checks are specific to OpenMPI?
There was a problem hiding this comment.
yes; currently it is specific.
I just need to look closer at different statements....
There was a problem hiding this comment.
As of MPI-1.2 there's MPI_VERSION and MPI_SUBVERSION in the standard. Can't we just test those values to make sure they're large enough, or scream and die if we don't see them defined at all?
There was a problem hiding this comment.
As of MPI-1.2 there's MPI_VERSION and MPI_SUBVERSION in the standard.
Those are the hooks we should use. Becomes very easy to test too!
There was a problem hiding this comment.
Oh, you are right; this string is available as well; I somehow didn't recognize it first.
But than the situation of the issue #1816 (with which I started) is a bit different: mpi.h says
#define MPI_VERSION 2
#define MPI_SUBVERSION 1
but
#define OMPI_MAJOR_VERSION 1
#define OMPI_MINOR_VERSION 6
#define OMPI_RELEASE_VERSION 5
that means that MPI 2.1 is not fully supported by libMesh.
But maybe you are right: We should use the MPI_VERSION-string and decide on some minimum version to be supported.
There was a problem hiding this comment.
@BalticPinguin if I understand correctly, OpenMPI 1.6.5 claims to be MPI standard 2.1 compliant, and at least in this case, it is, since MPI_Pack with a non-const pointer first argument is in the MPI-2.1 standard, see e.g. page 121 of this copy of the MPI-2.1 standard. So it appears that we are the ones using MPI_Pack incorrectly, although perhaps this API changed in MPI-2.2, I am checking that now.
Regardless, while it will be nice to know the MPI_VERSION and MPI_SUBVERSION an implementation claims to support, we will still always need AC_TRY_COMPILE tests to be absolutely sure.
There was a problem hiding this comment.
The first argument of MPI_Pack is non-const in MPI-2.2 as well, see pg. 121 of this PDF. It first became const in MPI-3.0. Therefore, MPI standard 2 vs. MPI standard 3 has always been the issue here, not MPI 1 vs 2.
| MPI_IMPL="built-in" | ||
| AC_MSG_RESULT( [$CXX Compiler Supports MPI] ) | ||
| AC_DEFINE(HAVE_MPI, 1, [Flag indicating whether or not MPI is available]) | ||
| AC_TRY_RUN([@%:@include <mpi.h> |
There was a problem hiding this comment.
I'd vote to stay away from AC_TRY_RUN for MPI-enabled code. Some clusters won't let you run MPI-compiled binaries on the head node, so we'd get a false positive from this test on those systems.
There was a problem hiding this comment.
you are right; this is in general considered bad style (maybe for the very same reason as you write).
However, I understand it such that we end up here only if we didn't find the mpi.h-file, so that some grep is not accessible.
So I see two solutions: either we assume in this case that the version is >1 and leave this test away, or one tests first whether AC_TRY_RUN works with mpi and than checks the version.
But maybe there is a more elegant way?
There was a problem hiding this comment.
... Ok; I just found out that there is e.g. the command: mpichversion; probably there are similar ones for the other implementations and one just tries to link and if it works, we test all these commands one after the other?
This assumes that only the correct mpi-version answers; but in a well-configured environment this should be given, right?
Edit: the ompi-command is ompi_info --version; but maybe mpiexec --version is even better and supported by all implementations; I will look this up.
|
I deduce from the discussion in #1827 that one can simplify things: Doing the checks for MPI as before, there are two cases:
Than we can use the predefined version-strings inside the library to distinguish different algorithms; than the config-part is mainly independent from this (I would still try to exclude Did I forget something important now? As @friedmud pointed out, we than rely on the fact that the given version is consistent for all functions we use. There might be some reason to hope that this is true, but IIRC, I learned in my HPC-lectures that various implementations often vary rom the MPI-standard for some more fancy functions (whatever this means). If this happened for some libMesh-functions, it would cause some real trouble. |
|
Based on #1827 I'd say we ought to:
And then we should probably put out a libMesh 1.3.1 release so that we can break MPI-2 compatibility afterward and we don't have to force @friedmud to create a fallback code path for #1826.
This is absolutely a worry, but I'm not sure what we can do about it. If an MPI implementation is missing a function it's supposed to have we can try to detect that at configure time rather than compile time, but unless we're creating MPI-2 fallback code paths there's not much we can do to be helpful about it. If an MPI implementation is mis-implementing a function that would be an utter disaster we can't even detect until runtime. In either way I'm not sure how much we can do to try and preemptively fix things. |
|
But according to the above discussion, #1816 actually issues that MPI-2 support is missing. The question is whether one wants to add MPI-2 support for only one release? |
|
Sorry, I'm not clearly separating two different issues. #1816 is about MPI-1 support missing (first broken, then deliberately removed after we realized that nobody had encountered the breakage for a year) #1827 is about the desire to remove MPI-2 support (which should be working fine now and for every single past release, but which wouldn't be sufficient to support the current version of #1826) I'm not personally eager to remove MPI-2 support, but the alternative of "we can't merge this major optimization until someone sets up a non-MPI-3 code path" isn't great either, and in #1827 the consensus seemed to be that nobody thought continuing MPI-2 support was worthwhile, in which case I'd like to just get it marked deprecated for one final release and then require MPI-3. Since the work to do that mostly overlaps the work to test MPI version here, I figured it could be part of the same updated PR? |
|
I think the discussion is distributed among too many threads and comment-sections by now; even for me constantly following it becomes hard to track the different branches :-) After the first comment in this PR, me and @jwpeterson discussed some issues, finding out that the error #1816 indicates that no MPI smaller 3.0 is supported (see this comment). With the rest I perfectly agree. |
|
Holy hell, I missed that comment; thank you. |
|
It appears that I put the non-backwards-compatible MPI_Pack invocations into libMesh on May 7, which postdates our last major release on April 9. If that's the case then I don't think this is a clear "we've already dropped MPI-2 support, might as well make it official" situation, not like we had with MPI-1. It's definitely more evidence in that direction, though... |
|
So, finally I have created another code which is, up to some testing, almost ready. But I have some questions about it: But thereafter, when searching for the (i.e. without the |
Sounds like a good idea to me. I was thinking of doing something similar myself; there may even be a canned autoconf test that already does this...
That's up to you, I don't mind either way.
Is it possible that the |
Yes, this seems to be what happens. I think a main issue is that the mpi-config searches for Besides handling of this issue, now the new suggestion is up. I hope it looks better to you and I didn't forget about too many things. |
|
Now I have added also a search through |
|
Could you try rebasing on current libmesh master and re-pushing? The newly added submodule seems be a problem since your HEAD doesn't have it. |
0194da5 to
d537ccc
Compare
|
ah, great. thank you @brianmoose for the hint. Now it seems to work better. |
|
|
||
| dnl if MPI is not set in the config, search it via the $INCLUDE and $LIBRARY_PATH variables; | ||
| dnl with this we can hope to get a configure that is consistent with the environment. | ||
| AS_IF([test "$MPI" == "/.."], |
There was a problem hiding this comment.
Should be a single equals sign in this test I think. It's possible that some shells support both = and == but the former is more portable IIRC.
| break]) | ||
| done | ||
| ]) | ||
| AS_IF([test "$MPI" == "/.."], |
| dnl check that MPI_VERSION is not 2; else through a warning | ||
| AC_TRY_LINK([@%:@include <mpi.h>], | ||
| [ | ||
| #if MPI_VERSION == 2 |
There was a problem hiding this comment.
These # signs can use the same @%:@ quadrigraph that is used for includes. Shouldn't make any difference in the generated configure script but it does help syntax highlighting in some editors (emacs).
| ], | ||
| [ | ||
| dnl MPI_VERSION >2: Be happy and configure with MPI. | ||
| AC_MSG_RESULT([ The MPI found has version >= 3.0. ]); |
There was a problem hiding this comment.
Not sure I understand the logic here? If the previous two tests for MPI_VERSION==1 and 2 fail, it is assumed to be version 3? That seems dangerous, perhaps there is just something wrong with the MPI installation that caused the preceding two tests to fail... I'd rather have a successful test for MPI_VERSION==3 along with (eventually) some API tests for MPI_VERSION==3 only code before concluding that we have MPI-3 support.
There was a problem hiding this comment.
hmm. You are right. I only had in mind that at some point libraries with MPI_VERSION 4 will come out; but the appearance of some strange definitions is more important to catch.
|
thanks for the comments; I hope you agree with the more compact form of version-check; if you rather want the different cases as separate checks (to make the |
|
... before this PR gets lost: Can someone comment on it? Is there some issue left? |
|
Sorry, I will try and take a look at this today.
… On Sep 24, 2018, at 2:47 AM, Hubert Weißmann ***@***.***> wrote:
... before this PR gets lost: Can someone comment on it? Is there some issue left?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
| @%:@error "MPI-version seems to be < 1.5." | ||
| @%:@endif | ||
| @%:@if MPI_VERSION > 3 | ||
| @%:@error "MPI-version too high. There is some error." |
There was a problem hiding this comment.
Are we sure we want to do this? Seems like it will just break things eventually:
https://www.mpi-forum.org/mpi-40/
Ideally I'd like to print whatever MPI_VERSION we see to make debugging actual "some error" cases, but screaming and dying is too much.
There was a problem hiding this comment.
So you would prefer to distinguish the different cases and separate these errors?
I excluded MPI 4 because the standard seems to be still under discussion and so there should be no compatible implementation of it...
| [ dnl MPI_VERSION is not defined. | ||
| AC_MSG_WARN(["ERROR: MPI-version seems to be too low: MPI < 1.5. Disable MPI now..."]); enablempi=no | ||
| [ dnl MPI_VERSION is not defined or has unexpected value. | ||
| AC_MSG_WARN(["ERROR: MPI-version seems to be too low: Need MPI 2.X or 3.X. Disable MPI now..."]); enablempi=no |
There was a problem hiding this comment.
I'd say "2.x or compatible" - that'll be accurate whether a fully-backwards-compatible or deprecated-stuff-removed releases come out in the future.
|
Ok, I hope I addressed your points appropriately. |
|
If @roystgnr is happy with the overall logic, I will go ahead and fix a few indentation/spacing issues before merge. |
|
Looks good to me; thanks everyone! |
| dnl with this we can hope to get a configure that is consistent with the environment. | ||
| AS_IF([test "$MPI" = "/.."], | ||
| [ | ||
| for i in $(echo $LIBRARY_PATH | tr ':' '\n') |
There was a problem hiding this comment.
According to this SO question, $LIBRARY_PATH is searched by gcc/g++ when linking. Anyone know if that's true for other compilers as well? I don't believe LIBRARY_PATH is something I've ever used on any system...
There was a problem hiding this comment.
This is a good point; so we probably should do this step only if the gcc is used...
I have just seen that according to the configure-script, libraries can be also given via
CXXFLAGS C++ compiler flags
LDFLAGS linker flags, e.g. -L<lib dir>
LIBS libraries to pass to the linker, e.g. -l<library>
CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I<include dir>
CFLAGS C compiler flags
libmesh_CPPFLAGS User-specified C/C++ preprocessor flags
libmesh_CXXFLAGS User-specified C++ compilation flags
libmesh_CFLAGS User-specified C compilation flags
So we should check these variables as well, shouldn't we?
|
|
||
| AS_IF([test "x$PETSC_MPI" != x], | ||
| [AC_DEFINE(HAVE_MPI, 1, [Flag indicating whether or not MPI is available])]) | ||
| [VERSION_MPI]) |
There was a problem hiding this comment.
Just so I'm clear: we don't run the risk of detecting/using a different MPI from the one PETSc is using by running the VERSION_MPI check right here, right? Many systems have multiple MPIs installed on them, and we have to be careful not to link against more than one by accident...
There was a problem hiding this comment.
Since we don't set any variables (except for HAVE_MPI) in the VERSION_MPI-function, I don't think this leads to problems of actually linking against different MPI-libraries.
But the detection of another library than that used by Petsc is of course an issue; in my understanding the AC_LANG_SAVE and AC_LANG_CPLUSPLUS should take care of this:
- The the current status is saved
- The cpp-preferences are loaded (such as
LD-LIBRARY_PATHetc).
This should avoid cases that wrong libraries are loaded.
I tried testing this but I always got errors due to compiler-inconsistencies and so the Petsc-dependency was removed.
There was a problem hiding this comment.
correct me if I am wrong, but at this point it is ensured that we use the same compiler as that used to compile PETSc; this should ensure that we have the same MPI, since it is linked with the compiler.
Moreover, if there is a way to link to the wrong MPI-library in VERSION_MPI, than we are using an inconsistent setup anyway; this should be ensured before already.
2d648dd to
770abc0
Compare
|
ok; I have updated the last commit (I hope it is not too inconvenient for you, but since it were small changes in both cases...). |
|
Just to not forget about this completely: In my viewpoint, this PR is complete so far; if you'd like me to add/change something, please let me know. |
|
@jwpeterson, any more concerns holding up a merge? |
|
I don't have any specific remaining issues, but I have not had a chance to perform detailed testing with these changes yet, and don't really know when I will... I suppose I'm OK with merging it and addressing an issues which arise later. |
Hi,
so this is a first attempt in fixing the bug #1816 , at least if the mpi is not just taken from petsc.
I would like to get some comments whether this is the direction to go; since it is my first work with m4, maybe there are some ceveir issues in my approach.
Moreover, if PETSc is used, (as I see it), the MPI-config is just taken from there without any checks, right?
What do you think if we add a check similar to the TRY_RUN in this commit? Or is there a better way to solve it?