Skip to content

Add version-check of MPI in configure#1822

Merged
roystgnr merged 4 commits into
libMesh:masterfrom
BalticPinguin:master
Oct 18, 2018
Merged

Add version-check of MPI in configure#1822
roystgnr merged 4 commits into
libMesh:masterfrom
BalticPinguin:master

Conversation

@BalticPinguin
Copy link
Copy Markdown
Member

Hi,
so this is a first attempt in fixing the bug #1816 , at least if the mpi is not just taken from petsc.
I would like to get some comments whether this is the direction to go; since it is my first work with m4, maybe there are some ceveir issues in my approach.

Moreover, if PETSc is used, (as I see it), the MPI-config is just taken from there without any checks, right?
What do you think if we add a check similar to the TRY_RUN in this commit? Or is there a better way to solve it?

@roystgnr roystgnr mentioned this pull request Aug 20, 2018
Copy link
Copy Markdown
Member

@jwpeterson jwpeterson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a step in the right direction, although I'd prefer not to copy/paste the same test in two places and it would be better to have an implementation-agnostic MPI standard check if there is a standard way of doing such things in MPI.

Here we seem to be assuming that OMPI_MAJOR_VERSION==1 implies that only the MPI-1 standard is supported. This may be true for OpenMPI but I don't think it's true for MPI implementations in general.

Comment thread m4/mpi.m4 Outdated
MPI_INCLUDES_PATHS="-I$MPI_INCLUDES_PATH"
AC_LANG_RESTORE
CPPFLAGS=$tmpCPPFLAGS
mpimajor=`grep "OMPI_MAJOR_VERSION" $MPI_INCLUDES_PATH/mpi/mpi.h | sed -e "s/#define OMPI_MAJOR_VERSION[ ]*//g"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a copy/pasted version of the same test as above? Also, if I'm reading this correctly your checks are specific to OpenMPI?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes; currently it is specific.
I just need to look closer at different statements....

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of MPI-1.2 there's MPI_VERSION and MPI_SUBVERSION in the standard. Can't we just test those values to make sure they're large enough, or scream and die if we don't see them defined at all?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of MPI-1.2 there's MPI_VERSION and MPI_SUBVERSION in the standard.

Those are the hooks we should use. Becomes very easy to test too!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are right; this string is available as well; I somehow didn't recognize it first.
But than the situation of the issue #1816 (with which I started) is a bit different: mpi.h says

  #define MPI_VERSION 2
  #define MPI_SUBVERSION 1

but

  #define OMPI_MAJOR_VERSION 1
  #define OMPI_MINOR_VERSION 6
  #define OMPI_RELEASE_VERSION 5

that means that MPI 2.1 is not fully supported by libMesh.
But maybe you are right: We should use the MPI_VERSION-string and decide on some minimum version to be supported.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BalticPinguin if I understand correctly, OpenMPI 1.6.5 claims to be MPI standard 2.1 compliant, and at least in this case, it is, since MPI_Pack with a non-const pointer first argument is in the MPI-2.1 standard, see e.g. page 121 of this copy of the MPI-2.1 standard. So it appears that we are the ones using MPI_Pack incorrectly, although perhaps this API changed in MPI-2.2, I am checking that now.

Regardless, while it will be nice to know the MPI_VERSION and MPI_SUBVERSION an implementation claims to support, we will still always need AC_TRY_COMPILE tests to be absolutely sure.

Copy link
Copy Markdown
Member

@jwpeterson jwpeterson Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument of MPI_Pack is non-const in MPI-2.2 as well, see pg. 121 of this PDF. It first became const in MPI-3.0. Therefore, MPI standard 2 vs. MPI standard 3 has always been the issue here, not MPI 1 vs 2.

Comment thread m4/mpi.m4 Outdated
MPI_IMPL="built-in"
AC_MSG_RESULT( [$CXX Compiler Supports MPI] )
AC_DEFINE(HAVE_MPI, 1, [Flag indicating whether or not MPI is available])
AC_TRY_RUN([@%:@include <mpi.h>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote to stay away from AC_TRY_RUN for MPI-enabled code. Some clusters won't let you run MPI-compiled binaries on the head node, so we'd get a false positive from this test on those systems.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right; this is in general considered bad style (maybe for the very same reason as you write).
However, I understand it such that we end up here only if we didn't find the mpi.h-file, so that some grep is not accessible.

So I see two solutions: either we assume in this case that the version is >1 and leave this test away, or one tests first whether AC_TRY_RUN works with mpi and than checks the version.
But maybe there is a more elegant way?

Copy link
Copy Markdown
Member Author

@BalticPinguin BalticPinguin Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Ok; I just found out that there is e.g. the command: mpichversion; probably there are similar ones for the other implementations and one just tries to link and if it works, we test all these commands one after the other?
This assumes that only the correct mpi-version answers; but in a well-configured environment this should be given, right?

Edit: the ompi-command is ompi_info --version; but maybe mpiexec --version is even better and supported by all implementations; I will look this up.

@BalticPinguin
Copy link
Copy Markdown
Member Author

I deduce from the discussion in #1827 that one can simplify things: Doing the checks for MPI as before, there are two cases:

  • no (working) MPI is found -> do not use MPI.
  • MPI is found; than (in addition) check that MPI_VERSION and MPI_SUBVERSION are defined; if not, one might throw a warning and configure without MPI (assuming it is a very old version).

Than we can use the predefined version-strings inside the library to distinguish different algorithms; than the config-part is mainly independent from this (I would still try to exclude MPI_VERSION=1 in the config-part, if we have access to mpi.h) and we still have access to the MPI-versioning.

Did I forget something important now?
If there are no complaints, I would create an update PR in the next days.

As @friedmud pointed out, we than rely on the fact that the given version is consistent for all functions we use. There might be some reason to hope that this is true, but IIRC, I learned in my HPC-lectures that various implementations often vary rom the MPI-standard for some more fancy functions (whatever this means). If this happened for some libMesh-functions, it would cause some real trouble.

@roystgnr
Copy link
Copy Markdown
Member

Based on #1827 I'd say we ought to:

  • Scream and configure without MPI if we detect MPI-1
  • Scream a "deprecated!" warning if we detect MPI-2
  • Happily proceed if we detect MPI-3

And then we should probably put out a libMesh 1.3.1 release so that we can break MPI-2 compatibility afterward and we don't have to force @friedmud to create a fallback code path for #1826.

As @friedmud pointed out, we than rely on the fact that the given version is consistent for all functions we use. There might be some reason to hope that this is true, but IIRC, I learned in my HPC-lectures that various implementations often vary rom the MPI-standard for some more fancy functions (whatever this means). If this happened for some libMesh-functions, it would cause some real trouble.

This is absolutely a worry, but I'm not sure what we can do about it. If an MPI implementation is missing a function it's supposed to have we can try to detect that at configure time rather than compile time, but unless we're creating MPI-2 fallback code paths there's not much we can do to be helpful about it. If an MPI implementation is mis-implementing a function that would be an utter disaster we can't even detect until runtime. In either way I'm not sure how much we can do to try and preemptively fix things.

@BalticPinguin
Copy link
Copy Markdown
Member Author

But according to the above discussion, #1816 actually issues that MPI-2 support is missing. The question is whether one wants to add MPI-2 support for only one release?
I fear that the error given in #1816 is by far not the only one and I am not sure someone has motivation to fix it, knowing that it will be removed few months later.
Also, since it worked for quite some time without complains, there might be no one really bothering about MPI-2 .

@roystgnr
Copy link
Copy Markdown
Member

Sorry, I'm not clearly separating two different issues.

#1816 is about MPI-1 support missing (first broken, then deliberately removed after we realized that nobody had encountered the breakage for a year)

#1827 is about the desire to remove MPI-2 support (which should be working fine now and for every single past release, but which wouldn't be sufficient to support the current version of #1826)

I'm not personally eager to remove MPI-2 support, but the alternative of "we can't merge this major optimization until someone sets up a non-MPI-3 code path" isn't great either, and in #1827 the consensus seemed to be that nobody thought continuing MPI-2 support was worthwhile, in which case I'd like to just get it marked deprecated for one final release and then require MPI-3. Since the work to do that mostly overlaps the work to test MPI version here, I figured it could be part of the same updated PR?

@BalticPinguin
Copy link
Copy Markdown
Member Author

I think the discussion is distributed among too many threads and comment-sections by now; even for me constantly following it becomes hard to track the different branches :-)

After the first comment in this PR, me and @jwpeterson discussed some issues, finding out that the error #1816 indicates that no MPI smaller 3.0 is supported (see this comment).

With the rest I perfectly agree.

@roystgnr
Copy link
Copy Markdown
Member

Holy hell, I missed that comment; thank you.

@roystgnr
Copy link
Copy Markdown
Member

It appears that I put the non-backwards-compatible MPI_Pack invocations into libMesh on May 7, which postdates our last major release on April 9. If that's the case then I don't think this is a clear "we've already dropped MPI-2 support, might as well make it official" situation, not like we had with MPI-1. It's definitely more evidence in that direction, though...

@BalticPinguin
Copy link
Copy Markdown
Member Author

So, finally I have created another code which is, up to some testing, almost ready. But I have some questions about it:
First: It is completely new code: My idea was to write another function which is called after a working MPI is found and which just checks the version. Therefore the question: Should I submit it with another commit or is it more convenient if I overwrite the current one?
My second question concerns testing: I am working with the 'module'-environment to link against different libraries. But I have seen that in all cases, the config finds the '/usr/include/mpi/mpi.h' (the native one) and does

configure:34881: mpicxx -c  -std=gnu++11 -I/usr/include/mpi  conftest.cpp >&5
configure:34881: $? = 0
configure:34881: result: yes
configure:34881: checking mpi.h presence
configure:34881: mpicxx -E -I/usr/include/mpi  conftest.cpp
configure:34881: $? = 0
configure:34881: result: yes
configure:34881: checking for mpi.h
configure:34881: result: yes

But thereafter, when searching for the MPI_VERSION, the config.log says

configure:34993: mpicxx -o conftest  -std=gnu++11   conftest.cpp  >&5

(i.e. without the -I/usr/include/mpi option) and than seems to link against the mpi from the module-environment; (according to the output following).
Long story, short question: There seems to be an error in my environment; but more important is for me currently: How do I make the checks consistent with the earlier ones?

@jwpeterson
Copy link
Copy Markdown
Member

First: It is completely new code: My idea was to write another function which is called after a working MPI is found and which just checks the version.

Sounds like a good idea to me. I was thinking of doing something similar myself; there may even be a canned autoconf test that already does this...

Should I submit it with another commit or is it more convenient if I overwrite the current one?

That's up to you, I don't mind either way.

Long story, short question: There seems to be an error in my environment; but more important is for me currently: How do I make the checks consistent with the earlier ones?

Is it possible that the mpicxx compiler script is being taken from your environment's $PATH variable, and corresponds to a different MPI than the one that is in /usr/include/mpi? You can check the output of which mpicxx as well as looking at the contents of mpicxx -show.

@BalticPinguin
Copy link
Copy Markdown
Member Author

BalticPinguin commented Aug 31, 2018

Long story, short question: There seems to be an error in my environment; but more important is for me currently: How do I make the checks consistent with the earlier ones?

Is it possible that the mpicxx compiler script is being taken from your environment's $PATH variable, and corresponds to a different MPI than the one that is in /usr/include/mpi? You can check the output of which mpicxx as well as looking at the contents of mpicxx -show.

Yes, this seems to be what happens. I think a main issue is that the mpi-config searches for MPIHOME and if it is empty, it starts searching the /usr-dir. When I set the MPIHOME to the module-path, it works well.
From the user-side, the easy workaround is to set the path in the options of ./configure; a possible improvement would be to search $INCLUDE before defaulting to /usr.
Anyway, if ACX_MPI finds and checks a particular library than the version-check must use the same.

Besides handling of this issue, now the new suggestion is up. I hope it looks better to you and I didn't forget about too many things.

@BalticPinguin
Copy link
Copy Markdown
Member Author

Now I have added also a search through $LIBRARY_PATH and $INCLUDE to find the proper library and it works for me so far.
Unfortunately, I don't understand where all the errors in the tests come from; for me they look quite unrelated.
Also, I had a look at a config in some 'Test debug' check and there the new check of MPI-version seems to be skipped: Did I forget about some cases (where mpi is taken from a dependent library) or is autoconf not run by (some of) the checks?

@brianmoose
Copy link
Copy Markdown
Contributor

Could you try rebasing on current libmesh master and re-pushing? The newly added submodule seems be a problem since your HEAD doesn't have it.

@BalticPinguin
Copy link
Copy Markdown
Member Author

ah, great. thank you @brianmoose for the hint. Now it seems to work better.

Comment thread m4/mpi.m4 Outdated

dnl if MPI is not set in the config, search it via the $INCLUDE and $LIBRARY_PATH variables;
dnl with this we can hope to get a configure that is consistent with the environment.
AS_IF([test "$MPI" == "/.."],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a single equals sign in this test I think. It's possible that some shells support both = and == but the former is more portable IIRC.

Comment thread m4/mpi.m4 Outdated
break])
done
])
AS_IF([test "$MPI" == "/.."],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single equals sign.

Comment thread m4/mpi.m4 Outdated
dnl check that MPI_VERSION is not 2; else through a warning
AC_TRY_LINK([@%:@include <mpi.h>],
[
#if MPI_VERSION == 2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These # signs can use the same @%:@ quadrigraph that is used for includes. Shouldn't make any difference in the generated configure script but it does help syntax highlighting in some editors (emacs).

Comment thread m4/mpi.m4 Outdated
],
[
dnl MPI_VERSION >2: Be happy and configure with MPI.
AC_MSG_RESULT([ The MPI found has version >= 3.0. ]);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the logic here? If the previous two tests for MPI_VERSION==1 and 2 fail, it is assumed to be version 3? That seems dangerous, perhaps there is just something wrong with the MPI installation that caused the preceding two tests to fail... I'd rather have a successful test for MPI_VERSION==3 along with (eventually) some API tests for MPI_VERSION==3 only code before concluding that we have MPI-3 support.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. You are right. I only had in mind that at some point libraries with MPI_VERSION 4 will come out; but the appearance of some strange definitions is more important to catch.

@BalticPinguin
Copy link
Copy Markdown
Member Author

thanks for the comments; I hope you agree with the more compact form of version-check; if you rather want the different cases as separate checks (to make the AC_MSG_* clearer), let me know, than I can change it back. But I like that we have less nested conditions...

@BalticPinguin
Copy link
Copy Markdown
Member Author

... before this PR gets lost: Can someone comment on it? Is there some issue left?

@jwpeterson
Copy link
Copy Markdown
Member

jwpeterson commented Sep 24, 2018 via email

Comment thread m4/mpi.m4 Outdated
@%:@error "MPI-version seems to be < 1.5."
@%:@endif
@%:@if MPI_VERSION > 3
@%:@error "MPI-version too high. There is some error."
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to do this? Seems like it will just break things eventually:
https://www.mpi-forum.org/mpi-40/

Ideally I'd like to print whatever MPI_VERSION we see to make debugging actual "some error" cases, but screaming and dying is too much.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you would prefer to distinguish the different cases and separate these errors?
I excluded MPI 4 because the standard seems to be still under discussion and so there should be no compatible implementation of it...

Comment thread m4/mpi.m4 Outdated
[ dnl MPI_VERSION is not defined.
AC_MSG_WARN(["ERROR: MPI-version seems to be too low: MPI < 1.5. Disable MPI now..."]); enablempi=no
[ dnl MPI_VERSION is not defined or has unexpected value.
AC_MSG_WARN(["ERROR: MPI-version seems to be too low: Need MPI 2.X or 3.X. Disable MPI now..."]); enablempi=no
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "2.x or compatible" - that'll be accurate whether a fully-backwards-compatible or deprecated-stuff-removed releases come out in the future.

@BalticPinguin
Copy link
Copy Markdown
Member Author

Ok, I hope I addressed your points appropriately.
Sorry that you have so much work with it...

@jwpeterson
Copy link
Copy Markdown
Member

If @roystgnr is happy with the overall logic, I will go ahead and fix a few indentation/spacing issues before merge.

@roystgnr
Copy link
Copy Markdown
Member

Looks good to me; thanks everyone!

Comment thread m4/mpi.m4 Outdated
dnl with this we can hope to get a configure that is consistent with the environment.
AS_IF([test "$MPI" = "/.."],
[
for i in $(echo $LIBRARY_PATH | tr ':' '\n')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this SO question, $LIBRARY_PATH is searched by gcc/g++ when linking. Anyone know if that's true for other compilers as well? I don't believe LIBRARY_PATH is something I've ever used on any system...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point; so we probably should do this step only if the gcc is used...

I have just seen that according to the configure-script, libraries can be also given via

  CXXFLAGS    C++ compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> 
  LIBS        libraries to pass to the linker, e.g. -l<library>
  CPPFLAGS    (Objective) C/C++ preprocessor flags, e.g. -I<include dir>
  CFLAGS      C compiler flags
  libmesh_CPPFLAGS      User-specified C/C++ preprocessor flags
  libmesh_CXXFLAGS      User-specified C++ compilation flags
  libmesh_CFLAGS           User-specified C compilation flags

So we should check these variables as well, shouldn't we?

Comment thread m4/petsc.m4

AS_IF([test "x$PETSC_MPI" != x],
[AC_DEFINE(HAVE_MPI, 1, [Flag indicating whether or not MPI is available])])
[VERSION_MPI])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I'm clear: we don't run the risk of detecting/using a different MPI from the one PETSc is using by running the VERSION_MPI check right here, right? Many systems have multiple MPIs installed on them, and we have to be careful not to link against more than one by accident...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't set any variables (except for HAVE_MPI) in the VERSION_MPI-function, I don't think this leads to problems of actually linking against different MPI-libraries.
But the detection of another library than that used by Petsc is of course an issue; in my understanding the AC_LANG_SAVE and AC_LANG_CPLUSPLUS should take care of this:

  1. The the current status is saved
  2. The cpp-preferences are loaded (such as LD-LIBRARY_PATH etc).
    This should avoid cases that wrong libraries are loaded.

I tried testing this but I always got errors due to compiler-inconsistencies and so the Petsc-dependency was removed.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct me if I am wrong, but at this point it is ensured that we use the same compiler as that used to compile PETSc; this should ensure that we have the same MPI, since it is linked with the compiler.

Moreover, if there is a way to link to the wrong MPI-library in VERSION_MPI, than we are using an inconsistent setup anyway; this should be ensured before already.

@BalticPinguin
Copy link
Copy Markdown
Member Author

BalticPinguin commented Sep 28, 2018

ok; I have updated the last commit (I hope it is not too inconvenient for you, but since it were small changes in both cases...).
I just accounted for the LIBRARY_PATH-issue mentioned by @jwpeterson.
The point with petsc is still open, but I keep trying to find a more definite answer, but currently all tests fail because no "simple PETSc-program can be compiled" with the inconsistent options I tried.
Edit:
it should be "simple PETSc-program can not be compiled". Sorry.

@BalticPinguin
Copy link
Copy Markdown
Member Author

Just to not forget about this completely: In my viewpoint, this PR is complete so far; if you'd like me to add/change something, please let me know.
If you accept it, I can put all changes into one commit; I think this is better for the history.
Best,
Hubert

@roystgnr
Copy link
Copy Markdown
Member

@jwpeterson, any more concerns holding up a merge?

@jwpeterson
Copy link
Copy Markdown
Member

I don't have any specific remaining issues, but I have not had a chance to perform detailed testing with these changes yet, and don't really know when I will... I suppose I'm OK with merging it and addressing an issues which arise later.

@roystgnr
Copy link
Copy Markdown
Member

Hmmm... OK. I know @friedmud is planning a MOOSE libMesh module update ASAP, but we've got everything else that's ready to merge merged, so if he's also worried about this then he can just update to 05101bc for now.

@roystgnr roystgnr merged commit bc6cea6 into libMesh:master Oct 18, 2018
roystgnr added a commit to roystgnr/libmesh that referenced this pull request Oct 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants