Fix build of xSDKTrilinos with static TPL libs #2
Comments
@BarrySmith , the current 'master' version of PETSc fails the download of SuperLUDist. Do you know what happened with this? See the detailed error below. CC: @jwillenbring Detailed Notes: Trying to produce a static build of xSDK with xSDKTrilinos ... I cloned the installxSDK repo with:
I then attempted to do the static build with:
but it got the following error in the configure of NetCDF:
I sent Barry and Jason an email with the detailed configure.log file. I just realized that the configure script for PETSc that Alicia and I were using did not download and install NetCDF or ExodusII or Metis or Boost. Barry said that the installxSDK.sh script is actually currently cloning PETSc from bitbucket master, cloning Trilinos on the 12.6 branch and then cloning xSDKTrilinos on its master branch. Therefore, I will go back and try doing a static build of PETSc then Trilinos, then xSDKTrilinos individually and see if I can reproduce this problem. 1) Install PETSc and TPLs with static libs: Using PETSc on master:
Configure, build, and install PETSc and its TPLs:
This failed to configure, showing the failure:
It appears as though that tag does not exist in that git repo:
The remote is actually a PETSc mirror:
I did a fetch:
and it shows:
What is up with that? Why is PETSc 'master' pointing to a tag that does not even exist? It seems this change got made 3 days ago in the commit:
I don't understand how this works for anyone else. |
@BarrySmith, can someone please point me to a version of PETSc that is known to work for the arguments:
? The problem is that the download (git clone then checkout at a commit or tag) fails. Detailed Notes: since I can't configure and build the current 'master' version of PETSc, I am going to try to get an older version to work. How about the version right before b8a3fda2eec0? 1) Install PETSc and TPLs with static libs: Using PETSc on master:
Configure, build, and install PETSc and its TPLs:
This time I got the failure:
Darn, this could be harder than I imagined. |
Sorry about this. I will check why SuperLU_Dist is not working. First, it should be defaulting to using the tarball, so I am not sure why it is using the git version. I may have made a mistake in the tag name. |
Hmm, I just got the superlu_dist repository and it definitely has the tag v5.0.0 $ git clone https://github.com/xiaoyeli/superlu_dist.git ~/Src/superlu_dist ((v5.0.0)) Can you try deleting the entire directory under installxSDK.sh and running it again? If that fails, email the configure.log from the petsc directory to [email protected] |
I was able to run with a clean build and get the failure in xSDKTrilinos as you predicted due to library link order issues. Here is the relevant information. If you are able to update the xSDKTrilinos cmake stuff I can try again. Just let me know. |
Barry, Okay, I know what happened. When I ran
Generally, I consider this to be a feature of git clean (and we rely on this with CASL VERA when cleaning all of the *.pyc files before creating a release tarball). So that means to completely clean you can't just run
(This is an example of why it is generally desirable to put build artifacts under a different directory not under the git repo. That way, you can safely use Anyway, I am sure it will work fine now (i.e. I should be able to reproduce the error on my machine). Thanks! |
What is likely happening is described in TriBITSPub/TriBITS#115. The fix will be a simple hack (i.e. list all of the TPL dependencies, even the indirect ones, in xSDKTrilinos/cmake/Dependencies.cmake). The correct fix will come when I can work TriBITSPub/TriBITS#115. |
Do you have write access to xSDKTrilinos or do we need to make a fork to put in the fix? Barry
|
I have write access. Once I have this reproduced, I will put in the fix and push. Will be tomorrow before I get that finished though. |
I have updated these scripts to work with the current version of PETSc 'master' that pulls SuperLUDist from the GitHub repo and uses tag v5.0.0. I also updated the scripts to require that the env vars PETSC_INSTALL_DIR and TRILINOS_INSTALL_DIR be explicitly set by the user and not assumed. This tripped me up the first time I tried to do a static lib build and it found the wrong install dir on the first try.
I hacked and added -lX11 and -lssl to xSDKTrilinos_EXTRA_LINK_FLAGS since PETSc (or one of the packages that PETSc is building) is finding and linking against these libraries on the ORNL CASL Fissile4 machines. NOTE: This is not really a hack since this script is specific to the Fissile4 machines anyway.
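The requirement that PETSC_INSTALL_DIR and TRILINOS_INSTALL_DIR be set explicitly rather than assumed could look roughly like the following. This is only a minimal Python sketch of the idea, not the actual do-configure scripts; the helper name `require_env` and the example paths are hypothetical.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for each name, or raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        # Fail loudly instead of silently falling back to a guessed install dir.
        raise RuntimeError("must be set explicitly: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Hypothetical paths, for illustration only.
example_env = {
    "PETSC_INSTALL_DIR": "/opt/petsc",
    "TRILINOS_INSTALL_DIR": "/opt/trilinos",
}
dirs = require_env(["PETSC_INSTALL_DIR", "TRILINOS_INSTALL_DIR"], example_env)
print(dirs["TRILINOS_INSTALL_DIR"])
```

Failing early this way avoids the situation described above, where a static build quietly picks up the wrong install directory.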
Barry, I did a full static build of PETSc, Trilinos, and xSDKTrilinos manually as described below. Other than having to update my configure scripts for SuperLUDist v5.0.0 and having to add The issue may be in how Trilinos and/or xSDKTrilinos are being configured by installxSDK.sh or PETSc when static libs are used. Unfortunately, I can't reproduce the problem with installxSDK.sh on my own machine because it bombs in the configure of NetCDF on my machine as described above (which is perhaps an issue that should be looked at separately because it impacts the portability of xSDK). Is there a way to turn off the download/install of NetCDF and other problematic libraries with installxSDK.sh? Do I do this using Detailed Notes: Reproducing failed static build ... 1) Install PETSc with static libs: Using PETSc on master:
Configure, build, and install PETSc and its TPLs:
This produced the install tree:
2) Install Trilinos packages needed by xSDKTrilinos with static libs: The version of Trilinos on the branch
Configure, build and install Trilinos packages needed by xSDKTrilinos:
This installed:
3) Build xSDKTrilinos with static libs: The xSDKTrilinos version of 'master' is:
Configure and build xSDKTrilinos against installed Trilinos:
This resulted in the build failure:
The order of the TPL libraries listed above is libpetsc.a, libHYPRE.a, libsuperlu_dist.a, libparmetis.a, libmetis.a, and liblapack.a. Is that not the correct order? Looking at the link errors, what library is supposed to provide What library provides ... So adding
I did this in the xSDKTrilinos commit:
With that change, all of xSDKTrilinos links and passes all of the tests:
What this means is that I can't find a problem with the link order of the libraries. My guess is that the problem is with how Trilinos and/or xSDKTrilinos are being configured. |
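For readers following the link-order question in the comment above: with static libraries, a single-pass linker generally needs each library to appear before the libraries it depends on. The sketch below checks a link line against a dependency table. The dependency table is an assumption made for illustration (it is not taken from any configure.log in this thread), and `misordered_pairs` is a hypothetical helper.

```python
def misordered_pairs(link_order, deps):
    """Return (lib, dep) pairs where dep is missing or does not come after lib."""
    position = {lib: i for i, lib in enumerate(link_order)}
    bad = []
    for lib, needed in deps.items():
        if lib not in position:
            continue  # library is not on this link line at all
        for dep in needed:
            if position.get(dep, -1) < position[lib]:
                bad.append((lib, dep))
    return bad


# Assumed dependencies, for illustration only.
deps = {
    "libpetsc.a": ["libHYPRE.a", "libsuperlu_dist.a", "libparmetis.a",
                   "libmetis.a", "liblapack.a"],
    "libHYPRE.a": ["liblapack.a"],
    "libsuperlu_dist.a": ["libparmetis.a", "liblapack.a"],
    "libparmetis.a": ["libmetis.a"],
}

# The order quoted in the comment above.
quoted = ["libpetsc.a", "libHYPRE.a", "libsuperlu_dist.a",
          "libparmetis.a", "libmetis.a", "liblapack.a"]
print(misordered_pairs(quoted, deps))

# The same libraries with libpetsc.a moved to the very end.
petsc_last = quoted[1:] + quoted[:1]
print(misordered_pairs(petsc_last, deps))
```

Under these assumed dependencies the quoted order has no violations, while moving libpetsc.a to the end flags every one of its dependencies.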
Yes --with-netcdf=0 and --with-exodusii=0 will turn off their installs but I am not sure if Trilinos will build on without those packages. You can email the petsc/configure.log to [email protected] for the failed netcdf case so we can try to figure out why it failed to build. We can usually figure out failed builds from configure.log |
Barry, I sent two emails to the list [email protected]. One for a --download-netcdf failure and one for a --download-boost failure.
What would be nice is if PETSc would react to --with-netcdf=0 by configuring Trilinos with You can do the same thing with the Boost and BoostLib TPLs too if --with-boost=0 is passed in. That is, if --with-boost=0 is passed in, then configure Trilinos with The issue is that we do not need NetCDF or Boost in order to build and test xSDKTrilinos. We don't want to present Trilinos as one huge glob of software that is take it or leave it (because people will leave it if they run into these problems). That is why Trilinos is partitioned into a bunch of smaller packages with carefully managed dependencies. Does that seem reasonable? Also, why is PETSc building with external ExodusII? Exodus is actually built as part of the Trilinos package SEACAS. Does that mean that two versions of Exodus are being installed? |
For one, netcdf is already listed as optional for Trilinos. I can switch boost to be optional. Per the above option, you've listed -DTrilinos_DISABLE_ENABLED_FORWARD_DEP_PACKAGES=ON with both netcdf and boost. What happens if this flag is always set [irrespective of enabling/disabling netcdf/boost]? |
Roscoe,
Satish,
Barry
|
We have used Exodusii directly for many years. If the Exodusii in SEACAS is identical to the independent Exodusii I can add some code to ensure only one is included. Barry
|
Yes, it is fine to set this always, unconditionally. This just determines what happens when you have inconsistent enables and disables as described here. In my opinion, I think that this should be on by default for all TriBITS projects, and it is with CASL VERA. It just massively simplifies dependency management with very large and complex dependency structures. It was just that some Trilinos users got confused by this behavior and wanted the default to error out, which is fine. |
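The translation discussed above (PETSc reacting to --with-netcdf=0 or --with-boost=0 when it configures Trilinos, with the forward-dependency flag always set) might be sketched as follows. The exact arguments originally proposed did not survive in this thread, so the `-DTPL_ENABLE_<name>=OFF` form and the TPL name `Netcdf` are assumptions based on the usual TriBITS convention; `trilinos_cmake_args` is a hypothetical helper, not PETSc's actual code.

```python
# Assumed mapping from PETSc-style options to Trilinos TPL names.
DISABLED_TPLS = {
    "--with-netcdf=0": ["Netcdf"],
    "--with-boost=0": ["Boost", "BoostLib"],
}


def trilinos_cmake_args(petsc_options):
    """Translate PETSc-style disable options into Trilinos CMake arguments."""
    # Set unconditionally, as suggested in the comment above.
    args = ["-DTrilinos_DISABLE_ENABLED_FORWARD_DEP_PACKAGES=ON"]
    for option in petsc_options:
        for tpl in DISABLED_TPLS.get(option, []):
            args.append(f"-DTPL_ENABLE_{tpl}=OFF")
    return args


for arg in trilinos_cmake_args(["--with-netcdf=0", "--with-boost=0"]):
    print(arg)
```

With the forward-dependency flag on, disabling a TPL should also turn off whatever enabled packages require it instead of stopping the configure with an error.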
Ross,
Barry
|
Thanks! I will give this a try right away! |
Barry, I have reproduced the link failures with xSDKTrilinos through installxSDK.sh. I believe that I know how to resolve the ordering issues by just making changes to xSDKTrilinos. However, there are a few things that will need to be changed on the PETSc side to get xSDKTrilinos to work with static libraries for the way the installxSDK.sh/PETSc is configuring and building everything. First, when PETSc builds ParMETIS against METIS (i.e. --download-parmetis --download-metis), then PETSc needs to pass in the link info for METIS along with ParMETIS. If PETSc were to enable any of the Trilinos tests or examples that depend (directly or indirectly) on ParMETIS, then you would see link failures just in Trilinos with static libs. One way to pass in METIS link info is to just pass it in through TPL_ParMETIS_LIBRARIES (i.e. add Second, since PETSc is building against the X11 and SSL libraries on my system (and likely other systems), it needs to specify these system libraries as well in order to work with static libs. Currently, PETSc is configuring xSDKTrilinos with:
By default, this will only result in the CMake/TriBITS configure of xSDKTrilinos finding libpetsc.a (when static libs are used). The PETSc configure of xSDKTrilinos can resolve this in one of two ways. Either PETSc can pass:
or can pass these extra libraries (since they are system libraries) as:
Passing common system libraries through Of course, PETSc will need to pass all of the system libraries it links against in this way. I will make the necessary changes to xSDKTrilinos on my side to fix static libraries (and verify it works with the PETSc way of configuring Trilinos and xSDKTrilinos) and push to the 'master' branch in the xSDKTrilinos github repo. After that, if someone on the PETSc side could take care of passing in the METIS info to Trilinos and the system libraries for PETSc to xSDKTrilinos, then I think we will have the static library build of xSDK fixed! |
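The two requests in the comment above (pass METIS along with ParMETIS to Trilinos, and pass system libraries such as X11 and SSL to xSDKTrilinos) could be generated along these lines. The variable names TPL_ParMETIS_LIBRARIES and xSDKTrilinos_EXTRA_LINK_FLAGS both appear elsewhere in this thread, but whether the original snippets used exactly this form is not recoverable here. The helper names and library paths are hypothetical.

```python
def parmetis_libraries_arg(parmetis_lib, metis_lib):
    """Trilinos configure argument listing ParMETIS followed by METIS."""
    # CMake list syntax: ParMETIS first, then METIS, which it depends on.
    return f"-DTPL_ParMETIS_LIBRARIES={parmetis_lib};{metis_lib}"


def extra_link_flags_arg(system_libs):
    """xSDKTrilinos configure argument carrying system libraries once."""
    flags = " ".join(f"-l{name}" for name in system_libs)
    return f"-DxSDKTrilinos_EXTRA_LINK_FLAGS={flags}"


# Hypothetical install paths, for illustration only.
print(parmetis_libraries_arg("/opt/xsdk/lib/libparmetis.a",
                             "/opt/xsdk/lib/libmetis.a"))
print(extra_link_flags_arg(["X11", "ssl"]))
```

Keeping the system libraries in one variable means they are stated once for the whole project instead of being repeated for every TPL.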
On Thu, 14 Apr 2016, Roscoe A. Bartlett wrote:
Perhaps petsc and all dependent libraries aka PETSC_LIB should be There is no reason there should be 2 code paths that resolve package Satish
|
Barry, A few other things that I noticed when looking into this:
|
Yes, but you have huge duplication if you list the same system libraries over and over again for each TPL separately. It is much cleaner, IMHO, to pass universal system libraries and things like RPATH to the GCC libraries through |
On Thu, 14 Apr 2016, Roscoe A. Bartlett wrote:
Any given package should support the following configure modes [perhaps in autoconf configure terms]: Normally users would be able to use any of these 3 modes. But in xsdk currently xSDKTrilinos.py appears to use mode (a) - which asks xSDKTrilinos to detect libraries & dependencies. With (b) - the libraries detected by the xsdk wrapper could be passed to xSDKTrilinos - and avoid a mismatch in this library detection. With (c) - both the library detection and dependency info that's already done in the xsdk wrapper can be passed to xSDKTrilinos Satish |
Ross,
Satish,
Barry
|
Ross,
Barry It seems if we pass it again then there will be tons of overlinking where the same library appears a bunch of times in the same link line; that can work but is ugly.
|
It should reuse it and it does from what I have seen today. It is just that the libraries were included in the wrong order (which I am fixing now). PETSc should only need to pass unique info to xSDKTrilinos for PETSc and HYPRE. All of the static libs that are already being used by Trilinos packages like ParMETIS, SuperLUDist, etc. will already be listed on the link line for xSDKTrilinos. BTW, it looks like CMake is cleaning up all the duplicate libraries, link directories etc. Therefore, it does not impact the final compile and link lines that the compiler and linker see (one of the built-in features of CMake). |
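On the overlinking concern above: if a tool removes repeated libraries from a static link line, which copy it keeps matters. The sketch below is not a description of CMake's actual algorithm. It only illustrates one safe strategy, keeping the last occurrence, under the assumption that every per-package list is itself in dependency order. `dedupe_keep_last` is a hypothetical helper.

```python
def dedupe_keep_last(libs):
    """Drop repeated entries, keeping only the last occurrence of each library."""
    seen = set()
    kept = []
    for lib in reversed(libs):
        if lib not in seen:
            seen.add(lib)
            kept.append(lib)
    kept.reverse()
    return kept


# Two per-package lists concatenated; ParMETIS and METIS appear twice.
line = ["libpetsc.a", "libparmetis.a", "libmetis.a",
        "libsuperlu_dist.a", "libparmetis.a", "libmetis.a"]
print(dedupe_keep_last(line))
```

Keeping the first occurrence instead would leave libsuperlu_dist.a after libparmetis.a, which is exactly the kind of ordering that breaks a static link.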
Ah, then we should be able to do something pretty clean. Thanks
|
…inos #2) This should resolve the link order problem when building xSDKTrilinos with static libraries as its own CMake project.
…linos #2) Now xSDKTrilinos when configured as its own CMake project will automatically use the compilers, compiler flags, linker, etc. that were used for Trilinos, as read from the TrilinosConfig.cmake file. Therefore, you can now have a universal do-configure script which is agnostic of what system you are actually on. I put in a hack to the Fissile4 Trilinos configure script to add -lX11 and -lssl so that this works with PETSc on the Fissile4. This is a bit of a hack but who cares?
Barry and Satish, I just pushed the commits:
to the 'master' branch of the xSDKTrilinos GitHub repo [email protected]:trilinos/xSDKTrilinos.git. Using this version of xSDKTrilinos, I was able to run the PETSc generated configure command and then build. It showed the libraries in the correct order. It failed to build because of the missing -lmetis -lX11 -lssl. I then manually configured again hacking on:
and I was able to get all of xSDKTrilinos to link and pass the tests. See details below. Now, if PETSc can take care of -lmetis and -lX11 -lssl, then installxSDK.sh should be able to build xSDKTrilinos with all static libs. Detailed Notes: Fixing the issues with xSDKTrilinos ... 1) Split the TrilinosTpl TPL into TrilinosTplsTpl and TrilinosPkgsTpl TPLs and order these TrilinosTplsTpl, HYPRE, TrilinosPkgsTpl, PETSC (because PETSc may be statically linked against some Trilinos packages; test this to verify this is true): I made the local commits:
2) Move changes for xSDKTrilinos over to the installxSDK.sh git source dir for xSDKTrilinos and test over there with the PETSc configure (manually run it adding in -lX11 -lssl): Now to just try this out manually in the installxSDK.sh directory by getting the branch:
Now to run the configure command generated by PETSc gotten from the petsc/configure.log file:
Now I get link failures like:
I am going to cheat a little and add -lmetis to the extra link flags to make this link ...
After configuring with that, I was able to build and run the xSDKTrilinos tests:
So once I push these changes and PETSc passes -lmetis -lX11 and -lssl, then this should work for static libraries, as well as for shared libraries. |
Satish,
Actually, you can't use that approach in general because some of the TPL libraries may be incompatible. For example, for several years in CASL VERA, we had to support two versions of PETSc and Trilinos at the same time. An older version of PETSc built with ML (from a very old version of Trilinos) was being used by the Hydra-TH code but a newer version of PETSc was being used by the core simulator codes COBRA-TF and MPACT and a very new version of Trilinos (integrated with repo syncs) was being used by the Exnihilo codes. This was all done under one cmake configure for "VERA" which made it easy to run automated builds for testing, posting to CDash, doing deployments, etc. This was no problem because these two versions of PETSc and Trilinos never got linked into the same libs or executables. Therefore, a general meta-build system (like TriBITS is almost) can't assume that all the libraries can just be lumped together. It just happened that TriBITS already supported that use case well. Because of this, it made sense to split out the unique libraries into different TriBITS TPLs and then put all of the common system libraries into a single variable to avoid duplication. Does that make sense? |
BTW, TriBITS is currently missing this mode, but it will be added at some point soon: It has been requested a couple of times. |
On Thu, 14 Apr 2016, Roscoe A. Bartlett wrote:
Perhaps I'm misreading the above example. Is your suggestion that the above example can be avoided by using interfaces (a),(b) - but not (c)? In the examples I provided - all have a single version of PKG1 and PKG2. Yes - it's possible to abuse and sneak in multiple versions of packages [either through the configure interface or through autodetection configure code in some of the packages] but that's possible with all 3 modes listed. [not just 'c'] I agree linking multiple versions of packages is not desired. And these can sneak up in prebuilt packages. This can happen with all 3 modes listed. If any checks can be added to detect/avoid this issue - all 3 modes will equally benefit from it. The premise I initially proposed is - if a top level configure script is processing package installation, dependencies, and liblist - it should tell each package it installs exactly what it should use - and not let it autodetect or autoenable things that can conflict with what the top level script figured out already. Satish
|
In the case of CASL VERA, the older PETSc and Trilinos TPL was named "HydraTHTPLs" so to TriBITS it appeared to be a completely separate TPL. But you can also have cases where two different TPLs have clashing header files and/or libraries. That is the case right now, for example, with the FEI in Trilinos and HYPRE (see SDK-54). If the include dirs (and perhaps the libraries) for HYPRE were added globally, you could never build the version of FEI actually in Trilinos. Just making the point that in a more general meta-build env, you can have some code that is not well namespaced and you need mechanisms to avoid clashes. That was my only point in arguing against option-c.
Agreed. Autodetection and autoenable is about the number one reason why meta-builds of different software using a heterogeneous set of build systems are not portable. For example, PETSc found X11 and SSL on my system and decided to build against them. Then the link failed in xSDKTrilinos because they were not being passed in. I am not picking on PETSc; there are examples in Trilinos that do the same thing. The issue is that when you have a bunch of different build systems all making their own decisions, then porting becomes much more difficult because of autodetection and autoenable. This also was/is a problem with libmesh/MOOSE inside of CASL VERA. On a new system the autotools libmesh system suddenly found something new (TBB in one case I can think of) and decided to use it and it killed the build. |
Update on mira: The xSDKTrilinos test problems are getting built now, but executing them fails. I'm assuming that I can run a test with: For output I'm getting: *** Unit test suite ... stderr is: The link line is now: |
@sarich, are you sure this is due to static library issues? Do all of the xSDKTrilinos tests fail like this? If you build and run native Trilinos tests (like for Epetra with -DEpetra_ENABLE_TESTS=ON) do they also fail? The error message just says that a segfault is occurring but not really giving any clues that I can see for what is causing this. |
The xsdk trilinos tests are now working when run with a small number of On Wed, Apr 20, 2016 at 5:39 PM, Roscoe A. Bartlett <
|
In the below email, Jason reports that xSDKTrilinos is putting the libraries in the wrong order on the link line.
This Issue ticket is to investigate this issue and try to resolve it.
From: Jason Sarich [email protected] on behalf of "Sarich, Jason J." [email protected]
Date: Tuesday, April 12, 2016 at 1:41 PM
To: "Smith, Barry F." [email protected]
Cc: "Xiaoye (Sherry) Li" [email protected], "Balay, Satish" [email protected], "Willenbring, James M" [email protected], Sherry Li [email protected], "Klinvex, Alicia Marie" [email protected], Lois Curfman McInnes [email protected]
Subject: Re: nearing final preparations for xSDK release
I'm still having some trouble with installxSDK.sh on mira, I was able to fix a few issues, but now I'm getting a bad link line for xsdktrilinos.
libpetsc.a is not added until the very end, but it depends on previous libraries (hdf5, superlu, metis, hypre, ml, etc.), see link command below.
I'm hoping somebody more familiar with cmake and the xsdktrilinos package can figure out where the library list is coming from quicker than I can.
There's also a '-rdynamic' in there that I don't think will work.
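The symptom reported in this email (libpetsc.a placed at the very end even though it depends on the libraries before it) is the reverse of what a static link needs. One generic way to produce a workable order is a topological sort of the dependency graph, sketched here with Python's standard `graphlib`. The dependency table is an assumption based on the libraries named in the email, the file names are hypothetical, and this is not how TriBITS itself orders libraries.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def static_link_order(deps):
    """Order libraries so each one precedes everything it depends on."""
    # static_order() yields dependencies first; a static link line wants the reverse.
    return list(reversed(list(TopologicalSorter(deps).static_order())))


# Assumed dependencies, for illustration only.
deps = {
    "libpetsc.a": ["libhdf5.a", "libsuperlu_dist.a", "libmetis.a",
                   "libHYPRE.a", "libml.a"],
    "libsuperlu_dist.a": ["libmetis.a"],
}
order = static_link_order(deps)
print(" ".join(order))
```

Several orders can be valid for the same graph, so the exact output may vary. The property that matters is that libpetsc.a comes first and libmetis.a comes after libsuperlu_dist.a.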