Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract the install prefix from the shared library. #13147

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

bosilca
Copy link
Member

@bosilca bosilca commented Mar 17, 2025

This is part of a multi-project effort, a similar PR will be created in OpenPMIX and OMPI. The goal of each of these changes is the same: instead of using build-time generated prefix that ignore a project rebase, take the prefix from the shared library of each project and derive the necessary paths from it. The user can however overwrite this using the environment variables, and the configuration files.

Open PMIX PR
PRRTE PR

To test it: build OMPI with a prefix, then move the install directory some other place. Set your $PATH and $LD_LIBRARY_PATH (or $DYLD_LIBRARY_PATH on OSX) to correctly point to the new locations then run mpirun -np 1 nothing (yes run a non existing application). If everything works fine you should see a nice error message, like this:

mpirun -np 1 nothing
--------------------------------------------------------------------------
prterun was unable to find the specified executable file, and therefore did
not launch the job.  This error was first reported for process rank
0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a prterun command
   line parameter option (remember that prterun interprets the first
   unrecognized command line token as the executable).

   Node:       XXX
   Executable: nothing
--------------------------------------------------------------------------

If it doesn't, then you will get something like:

--------------------------------------------------------------------------
Sorry!  You were supposed to get help about:
    no-plugins
But I couldn't open the help file:
   .../share/prte/help-pmix-runtime.txt: No such file or directory.  Sorry!
--------------------------------------------------------------------------

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to add an owner.txt file.

char* prefix = opal_dirname(dname);
free(dname);

mca_installdirs_runtime_component.install_dirs_data.prefix = prefix;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ordering / priority in the installdirs framework such that you can guarantee that this is the last component to be opened? I.e., that writing this prefix will be the one that "wins"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the last component, the last component is the original config. I put the priority of this to be inbetween the environment prefix and the static config data. However, both the env prefix and this module only fill the install prefix, everything else is statically set (as templates) by the config, which therefore must always be last.

There are still setups that don't make sense. As an example if the user manually set a prefix and that prefix is not the same as the loaded library. Is this bad or something the user purposely did (maybe because heterogeneous runs) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you might need to reset exec_prefix. You'll need to see if the configured prefix and exec_prefix are the same, I think...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still setups that don't make sense. As an example if the user manually set a prefix and that prefix is not the same as the loaded library. Is this bad or something the user purposely did (maybe because heterogeneous runs) ?

I agree -- we don't (can't) protect the user from shooting themselves in the foot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the exec_prefix as it it defined as ${prefix} in the config component, and the templating logic will automatically set it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exec_prefix defaults to ${prefix}, but it doesn't have to be.

./configure --prefix=/opt/openmpi --exec-prefix=/something/else ...

Copy link
Member Author

@bosilca bosilca Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't change the exec_prefix I have no idea where the binaries were placed if the user did what you suggested above. I feel like we're trying to solve all problems here, that was not the goal of this PR. The goal was to simply let people move their install directory and have things work properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libdir defaults to ${exec_prefix}/lib. Hence, if exec_prefix is not the same as prefix, and libdir is set relative to exec_prefix, then setting install_dirs_data.prefix won't do what you expect.

We're not trying to solve everything here -- but solving this problem does have to be done correctly / handle the use cases that already exist.

}

char* dname = opal_dirname(info.dli_fname);
char* prefix = opal_dirname(dname);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put a comment in here explaining the logic? (i.e., that you're just taking the simple algorithm of going one directory up from the libdir)

A slightly better algorithm might be to see if the config-supplied libdir is ${prefix}/SOMETHING or ${exec_prefix}/SOMETHING. If that's the case, then you should find the same suffix SOMETHING in dname -- stripping off that suffix will result in a more accurate value for prefix and/or exec_prefix.

In most cases, SOMETHING will be a single directory (e.g., lib or lib64), and therefore this algorithm effectively reduces to the same as what you did (go up one directory from the path to libopen_pal). But if the libdir actually has multiple subdirectories, this algorithm would still result in a correct prefix / exec_prefix. For example, if someone configured Open MPI with --libdir=${exec_prefix}/lib/ompi, then dname would be /some/new/prefix/directory/lib/ompi, but you can correctly deduce that your new exec_prefix should be /some/new/prefix/directory -- not /some/new/prefix/directory/lib.

Copy link
Member Author

@bosilca bosilca Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SOMETHING equal to lib is from the statically generated installdirs.h file, but there is no macro for just the suffix. I could extract it from the #define, but that seems to cater to extreme cases. I'm saying that because if someone configure with a complicated libdir prefix, such as your suggestion of lib/ompi, they can't just move the install somewhere else, they would have to do a lot of work to move correctly all the directories. I assume such users would know better how to handle this case, or how not to put themselves in harm way.

Copy link
Member

@jsquyres jsquyres Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow you. You could do this:

./configure --prefix=/build/openmpi '--libdir=${prefix}/my/complicated/libdir' ...

And then you can still relocate all of /build/openmpi.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could do that, but the funny thing is that most likely nobody did. I know because that wouldn't work for any internal build with libevent.

@bosilca bosilca force-pushed the topic/runtime_prefix branch from 1ec7a3a to 785e4c9 Compare March 17, 2025 14:46
@rhc54
Copy link
Contributor

rhc54 commented Mar 17, 2025

I'm wondering if this is actually turning out to be all that useful. I have to set LD_LIBRARY_PATH anyway, which means we still require the user to provide prefix values for PRRTE and PMIx (for PRRTE) and for app's PMIx (along with a flag to tell us not to apply any prefix). All this does is let me not include the prefix itself in the environment - but that's trivial and not worth the bother.

The goal was to avoid having the user provide us prefix values, but I don't see any way out of it. Do you?

@bosilca
Copy link
Member Author

bosilca commented Mar 17, 2025

Correct, users always have to set PATH and LD_LIBRARY_PATH, because otherwise mpirun will not start due to missing shared libraries. But, that all they have to set, because now we can extract the prefix from the shared library and infer the prefix.

@rhc54
Copy link
Contributor

rhc54 commented Mar 17, 2025

Correct, users always have to set PATH and LD_LIBRARY_PATH, because otherwise mpirun will not start due to missing shared libraries. But, that all they have to set, because now we can extract the prefix from the shared library and infer the prefix.

I'm afraid that isn't quite correct - you forget that the reason for prefix is that the libs on the remote nodes are in a different location. So you cannot set PATH and LD_LIBRARY_PATH for those locations where mpirun is located or mpirun will fail to start. Instead, you have to pass the prefix - and then I set the paths for the backend.

So this really doesn't do anything for us that I can see - we still need the user to specify the prefix values so we can locate the backend libs.

@rhc54
Copy link
Contributor

rhc54 commented Mar 18, 2025

Just a heads-up: I added this to PRRTE and it broke the show-help subsystem - couldn't find the help files. This was on a Mac with no special configure options and no envar setting.

@bosilca
Copy link
Member Author

bosilca commented Mar 18, 2025

I'm not sure I understand your comment regarding PATH and LD_LIBRARY_PATH (especially the part regarding mpirun on the remote nodes). Anyway, in any case if you set the PREFIX it will take precedence and things will go back to the way they were before.

What this does is that if I move the install directory and update the local PATH and LD_LIBRARY_PATH, I don't have to set the prefix to get things working properly.

How did it broke the help subsystem ? How can I reproduce ?

@rhc54
Copy link
Contributor

rhc54 commented Mar 18, 2025

So here is how the system works. If the user has installed PRRTE/PMIx in a different location on the remote nodes, then we have to point the PATH and LD_LIBRARY_PATH at those locations when we start the daemons. This typically happens when the mount point for a shared file system is different on the login vs compute nodes - a not uncommon situation. The user cannot update PATH/LD_LIBRARY_PATH locally as that would break mpirun itself (you wouldn't find the executable or libprrte/libpmix), which is one reason why we created the prefix capability.

So when you give us a prefix (cmd line or envar), we overwrite the local PATH and LD_LIBRARY_PATH with the prefixed version prior to calling ssh or srun or whatever. Thus, it is not the local value of those variables that gets to the backend - it is the prefixed value. We also insert the prefix into the environment.

Point being: if someone has moved the install, or has put the install into a different location on the backend (which to us looks just like a relocation), then they have to give us a prefix so we know how to set the PATH (to find the prted) and the LD_LIBRARY_PATH (to load the base library - e.g., libprrte and libpmix for PRRTE, libopen-pal for OMPI). Otherwise the proc won't even start, and you'll never get to the installdirs code.

So if they have to set the prefix anyway, it isn't clear to me that this code actually does anything for us. The point was to try and avoid having the user provide us with a prefix, but I don't see a way to do it.

I wouldn't worry about the show-help problem until we determine this is actually worth doing. All I did was commit openpmix/prrte#2167 and do a vanilla install, then prterun --help. I've reverted it and things are back to normal, so no worries there.

@bosilca
Copy link
Member Author

bosilca commented Mar 18, 2025

I think you have misunderstood the scope of this PR. It was never to get rid of the PREFIX, it was to not require the 3 PREFIXes in some very precise cases.

When the software was relocated but is mounted in the same location across the entire cluster. The user has updated his PATH and LD_LIBRARY_PATH before mpirun (maybe even a module load) and the environment is propagated by the batch scheduler.

Until few days ago things would have started remotely, but access to the help files would not have worked, not even when the user was setting their own PREFIX. With the fix you put, things will now work nicely and the user will get all the error messages correctly for as long as they set 3 prefixes, OPAL, PRTE and PMIX.

With this PR, in the same conditions the app will run correctly, including the right modules and help files, even if they don't set three prefixes, because we will extract the prefix from the corresponding shared library. Of course the shared library would have to be loaded, that was the premise of the correctness of the LD_LIBRARY_PATH across the entire setup.

bosilca added 2 commits March 18, 2025 16:08
If we detect a static build don't try to provide a prefix.

This is part of a multi-project effort, a similar PR will be created in
OpenPMIX and OMPI. The goal of each of these changes is the same:
instead of using build-time generated prefix that ignore a project
rebase, take the prefix from the shared library of each project and
derive the necessary paths from it. The user can however overwrite this
using the environment variables, and the configuration files.

Signed-off-by: George Bosilca <[email protected]>
Unset by default, but can be used to chack installations.

Signed-off-by: George Bosilca <[email protected]>
@bosilca bosilca force-pushed the topic/runtime_prefix branch from 785e4c9 to f682f4e Compare March 18, 2025 20:08
@rhc54
Copy link
Contributor

rhc54 commented Mar 18, 2025

Errr...if that's the case, then I think we all misunderstood things. We don't require 3 prefixes in that very limited scenario you describe. It works just fine with only one. If the user sets OPAL_PREFIX, your mpirun wrapper automatically sets PRTE_PREFIX and PMIX_PREFIX to that same value. The change I made (without any other changes) will propagate those values correctly and everything works fine.

The exceptions to this are many. For example, the above only works for a monolith - i.e., it won't work if the user is working with an external PRRTE or PMIx. It also doesn't work for those systems where backend locations are different from frontend locations (e.g., login vs compute nodes). Or for containers where the PMIx used by the app is different from the one used by PRRTE. Etc.

But if the one prefix goal for the homogeneous location case is what we are striving to achieve here - we already have that solution courtesy of the general solution. Given we require the general solution anyway, it isn't clear to me what problem this addition solves.

bosilca added 3 commits March 19, 2025 17:28
This is the only way to make the default install path available for all
components in the installdir framework.

Signed-off-by: George Bosilca <[email protected]>
For the libdir it now matched the libdir prefix from the installdirs.h
in order to find the real prefix.

It does the same for executable allowing us to execute our compiler
wrappers and other tools from a relocated directory. The problem here
is that all our internal executbales are statically linked against
libopal_core, we we will find the basic symbols not in a shared library
but directly in the executable.

This works fine, but we need a way to prevent this mechanism from
triggering on a statically build user application.

Signed-off-by: George Bosilca <[email protected]>
@bosilca
Copy link
Member Author

bosilca commented Mar 19, 2025

This entire discussion started because the prefixes were not propagated (aka they were actually manually removed), and now you come and claim we all misunderstood and that we dont require 3 prefixes.

Anyway, it didn't work when all this discussion started and it doesn't work in the latest stable either. But let's hope it works today, maybe once all the branches get synched and pulled into the stable (or main) version.

@rhc54
Copy link
Contributor

rhc54 commented Mar 19, 2025

It won't work in your main branch because you are still pointed at the PRRTE fork, which is broken. It is working just fine right now in your v5.0 branch because we updated PRRTE there, and it works fine in the upstream PRRTE master branch.

My point was that OMPI, when operating as a monolith, only needs the OPAL_PREFIX because you propagate that value into PRTE_PREFIX and PMIX_PREFIX in the mpirun wrapper - outside the view of the user. Internally, we still use three prefixes to support the three layers.

The monolith solution is a limited special case. Your current code breaks anyone using external PMIx or PRRTE because it incorrectly sets prefix values for those packages. I'm not sure how we could adjust the wrapper, though, to handle those cases. Might just need to tell people not to use that wrapper unless they are working with the monolith case. Or maybe adjust the wrapper so it doesn't overwrite PRTE_PREFIX and/or PMIX_PREFIX if the user already set those in their environment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants