[AWS] Rework postinstall.sh's backgrounding, add additional error checking, relax pcluster dependencies #83
Conversation
@stephenmsachs please take a look 🙏
Looks good to me except for the libssl-dev stuff.
This has turned into a larger change. My motivations are two-fold:
I've tested this version on the following non-pcluster AMIs:
However, I've recently realized my testing scripts pre-installed more packages than I'd realized, so I'll need to address that, and I still need to test on pcluster AMIs to double-check I haven't broken anything.
Thanks for your help. I like the changes you made, especially the new -fg option. I am not sure whether all those bash options are necessary and help reproducibility. And we will have to clean up in which cases the export statements are still necessary, but that's not essential.
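The -fg option discussed above can be implemented with a small self-dispatch at the top of the script. The sketch below is illustrative only (function names and the log path are assumptions, not the actual postinstall.sh code): with -fg the work runs in the foreground, otherwise the script re-invokes itself under nohup, which avoids the temporary scripts and re-exported functions the PR removes.

```shell
#!/bin/bash
# Sketch of a -fg dispatch; names and the log path are illustrative.

main() {
    # Stand-in for the real installation work.
    echo "doing the real work with args: $*"
}

run_postinstall() {
    if [ "${1:-}" = "-fg" ]; then
        shift
        main "$@"          # foreground: failures propagate to the caller
    else
        # Background: re-invoke this same script with -fg under nohup,
        # so no temporary script or re-exported functions are needed.
        nohup "$0" -fg "$@" > "${LOG_FILE:-/tmp/postinstall.log}" 2>&1 &
        echo "backgrounded as PID $!"
    fi
}

# The real script's entry point would then be:  run_postinstall "$@"
```

Because the background branch re-execs the same file, any `set -e`/`set -u` options at the top of the script apply identically to both execution modes.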
AWS/parallelcluster/postinstall.sh
else
    spack mirror add --scope site "aws-pcluster-$(stack_arch)" "https://binaries.spack.io/develop/aws-pcluster-$(stack_arch)"
fi
mirror_url="https://binaries.spack.io/${SPACK_BRANCH:-develop}/aws-pcluster-$(stack_arch)"
This does not work in case SPACK_BRANCH points to some feature branch which does not yet have a cache. The assumption is that you are most likely forking off of develop, so if there is no matching cache for this branch, use the develop cache.
I see. So I tested it, and it seems that I can add a mirror with a bad URL and spack does not complain. To be specific, it may fail doing something at a later time (I couldn't make my example fail), or it might just silently ignore it.
But your point is valid: if I chose a different version of spack, perhaps I should still try to use the develop mirror?
How about we add both the develop and the branch mirror? If the user chooses to install with some feature branch, then we'll get develop binaries, and if they install with some old branch, we'll get the old binaries (and none of the develop ones unless they have a matching hash).
What do you think?
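That "register both mirrors" suggestion could look roughly like the sketch below. The `stack_arch` helper exists in the real script; the stub here is only so the snippet runs standalone, and the URL-listing function name is hypothetical:

```shell
#!/bin/bash
# Sketch of the develop-fallback mirror selection; pcluster_mirror_urls
# is an illustrative name, and stack_arch is stubbed for this snippet.
stack_arch() { uname -m; }   # illustrative stub

pcluster_mirror_urls() {
    local branch="${SPACK_BRANCH:-develop}" arch
    arch="$(stack_arch)"
    # The requested branch's cache comes first...
    echo "https://binaries.spack.io/${branch}/aws-pcluster-${arch}"
    # ...and feature branches may have no cache yet, so also offer develop.
    [ "${branch}" = "develop" ] || \
        echo "https://binaries.spack.io/develop/aws-pcluster-${arch}"
}

# The real script would then register each URL in order, e.g. with
#   spack mirror add --scope site <name> <url>
# so branch binaries win when they exist and develop fills the gaps.
```

Since mirrors are consulted in registration order, the old-branch case naturally prefers the old binaries, matching the behavior described above.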
Force-pushed from 5e1540d to a546390
I would still remove the pipefail since it adds more than it helps, e.g. the pipe in line 345 is allowed to fail safely.
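The concern is concrete: without pipefail, a pipeline's status is that of its last command, so a failing left-hand stage is ignored; with pipefail, that same pipe fails the whole pipeline, and under set -e that aborts the script even when the failure was harmless. A minimal illustration:

```shell
#!/bin/bash
# Without pipefail, a pipeline's status is the status of its LAST
# command, so a failing stage on the left is silently ignored.
set +o pipefail
false | true
echo "without pipefail: status $?"      # prints "without pipefail: status 0"

# With pipefail, any failing stage fails the whole pipeline; under
# set -e that escalates a "safe" failing pipe into a script abort.
set -o pipefail
rc=0; false | true || rc=$?
echo "with pipefail: status ${rc}"      # prints "with pipefail: status 1"
```

A middle ground, if pipefail is kept: append `|| true` to the one pipeline that is allowed to fail, so the rest of the script keeps the stricter checking.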
AWS/parallelcluster/postinstall.sh
@@ -350,6 +421,18 @@ setup_pcluster_buildcache_stack() {
    [ -n "${SLURM_VERSION}" ] && yq -i ".packages.slurm.externals[0].spec = \"slurm@${SLURM_VERSION} +pmix\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
    [ -d "${SLURM_ROOT}" ] && yq -i ".packages.slurm.externals[0].prefix = \"${SLURM_ROOT}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
    [ -n "${LIBFABRIC_VERSION}" ] && yq -i ".packages.libfabric.externals[0].spec = \"libfabric@${LIBFABRIC_VERSION}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
    return 0
Oh, yes that's a fun one. But I think we can turn these around to avoid the return 0:
# Patch packages.yaml
[ -z "${SLURM_VERSION}" ] || yq -i ".packages.slurm.externals[0].spec = \"slurm@${SLURM_VERSION} +pmix\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
[ ! -d "${SLURM_ROOT}" ] || yq -i ".packages.slurm.externals[0].prefix = \"${SLURM_ROOT}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
[ -z "${LIBFABRIC_VERSION}" ] || yq -i ".packages.libfabric.externals[0].spec = \"libfabric@${LIBFABRIC_VERSION}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
}
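The reason the inversion works: a bash function returns the status of its last command, and `[ -n "$VAR" ] && cmd` evaluates to 1 when the test is false, so a function ending in that pattern spuriously reports failure when there is nothing to patch, which set -e treats as an error. With `[ -z "$VAR" ] || cmd` the skip case is the successful branch. A small demonstration (VERSION and the echo are illustrative stand-ins for the yq edits):

```shell
#!/bin/bash
# Last command is `[ -n ... ] && ...`; if VERSION is empty the test
# fails, so the whole function returns 1.
patch_with_and() {
    [ -n "${VERSION:-}" ] && echo "patching to ${VERSION}"
}

# Inverted: skipping is the SUCCESSFUL branch of ||, so the function
# returns 0 whether or not it patches.
patch_with_or() {
    [ -z "${VERSION:-}" ] || echo "patching to ${VERSION}"
}

unset VERSION
rc=0; patch_with_and || rc=$?
echo "&& form returns ${rc}"    # prints "&& form returns 1"
rc=0; patch_with_or || rc=$?
echo "|| form returns ${rc}"    # prints "|| form returns 0"
```

This is why the `return 0` was needed in the `&&` version and becomes unnecessary after the inversion.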
To provide additional confidence that this script has completed successfully, we `set -euoo posix pipefail` during execution. Simplify background execution logic to avoid creating temporary scripts and re-exporting functions.

Signed-off-by: Luke Robison <[email protected]>
@stephenmsachs I believe I have addressed your recommended changes:
Are we ready for merge?
Rework nohup, and raise errors as they are encountered
To provide additional confidence that this script has completed successfully, we set -euoo posix pipefail during execution. Simplify background execution logic to avoid creating temporary scripts and re-exporting functions.
Improve support for non-pcluster AMIs
Skip install of ACfL compiler if OS is not supported
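For reference, the commit's compact `set -euoo posix pipefail` bundles four option settings. The snippet below writes them out explicitly and confirms, via the `SHELLOPTS` variable, which options are active:

```shell
#!/bin/bash
# Written out, the options are:
#   -e           exit as soon as any command fails
#   -u           expanding an unset variable is an error
#   -o posix     stricter POSIX semantics
#   -o pipefail  a pipeline fails if ANY stage fails, not just the last
set -e -u -o posix -o pipefail

# $SHELLOPTS lists the active -o options, colon-separated.
case ":${SHELLOPTS}:" in
    *:errexit:*) echo "errexit on" ;;
esac
case ":${SHELLOPTS}:" in
    *:pipefail:*) echo "pipefail on" ;;
esac
```

Together these options make a silent partial failure of the postinstall script much less likely, which is the "additional confidence" the commit message refers to.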