Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AWS] Rework postinstall.sh's backgrounding, add additional error checking, relax pcluster dependencies #83

Merged
merged 4 commits into from
Mar 24, 2025

Conversation

lrbison
Copy link
Contributor

@lrbison lrbison commented Jan 28, 2025

Rework nohup, and raise errors as they are encountered

To provide additional confidence that this script has completed
successfully, we set -euoo posix pipefail during execution.

Simplify background execution logic to avoid creating temporary
scripts and re-exporting functions.

Improve support for non-pcluster AMIs

Skip install of ACfL compiler if OS is not supported

@lrbison
Copy link
Contributor Author

lrbison commented Jan 28, 2025

@stephenmsachs please take a look 🙏

Copy link
Collaborator

@stephenmsachs stephenmsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me except for the libssl-dev stuff.

@lrbison
Copy link
Contributor Author

lrbison commented Feb 5, 2025

This has turned into a larger change. My motivations are two-fold:

  1. I want to be able to use this script even on a complete single node without pcluster or PCS to install spack with binary cache support on AWS platforms. To this end we need to remove some dependency on slurm and libfabric which we previously assumed.
  2. I want to be able to confidently test the script has completed successfully. This helps me test on many OSs using scripts.

I've tested this version on the following non-pcluster AMIs:

  • Ubuntu 20.04, 22.04, and 24.04
  • Rocky 9.5
  • Amazon Linux 2 and 2023.

However I've recently realized my testing scripts pre-installed more packages than I've realized, so I'll need to address that, and I still need to test on pcluster amis to double-check I haven't broken anything.

@lrbison lrbison changed the title Allow AWS's postinstall.sh script to run on even more base AMIs [AWS] Rework postinstall.sh's backgrounding, add additional error checking, relax pcluster dependencies Feb 5, 2025
Copy link
Collaborator

@stephenmsachs stephenmsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your help. I like the changes you did, especially the new -fg option. I am not sure if all those bash options are necessary and help reproducibility. And we will have to clean up in which cases the export statements are still necessary, but that's not essential.

else
spack mirror add --scope site "aws-pcluster-$(stack_arch)" "https://binaries.spack.io/develop/aws-pcluster-$(stack_arch)"
fi
mirror_url="https://binaries.spack.io/${SPACK_BRANCH:-develop}/aws-pcluster-$(stack_arch)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work in case SPACK_BRANCH points to some feature branch which does not yet have a cache. The assumption is that you are most likely forking off of develop, so if there is no matching cache for this brnahc use the develop cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So I tested it and it seems that I can add a mirror with a bad URL and spack does not complain. So to be specific it may fail doing something at a later time (I couldn't make my example fail) or it might just silently ignore it.

But your point is valid: if I close a different version of spack, perhaps I should still try to use the develop mirror?

How about we add both the develop and the branch mirror. If the user chooses to install with some feature branch, then we'll get develop binaries, and if they install with some old branch, we'll get the old binaries (and none of the develop ones unless they have a matching hash).

What do you think?

@lrbison lrbison force-pushed the morebaseami branch 2 times, most recently from 5e1540d to a546390 Compare February 17, 2025 23:59
Copy link
Collaborator

@stephenmsachs stephenmsachs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still remove the pipefail since it adds more than it helps, e.g. the pipe in line 345 is allowed to fail safely.

@@ -350,6 +421,18 @@ setup_pcluster_buildcache_stack() {
[ -n "${SLURM_VERSION}" ] && yq -i ".packages.slurm.externals[0].spec = \"slurm@${SLURM_VERSION} +pmix\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
[ -d "${SLURM_ROOT}" ] && yq -i ".packages.slurm.externals[0].prefix = \"${SLURM_ROOT}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
[ -n "${LIBFABRIC_VERSION}" ] && yq -i ".packages.libfabric.externals[0].spec = \"libfabric@${LIBFABRIC_VERSION}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
return 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes that's a fun one. But I think we csan turn these around to avoid the return 0:

    # Patch packages.yaml
    [ -z "${SLURM_VERSION}" ] || yq -i ".packages.slurm.externals[0].spec = \"slurm@${SLURM_VERSION} +pmix\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
    [ ! -d "${SLURM_ROOT}" ] || yq -i ".packages.slurm.externals[0].prefix = \"${SLURM_ROOT}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
    [ -z "${LIBFABRIC_VERSION}" ] || yq -i ".packages.libfabric.externals[0].spec = \"libfabric@${LIBFABRIC_VERSION}\"" "${SPACK_ROOT}"/etc/spack/packages.yaml
}

lrbison added 4 commits March 19, 2025 22:09
To provide additional confidence that this script has completed
successfully, we `set -euoo posix pipefail` during execution.

Simplify background execution logic to avoid creating temporary
scripts and re-exporting functions.

Signed-off-by: Luke Robison <[email protected]>
@lrbison
Copy link
Contributor Author

lrbison commented Mar 24, 2025

@stephenmsachs I believe I have addressed your recommended changes:

Are we ready for merge?

@stephenmsachs stephenmsachs merged commit fb0c4a1 into spack:main Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants