* Add support for OCI containers
Update the configure-eda.sh script to install rootless docker for compute nodes.
Add playbook and script to configure users to run rootless docker.
Add support for pyxis and enroot from:
https://docs.aws.amazon.com/en_us/parallelcluster/latest/ug/tutorials_11_running-containerized-jobs-with-pyxis.html
Resolves #292
* Add support for pyxis containers
Build and install enroot and pyxis on external login nodes.
Configure /etc/subuid and /etc/subgid for AD users.
Resolves #259
Slurm supports running jobs in unprivileged containers in a couple of different ways.
It natively supports running jobs in unprivileged [Open Container Initiative (OCI) containers](https://slurm.schedmd.com/containers.html).
Starting with ParallelCluster 3.11.1, it also supports running Docker containers using the [Pyxis SPANK plugin](https://github.com/NVIDIA/pyxis), which uses [enroot](https://github.com/NVIDIA/enroot) to run unprivileged containers.
I will describe using Pyxis first because it is easier than using OCI containers.

**Note**: Most EDA tools are not containerized.
Some won't run in containers and some may run in a container, but not correctly.
I recommend consulting your EDA vendor and following their guidance.

I've seen a couple of main motivations for using containers for EDA tools.
The first is that orchestration tools like Kubernetes and AWS Batch require jobs to run in containers.
The other is to have more flexibility in managing the runtime environment of the tools.
Since the EDA tools themselves aren't containerized, the container is usually used to manage file system mounts and packages that are used by the tools.
If new packages are required by a new tool, then it is easy to update and distribute a new version of the container.
Another reason is to use legacy OS distributions on an instance running a newer distribution.

## Using Pyxis

The enroot and Pyxis packages were developed by NVIDIA to make it easier to run containers on Slurm compute nodes.
ParallelCluster started installing enroot and Pyxis in version 3.11.1 so that you can [run containerized jobs with Pyxis](https://docs.aws.amazon.com/en_us/parallelcluster/latest/ug/tutorials_11_running-containerized-jobs-with-pyxis.html).

To configure Slurm to use the Pyxis plugin, set the **slurm/ParallelClusterConfig/EnablePyxis** parameter to **true** and create or update your cluster.
This will configure the head node to use the Pyxis plugin.
It will also configure your external login nodes to install, configure, and use enroot and the Pyxis plugin.
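After the cluster has been created or updated, one way to check that the plugin is active is to look for the Pyxis options in the output of `srun --help`, since Slurm lists options contributed by SPANK plugins there. The following is a minimal sketch, assuming `srun` is on the PATH of the node where you run it. The helper name `has_pyxis_options` is hypothetical and only used for illustration.

```shell
# Succeed if the given help text mentions the Pyxis --container-image option.
# has_pyxis_options is a hypothetical helper name used only in this sketch.
has_pyxis_options() {
    printf '%s\n' "$1" | grep -q -- '--container-image'
}

if command -v srun >/dev/null 2>&1; then
    if has_pyxis_options "$(srun --help 2>&1)"; then
        echo "Pyxis options are available in srun"
    else
        echo "Pyxis options not found in srun --help"
    fi
else
    echo "srun not found; run this on a login node or the head node"
fi
```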

### Running a containerized job using Pyxis

With Pyxis configured in your cluster, you have new options in srun and sbatch to specify a container image.
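As a rough illustration of those options, the sketch below builds an `srun` command that uses the Pyxis `--container-image` and `--container-mounts` options. The image name, the mount, and the wrapper function `run_in_container` are assumptions chosen for the example, not values from this cluster's documentation. It defaults to a dry run that only prints the command; set `DRY_RUN=0` on a cluster with Pyxis enabled to actually run it.

```shell
# Illustrative values (assumptions); adjust the image and mounts for your environment.
IMAGE="${IMAGE:-ubuntu:22.04}"
MOUNTS="${MOUNTS:-/tmp:/tmp}"
# Default to a dry run so the sketch only prints the command.
DRY_RUN="${DRY_RUN:-1}"

# run_in_container is a hypothetical wrapper, not part of Slurm or Pyxis.
# --container-image and --container-mounts are options added by the Pyxis plugin.
run_in_container() {
    if [ "$DRY_RUN" = "0" ]; then
        srun --container-image="$IMAGE" --container-mounts="$MOUNTS" "$@"
    else
        echo "srun --container-image=$IMAGE --container-mounts=$MOUNTS $*"
    fi
}

# Print the distribution name from inside the container.
run_in_container grep PRETTY_NAME /etc/os-release
```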
Slurm supports [running jobs in unprivileged OCI containers](https://slurm.schedmd.com/containers.html).
OCI is the [Open Container Initiative](https://opencontainers.org/), an open governance structure with the purpose of creating open industry standards around container formats and runtimes.

I'm going to document how to add OCI support to your EDA Slurm cluster.

**NOTE**: Rootless Docker requires user-specific setup for each user that will run the containers.
For this reason, it is much easier to use Pyxis.

### Configure rootless Docker on login and compute nodes

The login and compute nodes must be configured to use an unprivileged container runtime.

Run the following script as root to install rootless Docker.
The script [installs the latest Docker from the Docker yum repo](https://docs.docker.com/engine/install/rhel/).
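For orientation, the steps in the linked Docker instructions look roughly like the sketch below. This is not the project's install script. It assumes a RHEL 8 or 9 style node with `dnf`, and the helper names `run_step` and `install_docker` are hypothetical. By default it only prints the commands; set `DRY_RUN=0` and run as root to execute them.

```shell
# Sketch only: commands follow Docker's RHEL install instructions (assumes dnf).
# By default the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run_step is a hypothetical helper that prints or executes one command.
run_step() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# install_docker is a hypothetical name for the sequence of install steps.
install_docker() {
    run_step dnf -y install dnf-plugins-core
    run_step dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
    # docker-ce-rootless-extras provides dockerd-rootless-setuptool.sh.
    run_step dnf -y install docker-ce docker-ce-cli containerd.io docker-ce-rootless-extras
}

install_docker
```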

Building a compute node AMI with rootless Docker installed has been automated as part of [creating a custom compute node AMI](custom-amis.md).
Use one of the build config files with **docker** in the name to create a custom AMI and configure your cluster to use it.

### Per user configuration

Next, [configure Docker to run rootless](https://docs.docker.com/engine/security/rootless/) by running the following script as the user that will be running Docker.

```
dockerd-rootless-setuptool.sh install
```

Each user that will run Docker must have an entry in `/etc/subuid` and `/etc/subgid`.
The `creates_users_groups_json.py` script will create `/opt/slurm/config/subuid` and `/opt/slurm/config/subgid` and the compute nodes will copy them to `/etc/subuid` and `/etc/subgid`.
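To confirm that the current user has the required entries, a check along these lines can help. It is a minimal sketch that assumes the standard `name:start:count` line format of these files. The helper name `has_subid_entry` is hypothetical, and the pattern treats the user name as a regular expression, so names with special characters may need escaping.

```shell
# has_subid_entry FILE USER: succeed if FILE has a "USER:START:COUNT" line for USER.
# Hypothetical helper name; assumes the standard subuid/subgid line format.
has_subid_entry() {
    grep -q "^$2:[0-9][0-9]*:[0-9][0-9]*\$" "$1" 2>/dev/null
}

user="$(id -un)"
for f in /etc/subuid /etc/subgid; do
    if has_subid_entry "$f" "$user"; then
        echo "$f: entry found for $user"
    else
        echo "$f: no entry for $user"
    fi
done
```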

You must configure Docker to use a non-NFS storage location for storing images.
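One way to do this is to set `data-root` in the per-user Docker daemon configuration, which rootless Docker reads from `~/.config/docker/daemon.json` by default. The sketch below is an assumption-laden example: the helper `write_data_root` is hypothetical, it overwrites any existing `daemon.json`, and using `/tmp` as node-local storage is only a placeholder. The block itself only prints what it would do. After writing the file, the user's Docker daemon needs a restart (for example `systemctl --user restart docker` when it runs under systemd).

```shell
# write_data_root CONFIG_DIR DATA_ROOT: write a daemon.json that sets data-root.
# Hypothetical helper; it overwrites any existing daemon.json in CONFIG_DIR.
write_data_root() {
    mkdir -p "$1" "$2"
    printf '{\n  "data-root": "%s"\n}\n' "$2" > "$1/daemon.json"
}

# Example values (assumptions): the default rootless config directory and a
# per-user directory under /tmp, taken here to be node-local storage.
CONFIG_DIR="$HOME/.config/docker"
DATA_ROOT="/tmp/docker-$(id -un)"
echo "would run: write_data_root $CONFIG_DIR $DATA_ROOT"
```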
logger.error(f"{self.config['slurm']['ParallelClusterConfig']['Image']['Os']} is not supported in ParallelCluster version {self.PARALLEL_CLUSTER_VERSION}.")
# Recommend to not use EFA unless necessary to avoid insufficient capacity errors when starting new instances in group or when multiple instance types in the group
# See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-cluster