Commit fac7a65

Add support for OCI containers (#292)
* Add support for OCI containers

  Update the configure-eda.sh script to install rootless docker for compute nodes.
  Add playbook and script to configure users to run rootless docker.
  Add support for pyxis and enroot from:
  https://docs.aws.amazon.com/en_us/parallelcluster/latest/ug/tutorials_11_running-containerized-jobs-with-pyxis.html

  Resolves #292

* Add support for pyxis containers

  Build and install enroot and pyxis on external login nodes.
  Configure /etc/subuid and /etc/subgid for AD users.

  Resolves #259
1 parent 56b6b31 commit fac7a65

File tree

32 files changed: +890 -37 lines changed

docs/containers.md

+142
@@ -0,0 +1,142 @@
# Containers

Slurm supports running jobs in unprivileged containers in a couple of different ways.
It natively supports running jobs in unprivileged [Open Container Initiative (OCI) containers](https://slurm.schedmd.com/containers.html).
Starting with ParallelCluster 3.11.1, it also supports running Docker containers using the [Pyxis SPANK plugin](https://github.com/NVIDIA/pyxis), which uses [enroot](https://github.com/NVIDIA/enroot) to run unprivileged containers.
I will describe Pyxis first because it is easier to use than OCI containers.

**Note**: Most EDA tools are not containerized.
Some won't run in containers at all, and some may run in a container but not correctly.
I recommend following your EDA vendor's guidance and consulting with them before running their tools in containers.

I've seen a couple of main motivations for using containers for EDA tools.
The first is that orchestration tools like Kubernetes and AWS Batch require jobs to run in containers.
The other is to have more flexibility managing the run-time environment of the tools.
Since the EDA tools themselves aren't containerized, the container is usually used to manage file system mounts and the packages used by the tools.
If a new tool requires new packages, it is easy to update and distribute a new version of the container.
Another reason is to run a legacy OS distribution in a container on an instance running a newer distribution.

## Using Pyxis

The enroot and Pyxis packages were developed by NVIDIA to make it easier to run containers on Slurm compute nodes.
ParallelCluster started installing enroot and Pyxis in version 3.11.1 so that you can [run containerized jobs with Pyxis](https://docs.aws.amazon.com/en_us/parallelcluster/latest/ug/tutorials_11_running-containerized-jobs-with-pyxis.html).

To configure Slurm to use the Pyxis plugin, set the **slurm/ParallelClusterConfig/EnablePyxis** parameter to **true** and create or update your cluster.
This will configure the head node to use the Pyxis plugin.
It will also configure your external login nodes to install, configure, and use enroot and the Pyxis plugin.
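For example, the relevant lines of the cluster config file would look something like this sketch (all other required settings omitted; the Version value is only illustrative):

```
slurm:
  ParallelClusterConfig:
    Version: '3.11.1'   # Pyxis requires 3.11.1 or later
    EnablePyxis: true
```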

### Running a containerized job using Pyxis

With Pyxis configured in your cluster, you have new options in srun and sbatch to specify a container image.

```
# Submitting an interactive job
srun -N 2 --container-image docker://rockylinux:8 hostname

# Submitting a batch job
sbatch -N 2 --wrap='srun --container-image docker://rockylinux:8 hostname'
```
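Pyxis also registers additional srun options, such as `--container-mounts`, for bind-mounting host paths into the container. A quick sketch; the `/tools` path is just an example of a shared file system mount:

```
# Bind-mount a shared file system path into the container (example path)
srun --container-image docker://rockylinux:8 --container-mounts /tools:/tools ls /tools
```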

## Using OCI containers

Slurm supports [running jobs in unprivileged OCI containers](https://slurm.schedmd.com/containers.html).
OCI is the [Open Container Initiative](https://opencontainers.org/), an open governance structure with the purpose of creating open industry standards around container formats and runtimes.

I'm going to document how to add OCI support to your EDA Slurm cluster.

**NOTE**: Rootless Docker requires user-specific setup for each user that will run containers.
For this reason, it is much easier to use Pyxis.

### Configure rootless Docker on login and compute nodes

The login and compute nodes must be configured to use an unprivileged container runtime.

Run the following script as root to install rootless Docker:

```
/opt/slurm/${ClusterName}/config/bin/install-rootless-docker.sh
```

The script [installs the latest Docker from the Docker yum repo](https://docs.docker.com/engine/install/rhel/).

Creation of a compute node AMI with rootless Docker installed has been automated as part of [creating a custom compute node AMI](custom-amis.md).
Use one of the build config files with **docker** in the name to create a custom AMI and configure your cluster to use it.
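Then point the cluster at the resulting AMI with the **slurm/ParallelClusterConfig/ComputeNodeAmi** parameter. A sketch with a placeholder AMI ID:

```
slurm:
  ParallelClusterConfig:
    ComputeNodeAmi: ami-0123456789abcdef0   # placeholder: your custom AMI with rootless Docker
```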
### Per-user configuration

Next, [configure Docker to run rootless](https://docs.docker.com/engine/security/rootless/) by running the following script as the user that will be running Docker:

```
dockerd-rootless-setuptool.sh install
```

Each user that will run Docker must have an entry in `/etc/subuid` and `/etc/subgid`.
The create_users_groups_json.py script creates `/opt/slurm/config/subuid` and `/opt/slurm/config/subgid`, and the compute nodes copy them to `/etc/subuid` and `/etc/subgid`.
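For reference, each line in those files maps a user to a range of subordinate IDs in the form `user:start:count`. The names and ranges below are illustrative:

```
# /etc/subuid (and /etc/subgid uses the same format)
user1:100000:65536
user2:165536:65536
```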

You must configure Docker to use a non-NFS storage location for storing images.

`~/.config/docker/daemon.json`:

```
{
    "data-root": "/var/tmp/${USER}/containers/storage"
}
```

Note that dockerd does not expand environment variables in daemon.json, so replace `${USER}` with the actual user name.

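To sanity-check a user's rootless setup, you can start their Docker daemon and run a trivial container. These are standard rootless-Docker commands, not specific to this repo; depending on your environment you may also need DOCKER_HOST to point at the per-user socket, as the setup tool's output describes:

```
# Start the per-user Docker daemon (if not already running) and run a test container
systemctl --user start docker
docker run --rm hello-world
```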
### Create OCI Bundle

Each container requires an [OCI bundle](https://slurm.schedmd.com/containers.html#bundle).

The bundle directories can be stored on NFS and shared between users.
For example, you could create an oci-bundles directory on your shared file system.

This shows how to create an Ubuntu bundle.
You can do this as root with the Docker service running, but it would be better to run it using rootless Docker.

```
export OCI_BUNDLES_DIR=~/oci-bundles
export IMAGE_NAME=ubuntu
export BUNDLE_NAME=ubuntu
mkdir -p $OCI_BUNDLES_DIR
cd $OCI_BUNDLES_DIR
mkdir -p $BUNDLE_NAME
cd $BUNDLE_NAME
docker pull $IMAGE_NAME
docker export $(docker create $IMAGE_NAME) > $BUNDLE_NAME.tar
mkdir rootfs
tar -C rootfs -xf $BUNDLE_NAME.tar
runc spec --rootless
runc run containerid1
```

The same process works for Rocky Linux 8.

```
export OCI_BUNDLES_DIR=~/oci-bundles
export IMAGE_NAME=rockylinux:8
export BUNDLE_NAME=rockylinux8
mkdir -p $OCI_BUNDLES_DIR
cd $OCI_BUNDLES_DIR
mkdir -p $BUNDLE_NAME
cd $BUNDLE_NAME
docker pull $IMAGE_NAME
docker export $(docker create $IMAGE_NAME) > $BUNDLE_NAME.tar
mkdir rootfs
tar -C rootfs -xf $BUNDLE_NAME.tar
runc spec --rootless
runc run containerid2
```

### Run a bundle on Slurm using an OCI container

```
export OCI_BUNDLES_DIR=~/oci-bundles
export BUNDLE_NAME=rockylinux8

srun -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --pty hostname

srun -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --pty bash

sbatch -p interactive --container $OCI_BUNDLES_DIR/$BUNDLE_NAME --wrap hostname
```
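
The sbatch options can also be embedded in the batch script itself. A minimal sketch, assuming the rockylinux8 bundle path from above (spelled out literally, since `#SBATCH` directives don't expand shell variables; adjust the path to where you created your bundle):

```
#!/bin/bash
#SBATCH -p interactive
# The bundle path below is an example; adjust to your own oci-bundles directory.
#SBATCH --container /home/myuser/oci-bundles/rockylinux8
hostname
```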

mkdocs.yml

+1
@@ -15,6 +15,7 @@ nav:
 - 'job_preemption.md'
 - 'rest_api.md'
 - 'onprem.md'
+- 'containers.md'
 # - 'federation.md'
 - 'delete-cluster.md'
 # - 'implementation.md'

source/cdk/cdk_slurm_stack.py

+16-2
@@ -53,7 +53,7 @@
 import boto3
 from botocore.exceptions import ClientError
 import config_schema
-from config_schema import get_PARALLEL_CLUSTER_LAMBDA_RUNTIME, get_PARALLEL_CLUSTER_MUNGE_VERSION, get_PARALLEL_CLUSTER_PYTHON_VERSION, get_PC_SLURM_VERSION, get_SLURM_VERSION
+from config_schema import get_PARALLEL_CLUSTER_ENROOT_VERSION, get_PARALLEL_CLUSTER_LAMBDA_RUNTIME, get_PARALLEL_CLUSTER_MUNGE_VERSION, get_PARALLEL_CLUSTER_PYTHON_VERSION, get_PARALLEL_CLUSTER_PYXIS_VERSION, get_PC_SLURM_VERSION, get_SLURM_VERSION
 from constructs import Construct
 from copy import copy, deepcopy
 from hashlib import sha512
@@ -289,6 +289,10 @@ def check_config(self):
             self.mount_home_src = mount_dict['src']
             logger.info(f"Mounting /home from {self.mount_home_src} on compute nodes")

+        if self.config['slurm']['ParallelClusterConfig']['EnablePyxis'] and not config_schema.PARALLEL_CLUSTER_SUPPORTS_PYXIS(self.PARALLEL_CLUSTER_VERSION):
+            logger.error(f"Cannot EnablePyxis before ParallelCluster version {config_schema.PARALLEL_CLUSTER_SUPPORTS_PYXIS_VERSION}")
+            config_errors += 1
+
         # Check OS
         if self.config['slurm']['ParallelClusterConfig']['Image']['Os'] not in config_schema.get_PARALLEL_CLUSTER_ALLOWED_OSES(self.config):
             logger.error(f"{self.config['slurm']['ParallelClusterConfig']['Image']['Os']} is not supported in ParallelCluster version {self.PARALLEL_CLUSTER_VERSION}.")
@@ -498,7 +502,7 @@ def check_config(self):

         if 'Xio' in self.config['slurm']:
             if self.config['slurm']['ParallelClusterConfig']['Architecture'] != 'x86_64':
-                logger.error("Xio is only supported on x86_64 architecture, not {self.config['slurm']['ParallelClusterConfig']['Architecture']}")
+                logger.error(f"Xio is only supported on x86_64 architecture, not {self.config['slurm']['ParallelClusterConfig']['Architecture']}")
                 config_errors += 1

         if config_errors:
@@ -1254,11 +1258,13 @@ def create_parallel_cluster_assets(self):
         # Additions or deletions to the list should be reflected in config_scripts in on_head_node_start.sh.
         files_to_upload = [
             'config/bin/configure-eda.sh',
+            'config/bin/configure-rootless-docker.sh',
             'config/bin/create_or_update_users_groups_json.sh',
             'config/bin/create_users_groups_json.py',
             'config/bin/create_users_groups_json_configure.sh',
             'config/bin/create_users_groups_json_deconfigure.sh',
             'config/bin/create_users_groups.py',
+            'config/bin/install-rootless-docker.sh',
             'config/bin/on_head_node_start.sh',
             'config/bin/on_head_node_configured.sh',
             'config/bin/on_head_node_updated.sh',
@@ -1532,6 +1538,7 @@ def create_parallel_cluster_lambdas(self):
             'ConfigureEdaScriptS3Url': self.custom_action_s3_urls['config/bin/configure-eda.sh'],
             'ErrorSnsTopicArn': self.config.get('ErrorSnsTopicArn', ''),
             'ImageBuilderSecurityGroupId': self.imagebuilder_sg.security_group_id,
+            'InstallDockerScriptS3Url': self.custom_action_s3_urls['config/bin/install-rootless-docker.sh'],
             'ParallelClusterVersion': self.config['slurm']['ParallelClusterConfig']['Version'],
             'Region': self.cluster_region,
             'SubnetId': self.config['SubnetId'],
@@ -2273,6 +2280,7 @@ def get_instance_template_vars(self, instance_role):
             "cluster_name": cluster_name,
             "region": self.cluster_region,
             "time_zone": self.config['TimeZone'],
+            "parallel_cluster_version": self.PARALLEL_CLUSTER_VERSION
         }
         instance_template_vars['default_partition'] = 'batch'
         instance_template_vars['file_system_mount_path'] = '/opt/slurm'
@@ -2287,9 +2295,12 @@ def get_instance_template_vars(self, instance_role):
             instance_template_vars['accounting_storage_host'] = self.config['slurm']['ParallelClusterConfig']['Slurmdbd']['Host']
         else:
             instance_template_vars['accounting_storage_host'] = ''
+        instance_template_vars['enable_pyxis'] = self.config['slurm']['ParallelClusterConfig']['EnablePyxis']
         instance_template_vars['licenses'] = self.config['Licenses']
+        instance_template_vars['parallel_cluster_enroot_version'] = get_PARALLEL_CLUSTER_ENROOT_VERSION(self.config)
         instance_template_vars['parallel_cluster_munge_version'] = get_PARALLEL_CLUSTER_MUNGE_VERSION(self.config)
         instance_template_vars['parallel_cluster_python_version'] = get_PARALLEL_CLUSTER_PYTHON_VERSION(self.config)
+        instance_template_vars['parallel_cluster_pyxis_version'] = get_PARALLEL_CLUSTER_PYXIS_VERSION(self.config)
         instance_template_vars['primary_controller'] = True
         instance_template_vars['slurm_uid'] = self.config['slurm']['SlurmUid']
         instance_template_vars['slurmctld_port'] = self.slurmctld_port
@@ -2310,8 +2321,11 @@ def get_instance_template_vars(self, instance_role):
             instance_template_vars['xio_config'] = self.config['slurm']['Xio']
             instance_template_vars['xio_config']['ExtraMounts'] = self.config['slurm'].get('storage', {}).get('ExtraMounts', [])
         elif instance_role == 'ParallelClusterExternalLoginNode':
+            instance_template_vars['enable_pyxis'] = self.config['slurm']['ParallelClusterConfig']['EnablePyxis']
             instance_template_vars['slurm_version'] = get_SLURM_VERSION(self.config)
+            instance_template_vars['parallel_cluster_enroot_version'] = get_PARALLEL_CLUSTER_ENROOT_VERSION(self.config)
             instance_template_vars['parallel_cluster_munge_version'] = get_PARALLEL_CLUSTER_MUNGE_VERSION(self.config)
+            instance_template_vars['parallel_cluster_pyxis_version'] = get_PARALLEL_CLUSTER_PYXIS_VERSION(self.config)
             instance_template_vars['slurmrestd_port'] = self.slurmrestd_port
             instance_template_vars['file_system_mount_path'] = f'/opt/slurm/{cluster_name}'
             instance_template_vars['slurm_base_dir'] = f'/opt/slurm/{cluster_name}'

source/cdk/config_schema.py

+32
@@ -272,6 +272,14 @@ def get_PARALLEL_CLUSTER_MUNGE_VERSION(config):
     parallel_cluster_version = get_parallel_cluster_version(config)
     return PARALLEL_CLUSTER_MUNGE_VERSIONS[parallel_cluster_version]

+def get_PARALLEL_CLUSTER_ENROOT_VERSION(config):
+    parallel_cluster_version = get_parallel_cluster_version(config)
+    return PARALLEL_CLUSTER_ENROOT_VERSIONS[parallel_cluster_version]
+
+def get_PARALLEL_CLUSTER_PYXIS_VERSION(config):
+    parallel_cluster_version = get_parallel_cluster_version(config)
+    return PARALLEL_CLUSTER_PYXIS_VERSIONS[parallel_cluster_version]
+
 def get_PARALLEL_CLUSTER_PYTHON_VERSION(config):
     parallel_cluster_version = get_parallel_cluster_version(config)
     return PARALLEL_CLUSTER_PYTHON_VERSIONS[parallel_cluster_version]
@@ -338,6 +346,12 @@ def get_PARALLEL_CLUSTER_LAMBDA_RUNTIME(parallel_cluster_version):
     else:
         return aws_lambda.Runtime.PYTHON_3_12

+# Version 3.11.1
+
+PARALLEL_CLUSTER_SUPPORTS_PYXIS_VERSION = parse_version('3.11.1')
+def PARALLEL_CLUSTER_SUPPORTS_PYXIS(parallel_cluster_version):
+    return parallel_cluster_version >= PARALLEL_CLUSTER_SUPPORTS_PYXIS_VERSION
+
 # Version 3.12.0

 def PARALLEL_CLUSTER_REQUIRES_FSXZ_OUTBOUND_SG_RULES(parallel_cluster_version):
@@ -527,6 +541,7 @@ def DEFAULT_OS(config):
     # 2 cores
     'm7i.xlarge',
     'r7a.large',
+    'r8g.large',
     # 4 cores
     'c7i.2xlarge',
     'm7a.xlarge',
@@ -538,6 +553,7 @@ def DEFAULT_OS(config):
     # 32 GB:
     # 2 cores
     'r7iz.xlarge',
+    'r8g.xlarge',
     # 4 core(s):
     'm7i.2xlarge',
     'r7a.xlarge',
@@ -555,6 +571,7 @@ def DEFAULT_OS(config):
     # 8 core(s):
     'm7i.4xlarge',
     'r7a.2xlarge',
+    'r8g.2xlarge',
     # 16 core(s): ['m8g.4xlarge']
     'c7i.8xlarge',
     'm7a.4xlarge',
@@ -573,18 +590,22 @@ def DEFAULT_OS(config):
     # 16 cores
     'm7i.8xlarge',
     'r7a.4xlarge',
+    'r8g.4xlarge',
     # 32 cores
     'c7i.16xlarge',
     'm7a.8xlarge',
+    'm8g.8xlarge',
     # 64 cores
     'c7a.16xlarge',
+    'c8g.16xlarge',

     # 192 GB:
     # 48 cores
     'c7i.24xlarge',
     'm7a.12xlarge',
     # 96 cores
     'c7a.24xlarge',
+    'c8g.24xlarge',

     # 256 GB:
     # 4 cores
@@ -593,8 +614,10 @@ def DEFAULT_OS(config):
     # 32 cores
     'm7i.16xlarge',
     'r7a.8xlarge',
+    'r8g.8xlarge',
     # 64 cores
     'm7a.16xlarge',
+    'm8g.16xlarge',
     # 128 cores
     'c7a.32xlarge',

@@ -605,24 +628,29 @@ def DEFAULT_OS(config):
     # 96 cores
     'c7i.48xlarge',
     'm7a.24xlarge',
+    'm8g.24xlarge',
     # 192 cores
     'c7a.48xlarge',
+    'c8g.48xlarge',

     # 512 GB:
     # 8 cores
     'x2iedn.4xlarge', # Have newer r7iz
     'x2iezn.4xlarge', # Have newer r7iz
     # 64 cores
     'r7a.16xlarge',
+    'r8g.16xlarge',
     # 128 cores
     'm7a.32xlarge',

     # 768 GB:
     # 96 cores
     'm7i.48xlarge',
     'r7a.24xlarge',
+    'r8g.24xlarge',
     # 192 cores
     'm7a.48xlarge',
+    'm8g.48xlarge',

     # 1024 GB:
     # 16 cores
@@ -1530,6 +1558,10 @@ def get_config_schema(config):
         },
         Optional('Architecture', default=DEFAULT_ARCHITECTURE): And(str, lambda s: s in VALID_ARCHITECTURES),
         Optional('ComputeNodeAmi'): And(str, lambda s: s.startswith('ami-')),
+        # Recommend to not use EFA unless necessary to avoid insufficient capacity errors when starting new instances in group or when multiple instance types in the group
+        # See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-cluster
+        Optional('EnableEfa', default=False): bool,
+        Optional('EnablePyxis', default=False): bool,
         Optional('Database'): {
             Optional('DatabaseStackName'): str,
             Optional('FQDN'): str,

source/resources/lambdas/CreateBuildFiles/CreateBuildFiles.py

+17
@@ -217,6 +217,23 @@ def lambda_handler(event, context):
                 Body = build_file_content
             )

+        # Image with rootless Docker installed
+        template_vars['ImageName'] = f"parallelcluster-{parallelcluster_version_name}-docker-{distribution}-{version}-{architecture}".replace('_', '-')
+        template_vars['ComponentS3Url'] = environ['InstallDockerScriptS3Url']
+        build_file_s3_key = f"{assets_base_key}/config/build-files/{template_vars['ImageName']}.yml"
+        if requestType == 'Delete':
+            response = s3_client.delete_object(
+                Bucket = assets_bucket,
+                Key = build_file_s3_key
+            )
+        else:
+            build_file_content = build_file_template.render(**template_vars)
+            s3_client.put_object(
+                Bucket = assets_bucket,
+                Key = build_file_s3_key,
+                Body = build_file_content
+            )
+
         # Image with EDA packages
         template_vars['ImageName'] = f"parallelcluster-{parallelcluster_version_name}-eda-{distribution}-{version}-{architecture}".replace('_', '-')
         template_vars['ComponentS3Url'] = environ['ConfigureEdaScriptS3Url']

source/resources/parallel-cluster/config/bin/configure-eda.sh

+3-1
@@ -88,4 +88,6 @@ ansible-playbook $PLAYBOOKS_PATH/eda_tools.yml \
     -i inventories/local.yml \
     -e @$ANSIBLE_PATH/ansible_head_node_vars.yml

-popd
+ansible-playbook $PLAYBOOKS_PATH/install-rootless-docker.yml \
+    -i inventories/local.yml \
+    -e @$ANSIBLE_PATH/ansible_head_node_vars.yml
