Commit ceada4e

committed
[jobs/arrays] Finalize the structure of the array page
- Remove dead code and comments.
- Add example with complicated commands per task invocation.
1 parent a705b9d commit ceada4e

1 file changed

+158
-93
lines changed


docs/jobs/arrays.md

Lines changed: 158 additions & 93 deletions
Original file line numberDiff line numberDiff line change
@@ -6,17 +6,7 @@
66
A naive way to submit multiple jobs is to programmatically submit the jobs with a custom script. This, however, can quickly hit the [maximum job limit](/slurm/qos/#available-qoss) (`MaxJobPU`), which is `100` jobs per user in the `normal` QoS. If your jobs share the same options, then consider using job arrays. Job arrays create job records for tasks progressively, so they help you stay within `MaxJobPU` while reducing the load on the scheduler.
77

88
!!! warning "When _not_ to use job arrays"
9-
Every job in a job array requires an allocation from the scheduler, which is an expensive operation. If you plan to submit many small jobs in an array that require the allocation of more than _10 jobs per minute_, please use [GNU parallel](/jobs/gnu-parallel/) to batch multiple tasks in a single job allocation.
10-
11-
<!--
12-
quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.), however it is possible to change some of these options after the job has begun execution using the scontrol command specifying the JobID of the array or individual ArrayJobID.
13-
14-
In HPC systems, cluster policy may enforce job submission limits in order to protect the scheduler from overload.
15-
16-
When you want to submit multiple jobs that share the same initial options (e.g. qos, time limit etc.) but with different input parameters the naive way is to manually or programatically generate and submit multiple scripts with different parameters each with its own sbatch job allocation. But doing this may quickly hit the cluster limits and risks having your job submission rejected.
17-
18-
Job arrays provides you with a mechanism for submitting and managing collections of similar jobs quickly and easily, while still giving you fine control over the maximum simultaneously running tasks from the Job array.
19-
-->
9+
Every job in a job array requires an allocation from the scheduler, which is an expensive operation. If you plan to submit many small jobs in an array that require the allocation of [more than _10 jobs per minute_](#launch-rate-calculations), please use [GNU parallel](/jobs/gnu-parallel/) to batch multiple tasks in a single job allocation.
2010

2111
## Using job arrays
2212

@@ -29,7 +19,7 @@ The job array feature of Slurm groups similar jobs and provides functionality fo
2919
- `SLURM_ARRAY_TASK_MIN` is the smallest task ID in the job array.
3020

3121
??? info "Inner workings of jobs arrays and the job ID of the whole array"
32-
When a job array is submitted to Slurm, [only one job record is created](https://slurm.schedmd.com/job_array.html#squeue). The `SLURM_JOB_ID` of this initial job will then be the `SLURM_ARRAY_JOB_ID` of the whole array. Additional jobs records are then created by the initial job. Using the `squeue` someone can see additional jobs appear in the as their records are created by the initial job, and you will also see the initial job name change to reflect the progress of the job array execution.
22+
When a job array is submitted to Slurm, [only one job record is created](https://slurm.schedmd.com/job_array.html#squeue). The `SLURM_JOB_ID` of this initial job will then be the `SLURM_ARRAY_JOB_ID` of the whole array. Additional job records are then created by the initial job. The [`squeue`](#viewing-array-job-status) command shows additional jobs appearing in the queue as their records are created by the initial job, and the initial [job ID string changes](#managing-tasks-and-arrays) to reflect the progress of the job array execution. This gradual submission of jobs ensures that the user remains within the limits specified by the Slurm configuration. For instance, in a job array with `400` jobs, up to `100` jobs will be launched in parallel in the [`normal` QoS](/slurm/qos/#available-qoss), which has a limit (`MaxJobPU`) of `100` jobs.
3323

3424
Typically the Slurm job with `SLURM_ARRAY_JOB_ID` will also execute the last task in the array before terminating, but this is implementation dependent and not part of the job array interface.
3525

@@ -48,7 +38,11 @@ A job array is submitted with the `--array` (`-a` short form) option of `sbatch`
4838
sbatch --array=0-31 job_array_script.sh
4939
```
5040

51-
where the `--array` is used to control how many Slurm jobs are created. Inside the `job_array_script.sh` the `SLURM_ARRAY_TASK_ID` can be used to control to differentiate the operation of the script.
41+
where the `--array` option controls how many Slurm jobs are created. Inside `job_array_script.sh`, the `SLURM_ARRAY_TASK_ID` variable can be used to differentiate the operation of the script. The number of jobs that run in parallel is controlled using the `%` suffix
42+
```
43+
sbatch --array=<task list>%<number of parallel jobs> job_script.sh
44+
```
45+
where `<number of parallel jobs>` is the maximum number of jobs that will run in parallel.
5246

5347
??? info "Advanced _task list_ specifications"
5448
During debugging or testing it's often convenient to specify a subrange of tasks to execute. The _task list_ supports a rich syntax. The types of _entries_ in the task list are
@@ -66,9 +60,9 @@ where the `--array` is used to control how many Slurm jobs are created. Inside t
6660
- `--array=1-7:2,0-6:2` is equivalent to `--array=1,3,5,7,0,2,4,6`, and
6761
- `--array=1-4,1-7:2` is equivalent to `--array=1,2,3,4,5,7`.
6862

69-
A task list is _valid_ if all task IDs in the list are the range `0-MaxArraySize`.
63+
A task list is _valid_ if all task IDs in the list are in the range `0-(MaxArraySize-1)`.
7064

71-
If you job specification has a syntax error or lists tasks with ID outside the range `0-MaxArraySize`, then the array job submission fails immediately with an error message
65+
If your job specification has a syntax error or lists tasks with IDs outside the range `0-(MaxArraySize-1)`, then the array job submission fails immediately with an error message.
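To check which task IDs a given task list selects before submitting, the expansion can be sketched in plain bash. The helper `expand_task_list` below is hypothetical (it is not part of Slurm); it assumes only the entry forms shown above (single IDs, ranges, and ranges with a step) and drops repeated IDs, mirroring the equivalences listed above.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not a Slurm command: expand a task list such as
# "1-4,1-7:2" into individual task IDs, keeping the first occurrence of each.
expand_task_list() {
  local task_list="${1%%\%*}"   # drop an optional "%<limit>" suffix
  local -A seen=()
  local -a entries=() ids=()
  local entry range step first last id

  IFS=',' read -r -a entries <<< "${task_list}"
  for entry in "${entries[@]}"; do
    range="${entry%%:*}"
    step=1
    if [[ "${entry}" == *:* ]]; then
      step="${entry##*:}"
    fi
    first="${range%%-*}"
    last="${range##*-}"
    for (( id = first; id <= last; id += step )); do
      if [[ -z "${seen[${id}]:-}" ]]; then
        seen[${id}]=1
        ids+=("${id}")
      fi
    done
  done
  echo "${ids[*]}"
}

expand_task_list '1-7:2,0-6:2'   # prints: 1 3 5 7 0 2 4 6
expand_task_list '1-4,1-7:2'     # prints: 1 2 3 4 5 7
```

The sketch does not validate the IDs against `MaxArraySize`; Slurm performs that check at submission time.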
7266

7367
!!! warning "Job execution order"
7468
The task ID simply provides a way to differentiate the job array tasks and their behavior; there is no guarantee about the order in which tasks will run. If you need your job array tasks to run in a particular order, consider using job dependencies.
@@ -80,9 +74,52 @@ A combination of the `${SLURM_ARRAY_TASK_ID}` and `${SLURM_ARRAY_JOB_ID}` can re
8074
- Use the `${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}` to refer to a task of the job array.
8175
- Use the `${SLURM_ARRAY_JOB_ID}` to refer to all the tasks of the job array collectively.
8276

77+
Each array job is associated with a job ID string that contains information about the status of the array. The job ID string is formatted using the [_task list_](#submitting-a-job-array) as follows.
78+
```
79+
<${SLURM_ARRAY_JOB_ID}>_[<task list>]
80+
```
81+
82+
As the execution of the job array with job `${SLURM_ARRAY_JOB_ID}` progresses, the job ID string is updated to reflect the progress.
83+
84+
!!! example "The job array ID string"
85+
Assume that a job
86+
```console
87+
sbatch --array=0-399%4 job_script.sh
88+
```
89+
is submitted and gets assigned `SLURM_ARRAY_JOB_ID=9625449`.
90+
91+
- The initial job ID string is: `9625449_[0-399%4]`
92+
- After tasks `0-23` are executed, the new ID string is: `9625449_[24-399%4]`
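When post-processing `squeue` output, it can help to split a job ID string into its parts. The following is a minimal sketch using bash parameter expansion; `parse_job_id_string` is a hypothetical helper name, and it assumes the string has the form shown above, optionally with a `%<limit>` suffix and optionally without the square brackets.

```bash
#!/usr/bin/env bash

# Hypothetical helper: split "<array job ID>_[<task list>%<limit>]" into parts.
parse_job_id_string() {
  local id_string="${1}"
  local array_job_id="${id_string%%_*}"   # text before the first underscore
  local tasks="${id_string#*_}"           # text after the first underscore
  local limit=''

  tasks="${tasks#\[}"                     # strip the optional brackets
  tasks="${tasks%\]}"
  if [[ "${tasks}" == *%* ]]; then
    limit="${tasks##*\%}"                 # text after the "%" separator
    tasks="${tasks%%\%*}"                 # text before the "%" separator
  fi
  echo "array_job_id=${array_job_id} tasks=${tasks} limit=${limit:-none}"
}

parse_job_id_string '9625449_[24-399%4]'
parse_job_id_string '9624577_197'
```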
93+
94+
A few examples of the most representative use cases for job ID strings follow.
95+
96+
#### Canceling job arrays and job array tasks
97+
98+
With `scancel`, you can cancel either individual array tasks or the whole array. Assume that an array with `SLURM_ARRAY_JOB_ID=9624577` is running.
99+
100+
- To cancel the whole job array use:
101+
```console
102+
scancel 9624577
103+
```
104+
- To cancel the job array task with `SLURM_ARRAY_TASK_ID=197` in particular use:
105+
```console
106+
scancel 9624577_197
107+
```
108+
109+
!!! info "Syntax shortcuts for job ID strings"
110+
When addressing a single task ID, the square brackets in the job ID string can be dropped. For instance,
111+
```
112+
9624577_[197]
113+
```
114+
is equivalent to
115+
```
116+
9624577_197
117+
```
118+
in all cases where job ID strings appear.
119+
83120
#### Viewing array job status
84121

85-
Assume that array with `SLURM_ARRAY_JOB_ID=9624577` is running.
122+
The `squeue` command can access the job ID string of the whole array and the task ID strings of individual tasks. Assume that an array with `SLURM_ARRAY_JOB_ID=9624577` is running.
86123

87124
- To view the status of the whole job array use:
88125
```console
@@ -128,18 +165,6 @@ Assume that array with `SLURM_ARRAY_JOB_ID=9624577` is running.
128165
export SQUEUE_FORMAT2='JobID:10,ArrayTaskID:20,Partition:10,QOS:10,Name:30,UserName:20,NumNodes:6,State:15,TimeUsed:6,TimeLeft:10,PriorityLong:10,ReasonList:30'
129166
```
130167

131-
#### Canceling job arrays and job array tasks
132-
133-
Assume that array with `SLURM_ARRAY_JOB_ID=9624577` is running.
134-
135-
- To cancel the whole job array use:
136-
```console
137-
scancel 9624577
138-
```
139-
- To cancel the job array task with `SLURM_ARRAY_TASK_ID=197` in particular use:
140-
```console
141-
scancel 9624577_197
142-
```
143168
#### Modifying job array tasks
144169

145170
Even though job array tasks are submitted with the exact same scheduler options, individual jobs can be modified at any point before completion with the `scontrol` command. For instance, you can increase the runtime of a task of a job array that has already been submitted.
@@ -161,8 +186,6 @@ Consider submitting the following job.
161186
declare test_duration=720 # 12min
162187

163188
srun \
164-
--nodes=1 \
165-
--ntasks=1 \
166189
stress-ng \
167190
--cpu ${SLURM_CPUS_PER_TASK} \
168191
--timeout "${test_duration}"
@@ -183,88 +206,130 @@ The tasks in `stress_test.sh` do not have sufficient time to finish. After submi
183206
scontrol update jobid=9625003_4 TimeLimit=00:15:00
184207
```
185208

209+
## Job array scripts
186210

211+
Consider a job array script designed to stress test a set of network file systems mounted on `${FILE_SYSTEM_PATH_PREFIX}_0` to `${FILE_SYSTEM_PATH_PREFIX}_255`. The job array launch script is the following.
187212

188-
??? info "Examples using `${SLURM_ARRAY_JOB_ID}` and `${SLURM_ARRAY_TASK_ID}`"
189-
190-
191-
192-
With `the `squeue` command y
193-
- `squeue --job=312_2` will print information about task with `SLURM_ARRAY_TASK_ID=2` of job array with `SLURM_ARRAY_JOB_ID=312`, and
194-
- `squeue --job=312` will print informatino about all jobs of job array with `SLURM_ARRAY_JOB_ID=312`.
195-
196-
Use 9624577_197
197-
198-
- `scancel 312_2` will cancel task with `SLURM_ARRAY_TASK_ID=2` of job array with `SLURM_ARRAY_JOB_ID=312`, and
199-
- `scancel 312` will cancel all tasks of job array with `SLURM_ARRAY_JOB_ID=312`.
200-
201-
202-
203-
## Scheduling of job arrays
213+
!!! example "io_test.sh"
214+
```
215+
#!/bin/bash --login
216+
#SBATCH --job-name=array_script
217+
#SBATCH --array=0-255%16
218+
#SBATCH --partition=batch
219+
#SBATCH --qos=normal
220+
#SBATCH --nodes=1
221+
#SBATCH --ntasks-per-node=1
222+
#SBATCH --cpus-per-task=16
223+
#SBATCH --time=00:30:00
224+
#SBATCH --output=%x-%A_%a.out
225+
#SBATCH --error=%x-%A_%a.err
204226

205-
The jobs of an array are submitted in batches, according to the state of the Slurm job manager and
227+
declare test_duration=20m
206228

229+
srun \
230+
stress-ng \
231+
--timeout "${test_duration}" \
232+
--iomix "${SLURM_CPUS_PER_TASK}" \
233+
--temp-path "${FILE_SYSTEM_PATH_PREFIX}_${SLURM_ARRAY_TASK_ID}" \
234+
--verify \
235+
--metrics
236+
```
207237

208-
Job arrays will not create job records immediately for all the tasks in the array. Only jobs for which records are created will count towards the maximum job limit of the user and will be considered for resource scheduling. Thus more jobs can be submitted without encumbering the scheduler.
238+
This job script launches a job array with `256` tasks, where up to `16` tasks will run in parallel. Job arrays provide two extra [filename patterns](https://slurm.schedmd.com/sbatch.html#SECTION_FILENAME-PATTERN) that can be used to name output files (defined with the `--output` and `--error` options). These patterns are:
209239

240+
- `%A` that contains the master job allocation number `SLURM_ARRAY_JOB_ID`, and
241+
- `%a` that contains the task index number `SLURM_ARRAY_TASK_ID`.
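To preview the file names these patterns produce, the substitution can be imitated in bash. This is only an illustrative sketch: `resolve_output_name` is a hypothetical helper, Slurm performs the real substitution itself, and the job ID and task ID used below are example values. The `%x` pattern used in `io_test.sh` expands to the job name.

```bash
#!/usr/bin/env bash

# Hypothetical helper: imitate how sbatch fills in %x, %A and %a.
resolve_output_name() {
  local pattern="${1}" job_name="${2}" array_job_id="${3}" task_id="${4}"
  local name="${pattern}"

  name="${name//\%x/${job_name}}"       # %x -> job name
  name="${name//\%A/${array_job_id}}"   # %A -> SLURM_ARRAY_JOB_ID
  name="${name//\%a/${task_id}}"        # %a -> SLURM_ARRAY_TASK_ID
  echo "${name}"
}

# Example values: array job 9625449, task 7 of io_test.sh.
resolve_output_name '%x-%A_%a.out' 'array_script' 9625449 7
# prints: array_script-9625449_7.out
```

Including both `%A` and `%a` in the pattern gives every task its own output file, so tasks of the same array do not overwrite each other's logs.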
210242

243+
### Launch rate calculations
211244

245+
The `io_test.sh` script launches $16$ jobs in parallel, and each job has a duration of $20$ minutes. This results in a job launch rate of
212246

213-
The option has the form
247+
$$
248+
\frac{16 ~ \text{jobs}}{20 ~ \text{min}} = 0.8 ~ \text{jobs per minute}
249+
$$
214250

251+
which is lower than the rule of thumb limit of $10$ jobs per minute. Imagine, for instance, that we remove the limit on the number of tasks that can run in parallel by overriding the `--array` option.
252+
```console
253+
sbatch --array=0-255 io_test.sh
215254
```
216-
--array=<min ID>-<max ID>:<increment>
217-
```
218-
where
219-
220-
sed "${SLURM_ARRAY_TASK_ID}"'!d' input_file.csv
255+
Then, all $256$ jobs can run in parallel and each job has a duration of $20$ minutes, which would result in a peak allocation rate of
221256

257+
$$
258+
\frac{256}{20} = 12.8 ~ \text{jobs per minute}
259+
$$
222260

223-
[Job Arrays](https://slurm.schedmd.com/job_array.html) are supported for `batch` jobs by specifying array index values using `--array` or `-a`option either as a comment inside the SLURM script `#SBATCH --array=<start_index>-<end_index>:<increment>` or by specifying the array range directly when you run the `sbatch` command `sbatch --array=1-100 array_script.sh`
261+
a launch rate that is momentarily above the rule of thumb limit of $10$ jobs per minute. Therefore, a limit on the maximum number of jobs running in parallel should be considered.
224262

263+
!!! warning "Limiting the job launch rate"
264+
The [`MaxArraySize`](#using-job-arrays) limit in UL HPC systems makes it difficult to exceed the suggested limit of job launches per minute. However, in case you need to launch more than 1000 jobs or you expect a job launch rate of more than 10 jobs per minute, please consider using [GNU parallel](/jobs/gnu-parallel/).
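The launch rate estimate above is simple enough to script when choosing the `%` limit for an array. The sketch below uses two hypothetical helpers, `launch_rate` and `max_parallel_jobs`, and assumes, as in the calculation above, that every job runs for roughly the same number of minutes.

```bash
#!/usr/bin/env bash

# launch_rate <parallel jobs> <job duration in minutes>
# Estimated jobs launched per minute, as in the formula above.
launch_rate() {
  awk -v jobs="${1}" -v minutes="${2}" 'BEGIN { printf "%.1f\n", jobs / minutes }'
}

# max_parallel_jobs <job duration in whole minutes> [<jobs per minute limit>]
# Largest number of parallel jobs that keeps the rate at or below the limit
# (the limit defaults to the rule of thumb of 10 jobs per minute).
max_parallel_jobs() {
  echo "$(( ${1} * ${2:-10} ))"
}

launch_rate 16 20      # prints: 0.8
launch_rate 256 20     # prints: 12.8
max_parallel_jobs 20   # prints: 200
```

For the 20 minute jobs of `io_test.sh`, any `%` limit up to `200` would keep the estimated rate within the rule of thumb.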
225265

226-
The option arguments can be either
266+
### Writing launch scripts
227267

228-
- specific array index values `--array=0-31`
229-
- a range of index values `--array=1,3,5,7`
230-
- optional step sizes `--array=1-7:2` (step size 2)
231-
- `<start_index>` an Integer > 0 that defines the Task ID for the first job in the array
232-
- `<end_index>` an Integer > `<start_index>` that defines the Task ID of the last job in the array
233-
- `<increment>` an Integer > 0 that specifies the increment or step size between the Task IDs it is default to '1' if not specified
268+
Array indices can be used to differentiate the input of a task. In the following example, a script programmatically generates a job array script that runs a parametric investigation on a 2-dimensional input, and then launches the job array.
234269

270+
!!! example "`launch_parammetric_analysis.sh`"
271+
```bash
272+
#!/usr/bin/bash --login
273+
274+
declare max_parallel_tasks=16
275+
declare speed_step=0.01
276+
277+
generate_commands() {
278+
local filename="${1}"
279+
280+
echo -n > ${filename}
281+
declare nx ny vx vy
282+
for nx in $(seq 1 10); do
283+
for ny in $(seq 1 10); do
284+
vx="$(echo "${nx}"*"${speed_step}" | bc --mathlib)"
285+
vy="$(echo "${ny}"*"${speed_step}" | bc --mathlib)"
286+
echo "simulate_with_drift.py '${vx}' '${vy}' --output-file='speed_idx_${nx}_${ny}.dat'" >> ${filename}
287+
done
288+
done
289+
}
290+
291+
generate_submission_script() {
292+
local submission_script="${1}"
293+
local command_script="${2}"
294+
295+
local n_commands="$(cat ${command_script} | wc --lines)"
296+
local max_task_id="$((${n_commands} - 1))"
297+
298+
cat > "${submission_script}" <<EOF
299+
#!/bin/bash --login
300+
#SBATCH --job-name=parametric_analysis
301+
#SBATCH --array=0-${max_task_id}%${max_parallel_tasks}
302+
#SBATCH --partition=batch
303+
#SBATCH --qos=normal
304+
#SBATCH --nodes=1
305+
#SBATCH --ntasks-per-node=1
306+
#SBATCH --cpus-per-task=16
307+
#SBATCH --time=0-10:00:00
308+
#SBATCH --output=%x-%A_%a.out
309+
#SBATCH --error=%x-%A_%a.err
310+
311+
module load lang/Python
312+
313+
declare command="\$(sed "\$((SLURM_ARRAY_TASK_ID + 1))"'!d' ${command_script})"
314+
315+
echo "Running command: \${command}"
316+
eval "srun python \${command}"
317+
EOF
318+
}
319+
320+
generate_commands 'commands.sh'
321+
generate_submission_script 'job_array_script.sh' 'commands.sh'
322+
323+
sbatch job_array_script.sh
324+
```
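The generated job script picks one line of the command file per task. Because `sed` counts lines from 1 while the task IDs in this example start from 0, one way to select the command is to offset the task ID by one. The following standalone sketch illustrates the idea; `command_for_task` and the demo command file are hypothetical and only serve the illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the command assigned to a 0-based task ID.
command_for_task() {
  local task_id="${1}" command_file="${2}"
  # sed addresses lines starting from 1, so task 0 maps to line 1.
  sed -n "$(( task_id + 1 ))p" "${command_file}"
}

# Demo with a throwaway command file of three lines.
demo_file="$(mktemp)"
printf '%s\n' 'echo first' 'echo second' 'echo third' > "${demo_file}"

command_for_task 0 "${demo_file}"   # prints: echo first
command_for_task 2 "${demo_file}"   # prints: echo third

rm -f "${demo_file}"
```

Inside a job array task, the first argument would be `${SLURM_ARRAY_TASK_ID}`.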
235325

236-
```
237-
#!/bin/bash --login
238-
#SBATCH --job-name=array_script
239-
#SBATCH --array=10-30:10
240-
#SBATCH --partition=batch
241-
#SBATCH --qos=normal
242-
#SBATCH --nodes=4
243-
#SBATCH --ntasks-per-node=8
244-
#SBATCH --cpus-per-task=16
245-
#SBATCH --time=02:00:00
246-
#SBATCH --output=%A_%a.out
247-
#SBATCH --error=%A_%a.err
248-
249-
declare test_duration=${SLURM_ARRAY_TASK_ID}
250-
251-
srun \
252-
--nodes=1 \
253-
--ntasks=1 \
254-
stress-ng \
255-
--cpu ${SLURM_CPUS_PER_TASK} \
256-
--timeout "${test_duration}"
326+
Run the `launch_parammetric_analysis.sh` script with the bash command.
257327

328+
```console
329+
bash launch_parammetric_analysis.sh
258330
```
259331

260-
Additionally you can specify the maximum number of concurrent running tasks from the job array by ising a `%` separator for example `--array=0-31%4` will limit the number of simultaneously running tasks from this job array to 4. Note that the minimum index value is zero and the maximum value is a Slurm configuration parameter (MaxArraySize minus one).
261-
262-
263-
264-
??? info "Additional enviroment variables for Job Arrays"
265-
Job arrays will have additional environment variables set
266-
267-
268-
269-
332+
!!! info "Avoiding script generation"
333+
Script generation is a complex and error-prone process. In this example script generation is unavoidable, as the whole parametric analysis cannot run in a single job of the [`normal` QoS](/slurm/qos/#available-qoss), which has a default maximum wall time (`MaxWall`) of 2 days. The expected runtime of each simulation would be about $0.25$ to $0.5$ of the job time limit (`--time`), which is set to 10 hours.
270334

335+
If the whole parametric analysis can run within the 2 day limit, then consider running the analysis in a single allocation using [GNU parallel](/jobs/gnu-parallel/). You can then generate the command file and launch the simulations from a single script in a single job allocation.
