Commit 09bfdb0

Merge pull request #12 from stackhpc/partition-maxtime
Partition maxtime
2 parents 96c74b2 + 09d685e commit 09bfdb0

3 files changed: 19 additions and 2 deletions

README.md

Lines changed: 13 additions & 1 deletion
@@ -13,7 +13,19 @@ Role Variables

 `openhpc_slurm_control_host`: ansible host name of the controller e.g `"{{ groups['cluster_control'] | first }}"`

-`openhpc_slurm_partitions`: list of slurm partitions
+`openhpc_slurm_partitions`: list of one or more slurm partitions. Each partition may contain the following values:
+* `groups`: If there are multiple node groups that make up the partition, a list of group objects can be defined here.
+Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object.
+* `name`: The name of the nodes within this group.
+* `cluster_name`: Optional. An override for the top-level definition `openhpc_cluster_name`.
+* `num_nodes`: Nodes within the group are assumed to number `0:num_nodes-1`.
+* `ram_mb`: Optional. The physical RAM available in each server of this group.
+Compute node hostnames are assumed to take the form: `cluster_name-group_name-{0..num_nodes-1}`
+* `default`: Optional. A boolean flag for whether this partition is the default. Valid settings are `YES` and `NO`.
+* `maxtime`: Optional. A partition-specific time limit in hours, minutes and seconds. The default value is
+`openhpc_job_maxtime`, which defaults to `24:00:00`.
+
+`openhpc_job_maxtime`: A maximum job time limit in hours, minutes and seconds. The default is `24:00:00`.

 `openhpc_cluster_name`: name of the cluster

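As a rough illustration of the variables documented in this README change, a playbook might define partitions along the following lines. The cluster, partition and group names and all values are hypothetical; only the key names come from the diff. Quoting `default` and `maxtime` is an assumption made here to stop YAML from reinterpreting them, not something the README states.

```yaml
# Hypothetical values throughout; only the key names come from the README diff.
openhpc_cluster_name: testohpc

openhpc_slurm_partitions:
  # Single-group partition: the group attributes sit directly on the partition.
  - name: compute
    num_nodes: 4
    ram_mb: 64000
    default: "YES"
    maxtime: "1:00:00"
  # Multi-group partition: node groups are listed under `groups`.
  - name: mixed
    default: "NO"
    # No `maxtime` here, so the partition falls back to openhpc_job_maxtime.
    groups:
      - name: small
        num_nodes: 2
      - name: large
        num_nodes: 8
        cluster_name: otherohpc   # overrides openhpc_cluster_name for this group
```

Following the hostname form given in the README, these values would correspond to nodes `testohpc-compute-0` to `testohpc-compute-3`, `testohpc-small-0` to `testohpc-small-1`, and `otherohpc-large-0` to `otherohpc-large-7`.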

defaults/main.yml

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ openhpc_packages: []
 openhpc_drain_timeout: 86400
 openhpc_resume_timeout: 300
 openhpc_retry_delay: 10
+openhpc_job_maxtime: 24:00:00
 openhpc_enable:
   control: false
   batch: false
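One point that may be worth checking with this default: Ansible loads variables with a YAML 1.1 parser, and under YAML 1.1 an unquoted scalar of the form `24:00:00` can be resolved as a base-60 integer (86400) instead of a string. Slurm reads a bare integer `MaxTime` as minutes, so the rendered limit could end up far longer than intended. Assuming that behaviour applies here, quoting the value keeps it as text. The variant below is a suggestion, not part of this commit.

```yaml
# Suggested variant, not what the commit contains.
# Quoting keeps the value a string instead of the sexagesimal integer 86400.
openhpc_job_maxtime: "24:00:00"
```

The same consideration would apply to any `maxtime` value set on an individual partition.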

templates/slurm.conf.j2

Lines changed: 5 additions & 1 deletion
@@ -121,7 +121,11 @@ NodeName={{group.cluster_name|default(openhpc_cluster_name)}}-{{group.name}}-[0-
 {% endfor %}
 {% endfor %}
 {% for part in openhpc_slurm_partitions %}
-PartitionName={{part.name}} Nodes={% for group in part.get('groups', [part]) %}{{group.cluster_name|default(openhpc_cluster_name)}}-{{group.name}}-[0-{{group.num_nodes|int-1}}]{% if not loop.last %},{% endif %}{% endfor %} Default=YES MaxTime=24:00:00 State=UP
+PartitionName={{part.name}} \
+Nodes={% for group in part.get('groups', [part]) %}{{group.cluster_name|default(openhpc_cluster_name)}}-{{group.name}}-[0-{{group.num_nodes|int-1}}]{% if not loop.last %},{% endif %}{% endfor %} \
+Default={% if 'default' in part %}{{ part.default }}{% else %}YES{% endif %} \
+MaxTime={% if 'maxtime' in part %}{{ part.maxtime }}{% else %}{{ openhpc_job_maxtime }}{% endif %} \
+State=UP
 {% endfor %}
 # Want nodes that drop out of SLURM's configuration to be automatically
 # returned to service when they come back.
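To make the template's two fallbacks concrete, the sketch below shows a hypothetical input and, in comments, the `Default` and `MaxTime` values each partition would be expected to render with. The partition names and values are invented for illustration, and the quoting is an assumption to keep the values as strings.

```yaml
# Hypothetical input illustrating the fallbacks in templates/slurm.conf.j2.
openhpc_job_maxtime: "24:00:00"

openhpc_slurm_partitions:
  - name: short
    num_nodes: 2
    default: "NO"        # 'default' key present -> Default=NO
    maxtime: "0:30:00"   # 'maxtime' key present -> MaxTime=0:30:00
  - name: long
    num_nodes: 2
    # no 'default' key -> Default=YES
    # no 'maxtime' key -> MaxTime=24:00:00 (taken from openhpc_job_maxtime)
```

Because a missing `default` renders as `Default=YES`, it may be safer to set `default` explicitly on every partition when more than one is defined.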
