After fixing the configuration of the compute nodes in a Slurm cluster
and setting the CPU as a consumable resource, we should also fix job
submission in the integration tests.
To properly test the scale-up, a single job submission should
allocate all the slots available on a compute node.
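As a rough sketch of such a submission (not the actual test code): the
helper name build_submit_cmd and the slot count of 2 are assumptions
chosen to match the CPUTot=2 nodes shown below. It only prints the
sbatch command line, using the standard -N (nodes) and -n (tasks)
options, so that one job asks for every slot of a single node.

```shell
#!/bin/sh
# Hypothetical helper: print an sbatch command that requests every slot
# of one compute node. It does not submit anything by itself.
build_submit_cmd() {
    slots="$1"    # number of slots per node (assumed to be 2 here)
    script="$2"   # job script to submit
    # -N 1 keeps the job on a single node; -n asks for one task per slot.
    echo "sbatch -N 1 -n ${slots} ${script}"
}

build_submit_cmd 2 job1.sh
build_submit_cmd 2 job2.sh
```

With CPUs configured as the consumable resource, the second job should
stay pending until another node with free CPUs joins the cluster.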
The fix has been tested.
Stage 1: two jobs submitted (one running and one pending)
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3 compute job2.sh centos PD 0:00 1 (Resources)
2 compute job1.sh centos R 5:18 1 ip-10-0-82-245
- one node with both of its 2 CPUs allocated
[centos@ip-10-0-235-160 ~]$ scontrol show nodes --all
NodeName=ip-10-0-82-245 Arch=x86_64 CoresPerSocket=1
CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.11
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=ip-10-0-82-245 NodeHostName=ip-10-0-82-245 Version=16.05
OS=Linux RealMemory=3711 AllocMem=0 FreeMem=3022 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A
MCS_label=N/A
BootTime=2018-07-31T14:37:49 SlurmdStartTime=2018-07-31T14:41:31
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Stage 2: the second compute node joins the cluster and the two jobs are
now running on two different hosts:
[centos@ip-10-0-235-160 ~]$ squeue --states=all
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 compute job1.sh centos R 6:14 1 ip-10-0-82-245
3 compute job2.sh centos R 0:34 1 ip-10-0-121-16
[centos@ip-10-0-235-160 ~]$ scontrol show nodes --all
NodeName=ip-10-0-82-245 Arch=x86_64 CoresPerSocket=1
CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.11
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=ip-10-0-82-245 NodeHostName=ip-10-0-82-245 Version=16.05
OS=Linux RealMemory=3711 AllocMem=0 FreeMem=3022 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A
MCS_label=N/A
BootTime=2018-07-31T14:37:49 SlurmdStartTime=2018-07-31T14:41:31
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
NodeName=ip-10-0-121-16 Arch=x86_64 CoresPerSocket=1
CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=0.37
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=ip-10-0-121-16 NodeHostName=ip-10-0-121-16 Version=(null)
OS=Linux RealMemory=3711 AllocMem=0 FreeMem=3035 Sockets=2 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=14989 Weight=1 Owner=N/A
MCS_label=N/A
BootTime=2018-07-31T14:43:46 SlurmdStartTime=2018-07-31T14:47:35
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
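The check done by eye above (CPUAlloc equal to CPUTot on each node)
could be scripted along these lines. This is only a sketch: the
function name fully_allocated_nodes is hypothetical, and it assumes the
scontrol output layout shown above, where CPUAlloc and CPUTot appear on
the same line after the NodeName line.

```shell
#!/bin/sh
# Hypothetical check: read `scontrol show nodes` output on stdin and
# print the name of every node whose CPUs are all allocated.
fully_allocated_nodes() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                # Remember the node the following lines belong to.
                if (kv[1] == "NodeName") node = kv[2]
                # CPUAlloc precedes CPUTot in the output shown above.
                if (kv[1] == "CPUAlloc") alloc = kv[2]
                if (kv[1] == "CPUTot" && alloc == kv[2]) print node
            }
        }
    '
}

# Example usage on the master node:
#   scontrol show nodes --all | fully_allocated_nodes
```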
Signed-off-by: Maurizio Melato <[email protected]>