Description
Hey all,
This is just a thought for the `SLURMCluster` for now (since that's what I'm familiar with), but similar options may be available in other clusters too. Currently, the `cancel_command` in the `SLURMJob` class is a bare `"scancel"`.
See `dask-jobqueue/dask_jobqueue/slurm.py`, line 15 at commit 8713202.
This means that, even when workers are shut down completely gracefully, the Slurm job is marked as `CANCELLED`. If the command were instead `scancel --signal=SIGTERM`, the job would be marked as `COMPLETED`. It's possible there could be cases where we would want a job to be cancelled, which complicates this somewhat.
In the simple case, however, I think this could be implemented by changing `cancel_command` to:
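```python
from dask_jobqueue.core import Job


class SLURMJob(Job):
    # Override class variables
    submit_command = "sbatch"
    # Send SIGTERM rather than a hard cancel so graceful exits end COMPLETED
    cancel_command = "scancel --signal=SIGTERM"
    config_name = "slurm"
```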
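One thing worth confirming is that a multi-word `cancel_command` still produces a valid argument list. Here is a minimal sketch, assuming the job class tokenizes `cancel_command` with `shlex.split` and appends the job ID (my reading of `dask_jobqueue/core.py`, worth double-checking). `build_cancel_args` is a hypothetical helper for illustration, not part of dask-jobqueue.

```python
import shlex
from typing import List


def build_cancel_args(cancel_command: str, job_id: str) -> List[str]:
    """Hypothetical helper: split cancel_command into tokens and append the job ID.

    Assumed to mirror how dask-jobqueue builds the argument list it runs
    to cancel a job; not the library's actual function.
    """
    return shlex.split(cancel_command) + [job_id]


# Current behaviour: a bare scancel
print(build_cancel_args("scancel", "12345"))
# prints ['scancel', '12345']

# Proposed behaviour: scancel delivers SIGTERM to the job
print(build_cancel_args("scancel --signal=SIGTERM", "12345"))
# prints ['scancel', '--signal=SIGTERM', '12345']
```

If that assumption holds, the flag lands before the job ID, which is where `scancel` expects options, so no other code should need to change for the simple case.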
It'd be great to get some more thoughts on the implications of this.