More graceful job cancellation  #640

Open
@AlecThomson

Hey all,

This is just a thought for the SLURMCluster for now (since that's what I'm familiar with) but similar options may be available in other clusters too. Currently, the cancel_command in the SLURMJob class is a bare "scancel".

cancel_command = "scancel"

This means that, even when workers are shut down completely gracefully, the Slurm job is marked as CANCELLED. Instead, if the command were scancel --signal=SIGTERM, the job would be marked as COMPLETED. It's possible there could be cases where we would want a job to be cancelled, which complicates this somewhat.

In the simple case, however, I think this could be implemented with a simple change of cancel_command to:

class SLURMJob(Job):
    # Override class variables
    submit_command = "sbatch"
    cancel_command = "scancel --signal=SIGTERM"
    config_name = "slurm"
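
For the less simple case, where a hard cancel is still sometimes wanted, one option might be to choose between the two command strings at cancel time. The following is only a rough sketch, not how dask-jobqueue currently works: build_cancel_argv, its graceful flag, and the two constants are hypothetical names, and it assumes the cancel command is split into arguments with the job IDs appended.

```python
import shlex

# Hypothetical constants for illustration. Only the bare "scancel" is the
# current cancel_command; the SIGTERM variant is the change proposed here.
HARD_CANCEL = "scancel"
GRACEFUL_CANCEL = "scancel --signal=SIGTERM"


def build_cancel_argv(job_ids, graceful=True):
    """Return the argument list for cancelling the given Slurm job IDs.

    graceful=True uses the SIGTERM variant; graceful=False uses bare scancel.
    """
    command = GRACEFUL_CANCEL if graceful else HARD_CANCEL
    # sorted(set(...)) drops duplicate IDs and keeps the result deterministic.
    return shlex.split(command) + sorted(set(job_ids))


if __name__ == "__main__":
    print(build_cancel_argv(["1234", "1234"]))
    print(build_cancel_argv(["1234"], graceful=False))
```

Whether a flag like this belongs on the job class, the cluster, or in the config is an open question.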

It'd be great to get some more thoughts on the implications for this.


Labels: enhancement, help wanted
