
Conversation

m0n5t3r (Contributor) commented Feb 23, 2021

This fixes the issue from #3 (comment):
[Screenshot from 2021-02-23 18-33-02]

Basically, if you have 3 (or more) upstream workflows, each of them will trigger a run of the workflow that calls this action (let's call it "Deploy", as in the screenshot). The current code looks for other in-progress or queued runs of the same workflow for the same hash as the current one, and cancels all but the first-started one.

This creates a race condition, because:

  • when the last of the upstream workflows finishes, it fires up a Deploy workflow, which gets queued for a random amount of time, depending on runner availability
  • in the meantime, the oldest Deploy (which checks every 5 seconds by default) figures out it can move forward, moves on, and finishes (in this case it fires a webhook payload to a Jenkins machine, but it could be anything that doesn't take much time)
  • when the last-started Deploy finally gets to run, it looks for other in-progress or queued runs, figures out it's alone, and happily fires a second time, because all conditions are met

This PR adds completed successful runs with the same SHA as the current run to the list of cancellable runs, allowing the action to cancel the current run if the workflow already ran successfully for the same commit.
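
A minimal sketch of the resulting check, assuming the action builds its list of cancellable runs roughly as in the snippets later in this thread (the workflow_runs, sha, and cancellable names are borrowed from there; the surrounding code is simplified):

    // treat queued / in-progress runs AND completed successful runs of the
    // same commit as relevant, so a late-starting duplicate can see the
    // earlier success and cancel itself
    const cancellable = workflow_runs
      .filter(run => run.head_sha === sha)
      .filter(run =>
        run.status === 'queued' ||
        run.status === 'in_progress' ||
        (run.status === 'completed' && run.conclusion === 'success')
      )

    // when cancelling duplicates, skip runs that have already completed;
    // their presence alone is enough to make a later run cancel itself
    for (const run of cancellable) {
      if (run.status === 'completed') continue
      // ... cancel `run` here if it is not the first-started ("prime") run ...
    }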

m0n5t3r (Contributor, Author) commented Feb 25, 2021

is everything ok with this?

    core.debug(inspect(prime))

    for (const run of cancellable) {
      if (run.status === 'completed') continue
ahmadnassri (Owner) commented on the diff above:

I suppose this would work ...

but wouldn't it be better to first check if there is a completed run (before getting this deep) and, if one is already completed, exit early?
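
For illustration, that early exit might look something like this (a sketch only, reusing the workflow_runs, run_id, sha, and workflow_id names from the snippet below; not the actual merged code):

    // hypothetical early exit: before any sorting or cancelling logic,
    // bail out if another run already succeeded for the same commit
    const prior_success = workflow_runs.some(run =>
      run.id !== run_id &&
      run.head_sha === sha &&
      run.status === 'completed' &&
      run.conclusion === 'success'
    )

    if (prior_success) {
      core.info(`workflow ${workflow_id} already succeeded for sha ${sha}, exiting early`)
      return
    }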

m0n5t3r (Contributor, Author) replied:

I went for the smallest change to the existing logic that would do the trick. My test version looked like this (sort and map are no longer needed: just filter the runs, and if the list is not empty, cancel the current workflow). It proved very reliable in my case, but I didn't want to push my luck :)

    // filter
    const current_runs = workflow_runs
      // exclude this one
      .filter(run => run.id !== run_id)
      // filter to only runs for the same commit
      .filter(run => run.head_sha === sha)
      // filter out unsuccessful completed runs (cancelled / failed)
      .filter(run => (run.status !== 'completed') || (run.conclusion === 'success'))
  
    core.info(`found ${current_runs.length} existing runs of workflow ${workflow_id} for sha ${sha}`)
  
if (current_runs.length > 0) {
      core.info('successful or in-progress runs found, bailing out')
      await octokit.request('POST /repos/{owner}/{repo}/actions/runs/{run_id}/cancel', {
        ...github.context.repo,
        run_id: run_id
      })
    }

ahmadnassri (Owner) commented

> is everything ok with this?

sorry I missed the notification for this, reviewed now

m0n5t3r requested a review from ahmadnassri on February 26, 2021 at 15:56
GiuseppeChiesa-TomTom commented
I'm also facing the same issue covered by this PR.
Can this implementation be merged?

bickelj commented Jan 9, 2025

@m0n5t3r or @GiuseppeChiesa-TomTom did you find a better solution?
