job is not resubmitted when instance is terminated #454

@prvg-sso

Description

Issue Details

Describe the bug
The cloud is configured to automatically resubmit builds that are aborted because their node was terminated, but the build is not resubmitted.

To Reproduce

  1. Create a pipeline:

         node("windows-small") {
             while (true) {
                 echo "Retriger test!"
             }
         }

  2. Trigger it to run on an AWS spot instance.
  3. Terminate the AWS spot instance in the AWS web portal while the job is running.

Logs
Jenkins job log:

09:56:58.232  Retriger test!
09:56:58.245  [Pipeline] echo
09:56:58.248  Retriger test!
09:56:58.263  EC2 instance for node AWS Windows i-0ec91fe6950f53b51 was terminated
09:56:58.270  [Pipeline] echo
09:56:58.275  Retriger test!
09:56:58.288  [Pipeline] echo
09:56:58.292  Retriger test!
09:56:58.301  [Pipeline] }
09:56:58.322  [Pipeline] // node
09:56:58.340  [Pipeline] End of Pipeline
09:56:58.513  org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 028d9fd0-de7d-4aac-b021-55f267dd9fef
09:56:58.526  Finished: ABORTED

Jenkins system log (note that the timestamps differ from the job log because the slave and master are in different time zones):

Jan 17, 2025 8:56:58 AM INFO com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher afterDisconnect
DISCONNECTED: AWS Windows i-0ec91fe6950f53b51
Jan 17, 2025 8:56:58 AM INFO com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher afterDisconnect
Start retriggering executors for AWS Windows i-0ec91fe6950f53b51
Jan 17, 2025 8:56:58 AM SEVERE hudson.slaves.SlaveComputer$1 onClosed
Launcher com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher@2876def1s afterDisconnect method propagated an exception when {1}s connection was closed: Cannot invoke "org.jenkinsci.plugins.workflow.job.WorkflowRun.getActions(java.lang.Class)" because "failedBuild" is null
java.lang.NullPointerException: Cannot invoke "org.jenkinsci.plugins.workflow.job.WorkflowRun.getActions(java.lang.Class)" because "failedBuild" is null
	at PluginClassLoader for ec2-fleet//com.amazon.jenkins.ec2fleet.EC2FleetAutoResubmitComputerLauncher.afterDisconnect(EC2FleetAutoResubmitComputerLauncher.java:106)
	at hudson.slaves.SlaveComputer$1.onClosed(SlaveComputer.java:650)
	at hudson.remoting.Channel.terminate(Channel.java:1143)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:90)
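
The trace above shows `afterDisconnect` calling `getActions` on a `failedBuild` that is null, and the exception escaping the method, which would explain why nothing gets resubmitted. A minimal sketch of the kind of null guard that would avoid this follows. It is not the plugin's actual code: `BuildSketch`, `ExecutorSketch`, and `ResubmitGuardSketch` are hypothetical stand-ins, not Jenkins or EC2 Fleet classes, and why `failedBuild` is null in this setup is not established here.

```java
// Hypothetical stand-in for a build record; NOT a Jenkins class.
final class BuildSketch {
    private final String name;

    BuildSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical stand-in for an executor on the disconnected node.
final class ExecutorSketch {
    // Null when no build could be resolved for this executor.
    private final BuildSketch current;

    ExecutorSketch(BuildSketch current) {
        this.current = current;
    }

    BuildSketch getCurrentBuild() {
        return current;
    }
}

final class ResubmitGuardSketch {
    // Returns the name of the build to resubmit, or null when the executor
    // has no resolvable build. The null check is the point of the sketch:
    // without it, failedBuild.getName() would throw a NullPointerException.
    static String buildToResubmit(ExecutorSketch executor) {
        BuildSketch failedBuild = executor.getCurrentBuild();
        if (failedBuild == null) {
            return null; // skip this executor instead of dereferencing null
        }
        return failedBuild.getName();
    }

    public static void main(String[] args) {
        ExecutorSketch busy = new ExecutorSketch(new BuildSketch("retrigger-test #1"));
        ExecutorSketch unresolved = new ExecutorSketch(null);
        System.out.println(buildToResubmit(busy));
        System.out.println(buildToResubmit(unresolved));
    }
}
```

With a guard of this shape, an executor without a resolvable build is skipped rather than aborting the whole disconnect handler, so the remaining executors on the node can still be processed.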

Role attached in AWS:

"ec2:*": This grants full access to all EC2 actions, including DescribeInstances, TerminateInstances, and DescribeSpotFleetRequests. This would allow the Jenkins EC2 Fleet plugin to perform any action on EC2 resources.

Environment Details

Plugin Version?
EC2 Fleet 3.2.0

Jenkins Version?
Jenkins 2.462.1

Spot Fleet or ASG?
Spot Fleet

Label based fleet?
No

Linux or Windows?
Identical behaviour for Linux and Windows slaves. The Jenkins master runs Linux.

EC2Fleet Configuration as Code

Cloud AWS Windows Configuration

Name
AWS Windows

Select AWS Credentials or leave set to none to use AWS EC2 Instance Role
AWS Credentials
- none -

Region
eu-central-1 EU (Frankfurt)

Endpoint like https://ec2.us-east-2.amazonaws.com
Endpoint
- empty -

Fleet list will be available once region and credentials are specified. Only maintain supported, see help
EC2 Fleet
Auto Scaling Group - jenkins-spot-agents-windows-small


[ ] Show all fleets
Launcher
Launch agents via SSH
Credentials
jenkins/****** (jenkins windows ssh)

Host Key Verification Strategy
Non verifying Verification Strategy

Connect to instances via private IP instead of public IP
[x] Private IP
Always reconnect to offline nodes after instance reboot or connection loss
[x] Always Reconnect

Only build jobs with label expressions matching this node
[x] Restrict Usage

Labels to add to instances in this fleet
Label
ec2-fleet windows-small

Default is /tmp/jenkins-
Jenkins Filesystem Root
C:\Jenkins

Testing Number of executors per instance
Number of Executors
4

Scale Executors
No scaling

How long to keep an idle node. If set to 0, never scale down
Max Idle Minutes Before Scaledown
0

Minimum Cluster Size
1

Maximum Cluster Size
1

Minimum Spare Size
0

Maximum Total Uses
-1

Disable auto resubmitting a build if it failed due to an EC2 instance termination like a Spot interruption
[ ] Disable Build Resubmit

Maximum time to wait for EC2 instance startup
Maximum Init Connection Timeout in sec
180

Interval for updating EC2 cloud status
Cloud Status Interval in sec
10

Enable faster provision when queue is growing
[ ] No Delay Provision Strategy

Anything else unique about your setup?
<Yes…/No>


Labels
bug, stale
