During the day, I run lots of Jenkins slaves. During the evening, I use AWS autoscaling to scale the number of slaves back down. AWS simply terminates the instances, which Jenkins would presumably call a "channel disconnect".
I noticed that any jobs which are running when the slave is killed off hang for a very long time. For example, the link below shows a job with a 10-minute timeout set. I kill the slave off at the 24-second mark, but the job hangs until the 10-minute mark, when the Jenkins timeout plugin detects the timeout; it then spends roughly another 7 minutes hanging until Jenkins realizes the channel is disconnected.
https://gist.github.com/blockjon/6358b4124935fa4e72ba8a7d5bd12291
What's a better way to have jobs stopped and/or restarted quickly if the slave they are running on is disconnected?
Desired Behavior:
Jenkins detects the channel disconnect within 30 seconds and restarts the job on another healthy node.
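(For reference, the kind of per-build timeout described above would look roughly like the sketch below in a Pipeline job; this is an assumption about the job type, and a freestyle job would set the same limit through the Build Timeout plugin instead.)

{code:groovy}
// Sketch only: a 10-minute overall timeout of the kind described above,
// assuming a declarative Pipeline job. Label and build step are placeholders.
pipeline {
    agent { label 'linux' }                    // placeholder agent label
    options {
        timeout(time: 10, unit: 'MINUTES')     // abort the build after 10 minutes
    }
    stages {
        stage('Build') {
            steps {
                sh './build.sh'                // placeholder build step
            }
        }
    }
}
{code}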
Duplicates: JENKINS-49707 "Auto retry for elastic agents after channel closure" (Resolved)
There is no plan to implement job failover within the Jenkins core. There is a Naginator plugin for it: https://wiki.jenkins-ci.org/display/JENKINS/Naginator+Plugin
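For Pipeline jobs there is also a job-side approximation: wrapping the node allocation in retry, which re-runs the whole body on another available agent once the failure finally propagates. This is a different mechanism from Naginator, it retries on any failure rather than only on agent loss, and the sketch below uses a placeholder label and build step.

{code:groovy}
// Rough scripted-pipeline sketch: if the node is lost and the failure propagates,
// retry() re-acquires an executor (possibly on a different agent) and re-runs the body.
// Assumes the build steps are safe to re-run from scratch; 'linux' is a placeholder label.
retry(2) {
    node('linux') {
        sh './build.sh'
    }
}
{code}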
Regarding the node failure detection time, it really depends on the operation being executed. If the node is only disconnected after 7 minutes, it likely means that...
1) The remoting channel does not notice the disconnect or does not propagate it to all pending calls
2) The channel is disconnected by the PingThread timeout (see the rough arithmetic below)
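As a rough illustration of why case 2 can take several minutes, assuming the default ChannelPinger settings (the property names and default values should be verified against the Jenkins version in use):

{code:groovy}
// Back-of-the-envelope sketch, assuming Jenkins defaults of a 300 s ping interval
// and a 240 s ping timeout for hudson.slaves.ChannelPinger: in the worst case a
// dead channel is only declared broken after interval + timeout.
int pingIntervalSeconds = 300   // assumed default of hudson.slaves.ChannelPinger.pingIntervalSeconds
int pingTimeoutSeconds  = 240   // assumed default of hudson.slaves.ChannelPinger.pingTimeoutSeconds
int worstCaseSeconds = pingIntervalSeconds + pingTimeoutSeconds
println "Worst-case detection: ${worstCaseSeconds} s (~${worstCaseSeconds / 60} minutes)"
// Lowering both values (normally passed as -D JVM options when starting the
// controller) shortens this window, at the cost of more ping traffic.
{code}

A delay of roughly 7 minutes, as observed above, falls inside that worst-case window.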
Anyway, I need the Jenkins system, agent, and build logs with failure timestamps to analyze the root cause.