Jenkins / JENKINS-43781

Quickly detecting and restarting a job if the job's slave disconnects


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Components: core, remoting
    • Labels: None
    • Environment: Jenkins 2.46.1

      During the day I run many Jenkins slaves. In the evening I use AWS autoscaling to reduce the number of slaves, and AWS simply terminates the instances. Jenkins would probably call this a "channel disconnect".

      I noticed that any job that is running when its slave is killed hangs for a very long time. For example, the gist linked below shows a job that had a 10 minute timeout set. I kill the job off at the 24 second mark, but the job hangs until the 10 minute mark, when the Jenkins timeout plugin detects a timeout. It then hangs for another 7 minutes until Jenkins realizes the channel is disconnected.

      https://gist.github.com/blockjon/6358b4124935fa4e72ba8a7d5bd12291

      What's a better way to have jobs quickly stopped and/or restarted when the slave they are running on is disconnected?

      Desired Behavior:

      Jenkins detects that the channel is disconnected within 30 seconds. It then restarts the job on another healthy node.
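
      To make the desired behavior concrete, here is a minimal sketch of heartbeat-based disconnect detection followed by rescheduling. It is not how Jenkins or its remoting layer works internally; the names `Node`, `find_disconnected`, and `reschedule` are hypothetical and exist only for illustration. The only value taken from this report is the 30 second detection window.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a build slave; not a Jenkins class.
    name: str
    last_heartbeat: float  # seconds, on the same clock as `now` below


def find_disconnected(nodes, now, timeout=30.0):
    """Return the nodes whose last heartbeat is older than `timeout` seconds."""
    return [n for n in nodes if now - n.last_heartbeat > timeout]


def reschedule(jobs, nodes, now, timeout=30.0):
    """Move jobs off disconnected nodes onto healthy ones.

    `jobs` maps job name -> node name. Returns a new mapping; a job with
    no healthy node available is mapped to None (left waiting).
    """
    dead = {n.name for n in find_disconnected(nodes, now, timeout)}
    healthy = [n.name for n in nodes if n.name not in dead]
    result = {}
    for i, (job, node) in enumerate(sorted(jobs.items())):
        if node in dead:
            # Spread displaced jobs across the healthy nodes round-robin.
            result[job] = healthy[i % len(healthy)] if healthy else None
        else:
            result[job] = node
    return result
```

      With a 30 second window, a node that last reported 40 seconds ago counts as disconnected and its jobs move to a node that is still reporting. A shorter window detects terminated instances sooner but risks treating a briefly stalled node as gone.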

            Assignee: Unassigned
            Reporter: Jon B (piratejohnny)
            Votes: 1
            Watchers: 3
