Quickly detecting and restarting a job if the job's slave disconnects


      During the day, I run lots of Jenkins slaves. In the evening, I use AWS to autoscale down the number of slaves I'm using. AWS simply terminates the instances, which Jenkins would probably call a "channel disconnect".

      I noticed that any jobs which are running when their slave is killed off hang for a really long time. For example, the link below shows a job which had a 10 minute timeout set. I kill the job off at the 24 second mark, but the job hangs until the 10 minute mark, where the Jenkins timeout plugin detects a timeout. It then spends the next 7 minutes hanging until Jenkins realizes the channel is disconnected.

      https://gist.github.com/blockjon/6358b4124935fa4e72ba8a7d5bd12291

      What's a better way to have jobs quickly stopped and/or restarted if the slave they are running on is disconnected?

      Desired Behavior:

      Jenkins detects that the channel is disconnected within 30 seconds. It then restarts the job on another healthy node.
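
      To make the desired behavior concrete, here is a rough Python sketch of the logic I have in mind. It is not Jenkins code and does not use any Jenkins API: the `Node` and `Scheduler` classes, the heartbeat timestamps, and the `check` method are all hypothetical names made up for illustration. The only value taken from this report is the 30 second detection window.

```python
from dataclasses import dataclass, field

# Detection window from the desired behavior above.
DISCONNECT_TIMEOUT_SECONDS = 30.0


@dataclass
class Node:
    """Hypothetical build node that reports periodic heartbeats."""
    name: str
    last_heartbeat: float  # timestamp (seconds) of the last heartbeat received

    def is_healthy(self, now: float) -> bool:
        # A node is healthy if it was heard from within the detection window.
        return now - self.last_heartbeat <= DISCONNECT_TIMEOUT_SECONDS


@dataclass
class Scheduler:
    """Hypothetical scheduler tracking which node runs which job."""
    nodes: dict = field(default_factory=dict)        # node name -> Node
    assignments: dict = field(default_factory=dict)  # job name -> node name

    def check(self, now: float) -> list:
        """Move jobs off nodes that missed the heartbeat window.

        Returns a list of (job, old_node, new_node) tuples, one per
        restarted job.
        """
        restarted = []
        for job, node_name in list(self.assignments.items()):
            if self.nodes[node_name].is_healthy(now):
                continue
            healthy = [n.name for n in self.nodes.values() if n.is_healthy(now)]
            if not healthy:
                continue  # nothing to fail over to; retry on the next check
            self.assignments[job] = healthy[0]
            restarted.append((job, node_name, healthy[0]))
        return restarted
```

      Calling `check` every few seconds would reassign a job as soon as its node has been silent for more than 30 seconds, instead of waiting for the job timeout and then for the channel to be declared dead.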

            Assignee:
            Unassigned
            Reporter:
            Jon B
            Archiver:
            Jenkins Service Account
