[JENKINS-35351] Rerun the running job on other available slaves if the connection channel to the current slave is lost

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Critical
    • Component/s: core
    • Labels: None
    • Environment: Ubuntu 14.04 server 64-bit
      Jenkins 1.651.1
      oracle-java7 1.7.0_80

      Occasionally during a build, we randomly run into one of these errors:
      java.io.IOException: Unexpected termination of the channel
      hudson.remoting.ChannelClosedException: channel is already closed
      java.io.EOFException

      In each case the Jenkins master has lost the connection channel
      to the Jenkins slave that is running the build job.
      When this happens, the build fails.
      This is inconvenient for long-running nightly jobs.

      Instead of failing the job,
      the Jenkins master should re-dispatch the same job (same build number, parameters, and configuration) to the next available Jenkins slave
      and start it over.
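The requested behavior can be modeled in a few lines. What follows is a minimal Python sketch under stated assumptions. It is not Jenkins code, and `Build`, `AgentError`, `dispatch`, `run_on`, and the slave names are all hypothetical. The sketch also simplifies how a lost channel is detected: it matches on the error text, using markers taken from the three errors quoted above.

A build keeps its number and parameters throughout. If the channel to a slave drops, the same build is re-dispatched to the next slave. Any other error still fails the build.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Markers taken from the errors quoted in this issue. Matching on text is a
# simplification for this hypothetical sketch.
CHANNEL_LOSS_MARKERS = (
    "Unexpected termination of the channel",
    "ChannelClosedException",
    "EOFException",
)


class AgentError(Exception):
    """Hypothetical error raised when a build fails while running on a slave."""


def is_channel_loss(err: Exception) -> bool:
    """Return True if the failure looks like a lost master-slave channel."""
    return any(marker in str(err) for marker in CHANNEL_LOSS_MARKERS)


@dataclass
class Build:
    number: int
    params: Dict[str, str]
    attempts: List[str] = field(default_factory=list)  # slaves tried, in order
    result: str = "PENDING"


def dispatch(build: Build, slaves: List[str],
             run_on: Callable[[str, Build], None]) -> Build:
    """Run `build` on the first slave that keeps its channel open.

    A lost channel moves the same build (same number and params) to the next
    slave. Any other AgentError fails the build immediately.
    """
    for slave in slaves:
        build.attempts.append(slave)
        try:
            run_on(slave, build)
        except AgentError as err:
            if is_channel_loss(err):
                continue  # infrastructure failure: start over on the next slave
            build.result = "FAILURE"  # a genuine build failure is not retried
            return build
        build.result = "SUCCESS"
        return build
    build.result = "FAILURE"  # every slave lost its channel
    return build


# Usage: the channel drops on slave-1, and the same build #42 finishes on slave-2.
def flaky_run(slave: str, build: Build) -> None:
    if slave == "slave-1":
        raise AgentError("hudson.remoting.ChannelClosedException: channel is already closed")


b = dispatch(Build(42, {"BRANCH": "main"}), ["slave-1", "slave-2"], flaky_run)
print(b.number, b.result, b.attempts)  # 42 SUCCESS ['slave-1', 'slave-2']
```

Design choice: channel loss is treated as an infrastructure fault, so it is retried on the next slave. Ordinary failures are reported as-is, so real build breakage is never hidden.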


          Oleg Nenashev added a comment:

          This use case is solved by https://wiki.jenkins-ci.org/display/JENKINS/Naginator+Plugin. I doubt it makes sense to have similar functionality in the core.


          Rick Liu added a comment (edited):

          But the Naginator plugin re-triggers the build only after the build has failed,
          so you still get a build failure first.

          I was thinking that if this could be handled in the core,
          then when the failure is a Jenkins system-level failure,
          the build would automatically be retried on the next available slave or go back to the queue.

          And instead of letting the build fail and rescheduling it under a new build number,
          we would get to keep the same build number.
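The build-number difference described in this comment can be illustrated with a small hypothetical Python model. This is not Naginator's or Jenkins's actual code. `retry_after_failure` and `redispatch_in_place` are made-up names, and each attempt is reduced to either "LOST" (the channel dropped) or "OK".

The first function follows the comment's description of retrying after a failure: every lost channel is recorded as a failed build, and the retry gets the next number. The second follows the proposal: lost channels are absorbed, and only the final result is recorded, under the original number.

```python
from typing import List, Tuple

History = List[Tuple[int, str]]  # (build number, recorded result)


def retry_after_failure(first_number: int, attempts: List[str]) -> History:
    """Each lost channel is recorded as FAILURE; the retry gets a new number."""
    history: History = []
    number = first_number
    for outcome in attempts:
        history.append((number, "SUCCESS" if outcome == "OK" else "FAILURE"))
        if outcome == "OK":
            break
        number += 1  # the retried run is a separate build
    return history


def redispatch_in_place(first_number: int, attempts: List[str]) -> History:
    """Lost channels are retried silently on the next slave; one number is used."""
    final = "SUCCESS" if "OK" in attempts else "FAILURE"
    return [(first_number, final)]


attempts = ["LOST", "OK"]  # channel dropped on the first slave, second slave succeeded
print(retry_after_failure(100, attempts))  # [(100, 'FAILURE'), (101, 'SUCCESS')]
print(redispatch_in_place(100, attempts))  # [(100, 'SUCCESS')]
```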


            Assignee: Unassigned
            Reporter: Rick Liu (totoroliu)
            Votes: 0
            Watchers: 2

              Created:
              Updated:
              Resolved: