
Batch steps on slaves randomly hang when complete

      Batch steps that succeed are hanging, and more frequently since the upgrade to Jenkins 1.6.16 + WF 1.7. I think this is recent; I do not recall encountering such issues with Jenkins 1.6.09 + WF 1.5. This is highly problematic for workflow scripts that rely on large numbers of batch steps. Note that the slave nodes in question may be considered "high-latency", with response times occasionally measured in seconds.

      Reproduced 2 out of 4 times using the following test idiom. Increasing the loop count below to 1000 will probably make it reproduce 100% of the time, and, anecdotally, parallelizing seems to make it reproduce more often:

      node('slave') {
        for( int i = 0; i < 100; ++i ) {
          echo "i=${i}"
          bat *<some batch step that takes variable time to run, eg scm or make>*
        }
      }
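
The report notes that parallelizing seems to increase the reproduction rate. A minimal sketch of a parallel variant of the loop above follows. The branch count of 4, the branch names, and the `ping` command are assumptions made for illustration; `ping` only stands in for a batch step whose run time varies, as the original placeholder is not specified.

```groovy
// Sketch only: runs the reproduction loop in several parallel branches.
// The branch count (4) and the ping command are illustrative assumptions,
// not taken from the original report.
def branches = [:]
for (int b = 0; b < 4; ++b) {
    def branchId = b  // per-iteration copy so each closure sees its own value
    branches["branch-${branchId}".toString()] = {
        node('slave') {
            for (int i = 0; i < 100; ++i) {
                echo "branch=${branchId} i=${i}"
                // Placeholder for a batch step that takes variable time to run
                bat 'ping -n 3 127.0.0.1 > NUL'
            }
        }
    }
}
parallel branches
```

Each branch allocates its own executor on a node labelled `slave`, so the number of branches that actually run concurrently depends on how many executors are available.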
      

          [JENKINS-28759] Batch steps on slaves randomly hang when complete

          Jesse Glick added a comment -

          One user reporting a similar symptom has found that the root cause was actually JENKINS-29924. That does seem to result in visible stack traces in the system log, though, so I doubt it explains most of the reports.


          Jesse Glick added a comment -

          Everyone observing this, please try updating to Durable Task 1.7 in case the fix of JENKINS-27419 addressed this too. I sort of doubt it, but I have no real hypothesis for what is causing this so it is possible.


          Jesse Glick added a comment -

          I discovered a potential bug in platform-independent code which might explain at least certain cases of this issue. Some other observations like jeremyriley’s do not sound related.


          Jeremy Riley added a comment -

          I tried the new Durable Task and the problem I described is still present.


          Anthony Burns added a comment - - edited

          I'm able to reproduce this issue on Windows Server 2012 with Jenkins 2.3 and Pipeline 2.1. This machine is the master.

          Originally I was running 5 batch scripts spread over 3 stages. The first stage would hang on the first of its 2 batch scripts. I was able to force the stage to continue by manually running the jenkins-wrap.bat file from the workspace. I've managed to work around this issue for now by putting each batch file in its own stage.

          FYI, my first thought was that it might have something to do with the first two batch scripts running from the same directory (inside a dir()) in my Jenkinsfile. However, moving the two batch scripts into separate dir() blocks did not fix the issue. Even after separating the first two batch scripts into separate stages, two later batch scripts (sharing a stage, but not a directory) also hung on the first batch script in the stage.
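
The workaround described in this comment, one batch script per stage, might look roughly like the sketch below. The stage names and script names (`configure.bat`, `build.bat`) are hypothetical, and the block-scoped `stage('...') { }` form assumes a Pipeline version that supports it; older versions use the statement form `stage 'name'` instead.

```groovy
// Sketch only: each batch script gets its own stage, so no stage
// contains more than one bat step.
// Stage names and script names are hypothetical.
node {
    stage('Configure') {
        bat 'call configure.bat'
    }
    stage('Build') {
        bat 'call build.bat'
    }
}
```

This only restructures the Jenkinsfile. It does not address whatever causes the hang, so it may not help in every setup.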


          Gijs Kuijer added a comment -

          This one is probably related to this issue: https://issues.jenkins-ci.org/browse/JENKINS-34150


          Gijs Kuijer added a comment - - edited

          https://issues.jenkins-ci.org/browse/JENKINS-34150 probably resolves this issue as well.


          Jesse Glick added a comment -

          Any known way to reproduce from scratch? There are a lot of related issues which are all probably duplicates, but it is unclear what the trigger conditions are.


          mcrooney added a comment - - edited

          I wonder if this is specific to batch steps, or a general Pipeline issue, as we rarely but regularly see Pipeline hang indefinitely after shell steps are finished at:

          + exit 0
          

          They always have to be hard-killed:

          + exit 0
          Aborted by Example User
          Click here to forcibly terminate running steps
          Terminating stage
          Click here to forcibly kill entire build
          Hard kill!
          Finished: ABORTED
          

          Would this be a different bug?


          Daniel Aguado Araujo added a comment - - edited

          Workaround: run those steps with powershell.

          I have been affected by this bug for a few days, since some changes to my VM builders. I use the Swarm client.
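
A minimal sketch of the PowerShell workaround, assuming a Pipeline installation recent enough to provide the `powershell` step. The script name `build.bat` and the `slave` label are hypothetical.

```groovy
// Sketch only: invoke the batch script through the powershell step
// instead of the bat step. build.bat is a hypothetical script name.
node('slave') {
    // Previously: bat 'call build.bat'
    // The doubled backslash is Groovy escaping; PowerShell receives: & .\build.bat
    powershell '& .\\build.bat'
}
```

Exit-code handling can differ between `bat` and `powershell`, so it is worth checking that a failing script still fails the build after the switch.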


            Assignee: Unassigned
            Reporter: sumdumgai A C
            Votes: 19
            Watchers: 29