The pipeline batch command fails 3 out of 4 times and hangs, mostly after a long command. Both the master and slave nodes are waiting for each other. I'm not sure it's the same issue, but here's what I have:

      • Jenkins 2.51
      • Windows 10 slave
      • Linux Master (CentOS 7)
      • pipeline script from SCM
      • Build trigger is Poll SCM (a manually triggered build does not show this behavior and completes successfully)
      • Mercurial SCM
      • The session is locked while the job is executing (the user is still logged on and the slave is still available)
      • Seems to always happen on long batch commands (short ones don't show this behavior, or maybe it's just less likely)
      • The project is parameterized for the pipeline script repo and revision (default values are provided and the proper checkout is made).
      • It seems the command completes successfully, since I see the final output in the log, but it looks like the master/slave doesn't know the batch command has terminated
      • I use the following syntax:

       

      bat returnStatus: false, script: 'msbuild ...'

      I cannot stop/cancel the build. I have to restart the master to unjam the slave and master (killing the slave client doesn't do anything either).

      Here are the last lines in the console log:

      18:00:58 
      18:00:58 Build succeeded.
      18:00:58     0 Warning(s)
      18:00:58     0 Error(s)
      18:00:58 
      18:00:58 Time Elapsed 00:15:41.55

      This is correct and indicates to me that the msbuild command finished properly.

      This is a total show stopper: we cannot have any CI with this behavior, since we always have to restart the master. It makes us wonder if we should start looking for an alternative (I have reported this issue in the forum thread 3 times already, without any answer). Judging by the bug listing, the batch command seems to hang for many people. We all have different systems and setups, but the reports are all related to the batch command, which seems to be a nightmare for hangs. Some are marked as resolved and many are still open.

          [JENKINS-42988] Batch command hang upon completion

          Jerome Godbout created issue -
          Jerome Godbout made changes -
          Attachment New: Log [Jenkins].pdf [ 36721 ]
          Attachment New: System Information [Jenkins].pdf [ 36722 ]
          Attachment New: Thread dump [Jenkins].pdf [ 36723 ]

          Jerome Godbout added a comment -

          I have attached the master log, system info and thread dump. There's nothing special in the master host's dmesg.
          Andrew Bayer made changes -
          Component/s New: workflow-durable-task-step-plugin [ 21715 ]
          Component/s Original: batch-task-plugin [ 15505 ]
          Component/s Original: pipeline [ 21692 ]
          Andrew Bayer made changes -
          Component/s Original: windows-slaves-plugin [ 18327 ]
          Andrew Bayer made changes -
          Assignee Original: Kohsuke Kawaguchi [ kohsuke ]

          Jesse Glick added a comment -

          Probably a duplicate of one of the existing issues in this component; awaiting steps to reproduce from scratch and/or a Windows expert.

          Jesse Glick made changes -
          Component/s New: durable-task-plugin [ 18622 ]
          Component/s Original: workflow-durable-task-step-plugin [ 21715 ]
          Labels Original: bat batch jenkins pipeline slave windows New: windows

          Jerome Godbout added a comment -

          Still happens on 2.56.

          Yes, this is probably a duplicate of one of the many Windows master/slave communication hangs, some of which were opened a long time ago.

          Maybe a quick workaround would be a deadlock checker (are both slave and master waiting for each other?) that stops/cancels the build, at least until the problem is resolved for real. Right now it puts a slave into a busy state in which it can no longer be used, and it jams the CI totally, which renders the system useless for CI. I have to reboot the master every day.
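One stopgap in the spirit of the checker suggested in this comment might be to wrap the step in the Pipeline `timeout` step, so that a step which never returns is aborted after a fixed limit. This is only a sketch. The `windows` node label and the 30-minute limit are assumptions chosen for illustration (the logged build took about 16 minutes), and the msbuild arguments stay elided as in the report. Since manually stopping the hung build does not work here, it is untested whether the timeout would actually break this particular hang.

```groovy
// Hypothetical scripted pipeline; 'windows' is an assumed agent label.
node('windows') {
    // Abort the enclosed step if it has not returned after 30 minutes
    // (assumed limit, roughly twice the build time seen in the log).
    timeout(time: 30, unit: 'MINUTES') {
        bat returnStatus: false, script: 'msbuild ...'
    }
}
```

If the timeout does fire, the build is marked as aborted, which at least makes the hang visible in the build history.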

          Jerome Godbout added a comment -

          Related issues could be (still open as critical or major):

          https://issues.jenkins-ci.org/browse/JENKINS-28759 (since 2015/06)

          https://issues.jenkins-ci.org/browse/JENKINS-33164 (since 2016/02)

          They all seem to be related to the batch command's return. Either the slave doesn't catch it properly, doesn't communicate it properly, or the master doesn't handle the answer properly. It also seems to happen for very long batch commands, if that might help.

            Assignee: Unassigned
            Reporter: Jerome Godbout (jerome_godbout)
            Votes: 1
            Watchers: 5