When a batch task is included in a Pipeline job, the job hangs on completion of the batch task. I can see in Task Manager that the job starts up and logs data to jenkins-log.txt. The batch completes, and I can see in Task Manager that it is no longer running, but Jenkins is still waiting for the task to complete. I do not see jenkins-result.txt written to the workspace tmp durabletask directory. If I create the file manually or run the workflow-wrap.bat manually, the task completes. This is an intermittent bug: the task might work 3 times, then fail 5 times, then work 8 times, with no change to the system during this time. I am setting the job to run every minute to see what the stats look like over a longer run.

      job:

      node { 
       bat 'ping 127.0.0.1 -n 10'
       echo 'batch completed'
      }
      

      This could be any command you want; ping is just an easy one that takes a little time and requires nothing else installed on the machine.

      I see this with many other tasks like this one. I have tested on several different machines using a base install of Jenkins.

        1. jenkins2-pipeline-batch-nohang.png (98 kB)
        2. jenkins2-pipeline-batch-hang.png (109 kB)
        3. How.png (321 kB)
        4. durable-task.hpi (41 kB)
        5. durable-task.hpi (41 kB)

          [JENKINS-34150] Pipeline Batch hangs

          Antonio Muñiz added a comment -

          nitram: Thanks for the info. I'm going to move the master to a Windows box too, as it's the only way I see my environment differing from yours.

          Antonio Muñiz added a comment -

          The issue is reproducible only if two batch durable tasks run concurrently on the master node.

          Antonio Muñiz added a comment -

          For some reason the result file is not being created; this line must be failing to execute (but I cannot find any log):

          echo %ERRORLEVEL% > "[JENKINS_HOME]\my-job@tmp\durable-acc8d5a4\jenkins-result.txt"

          While the build was hanging, I manually executed jenkins-wrap.bat and that made the execution finish (as the result file was created).

          Trying to see why the result file is not being written.
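
As described above, the step only finishes once jenkins-result.txt appears, which is why a missing file leaves the build waiting. The sketch below is a simplified illustration of that idea in plain Groovy, not the plugin's actual code; the helper name exitStatus and the polling loop in the trailing comment are assumptions made for illustration.

```groovy
// Illustrative only: a simplified model of "wait for the result file".
// exitStatus() is a hypothetical helper, not part of the durable-task plugin.
Integer exitStatus(File controlDir) {
    File result = new File(controlDir, 'jenkins-result.txt')
    if (!result.exists()) {
        return null                    // no result written (yet, or ever)
    }
    String text = result.text.trim()   // the wrapper writes e.g. "0 " plus a line break
    return text ? text.toInteger() : null
}

// A caller that polls like this never returns if the wrapper dies before its last line:
// while (exitStatus(controlDir) == null) { sleep(1000) }
```

In this model a wrapper that is killed before its final echo line leaves the caller with null indefinitely, which matches the reported symptom.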

          Daniel Beck added a comment -

          Wild guess: Spaces in JENKINS_HOME path when installing using the installer?

          Antonio Muñiz added a comment -

          No. It's reproducible for me without spaces in the path (and I didn't use the installer but direct java -jar mode).

          Antonio Muñiz added a comment -

          Not a regression in core at least; reproduced in 1.651.1 (Pipeline Durable Task Step 2.0 + Durable Task 1.9).

          Daniel Daugherty added a comment -

          Antonio, if you apply my pull request (20), you will see that the logs for jenkins-wrapper.bat include the echo command being run, but you will no longer experience the error. Likewise, you will not encounter the error if you apply Martin's pull request (21). The main change both make is that a reference to Launcher.ProcStarter ps is maintained after doLaunch() is called. To me this suggests a possible GC issue, where the Proc is destroyed before the task completes, which causes the wrapper not to finish. But when I watch Performance Monitor in Windows while the job is running, I don't see the command prompts being killed early. So it may be a locking issue, as Martin mentioned earlier, where checking for the existence of the result file prevents the result from being created. But that does not explain why keeping a reference to the PS instance makes the bug go away. That points more to something happening due to GC. If this were C, I would call it a use-after-free error, where PS is no longer referenced but is expected to continue to do things.

          Martin Karing added a comment -

          dpd_30: Actually, it does explain why maintaining the reference to the process causes the bug to disappear. My pull request works because it does not monitor the presence of the file; it waits until the process has terminated, and only after the process is no longer present does it look for the file. This way no file checking is done while the process is active, and the batch file is able to create the result file without any problems.

          Antonio Muñiz added a comment -

          "This way there is no file checking done as long as the process is active"

          Right. As I noted in the PR, keeping that reference is just what this plugin is trying to avoid.

          I think the GC theory is probably right. Perhaps if we keep a transient instance field (private transient Launcher.ProcStarter ps) in WindowsBatchScript, it will be prevented from being collected. I'm currently testing this option.

          Antonio Muñiz added a comment -

          No, it does not work.

          Martin Karing added a comment -

          This issue is really annoying. I tried to track it with the SysInternals Process Monitor. As soon as the monitor runs the issue does not happen any more.

          Also, I tried to alter the wrapping batch file to check whether the file was created after writing the error code and, if not, to try again. This does not resolve the issue. It seems like the batch file "sees" the jenkins-result.txt during its execution.

          Martin Karing added a comment -

          Okay, the reason why the loop in the batch file did not work is that something is killing the entire batch file structure before it finishes. I have no idea why this happens and I can't track it because process monitor seems to do something that stops this from happening.

          I was able to get it running by changing the script so that the wrapper is executed via an additional "start" command. This causes command line windows to pop up all over my desktop, but it allowed the entire thing to execute properly. This approach has the massive disadvantage that there is no way to terminate the script process if the script itself hangs, because it runs fully detached and there is no reference to the process.

          On the other hand, the purpose of this plugin is to allow scripts to run across restarts of Jenkins. So the script has to run as a process detached from Jenkins, so that the JVM doesn't tear it down along with itself, but we need a serializable reference to the process so that it's possible to locate it again, in case it has to be terminated after a Jenkins reboot.

          KK G added a comment -

          It's related to https://issues.jenkins-ci.org/browse/JENKINS-33164. Attaching a simple repro from that bug.

          Pipeline code is:

          node('master') {
              for (int i = 0; i < 100; ++i) {
                  bat('echo "Hello from batch file."' + i.toString())
              }
          }

          Click "Build Now" 5 times. All 5 jobs got stuck on Windows OS. Please help. Thanks.

          Antonio Muñiz added a comment -

          I was not able to reproduce the issue in a debug session and did not manage to diagnose why jenkins-wrapper.bat is not fully executed (so jenkins-result.txt is not created and the bat step never finishes). If someone with more Windows background can shed some light here, it would be great.

          Perhaps the additional start command proposed by nitram is the least ugly fix. What do others think? jglick?

          In the meantime, the workaround is to use a build agent (other than master), even if it is on the same physical machine.
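
As a sketch of that workaround, the job from the issue description can be pinned to an agent by passing a label to node. The label 'windows-agent' is hypothetical and must match an agent configured in your own Jenkins. This keeps the step off the master executor, but it is a workaround rather than a fix for the underlying problem.

```groovy
// Assumption: an agent labelled 'windows-agent' exists (hypothetical label).
node('windows-agent') {
    bat 'ping 127.0.0.1 -n 10'   // same command as in the issue description
    echo 'batch completed'
}
```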

          Martin Karing added a comment -

          I was able to track down that the batch process is forcefully terminated.
          If you run the wrapper batch by hand and close the command line window before the process finishes, you get exactly the same behaviour: the main and the child processes are terminated and no files are created. The only thing that is attached to the command line is Java, so the termination has to come from there.

          The thing I wonder is: can all this even work across a Jenkins restart? If Java terminates its child processes, this would kill any command line execution no matter what.

          I think there are solutions to work around this using PowerShell or the scripting host, but those may be blocked on the host system.

          KK G added a comment -

          Just gave it a try. "The workaround is to use a build agent (other than master), even being in the same physical machine" really works! Thanks. At least I can proceed.

          BTW, I notice that for the same machine, the master node reports "Windows Server 2012 R2 (x86)", while the client reports "Windows Server 2012 R2 (amd64)". I wonder whether the bug is triggered by a corner case related to machine architecture.

          Jesse Glick added a comment -

          Can all this even work across a jenkins restart?

          If you are using a master executor, not generally. (On Unix, it works under some conditions but not others.)

          You are strongly recommended to use an agent rather than master executors in general. In particular, if you have any kind of layered security on your Jenkins installation—whereby people configuring jobs (or permitted to edit build scripts in SCM) are not Jenkins administrators—you must not have a master executor, or any pretense at security is gone. Even if only one physical computer is available, you must configure a separate service account for builds.

          All that said, if the problem can be fixed—or at least clearly diagnosed and reported—without breaking anything for the more general use case of an agent on another machine, obviously we want to apply a fix.

          Lübbe Onken added a comment -

          This hanging batch bug has bitten me heavily too. Failing batch jobs always terminated; successful jobs never did.
          A working solution for me is to explicitly return an exit code from every batch call. I'm on Windows 7 Professional.

          So:

          echo 'Successful step'
          bat '''dir
          exit /B %ERRORLEVEL%'''

          echo 'Failing step'
          bat '''find /c "_no.file" "_no.file"
          exit /B %ERRORLEVEL%'''

          echo 'Never execute step'
          bat '''dir
          exit /B %ERRORLEVEL%'''

          successfully terminates step one and returns from the batch execution, the second step fails the build, and the third step never gets executed.

          Can somebody please confirm that this solution works for them too?

          Lübbe Onken added a comment - - edited

          Looks like I was too optimistic. The solution always worked with short running batch jobs, like dir, but it didn't with long running jobs, like a NAnt build.
          Is it possible that there is a race condition? Some state is checked very quickly after a task is started. A simple "dir" is quick enough to deliver the result in time and a slower task isn't?

          Christophe Carpentier added a comment -

          That would explain the inconsistent results during my tests. Weird.
          Anyway, I've encountered this on both failed and successful jobs.

          Rens Hoskens added a comment - - edited

          I have the same issue on Windows Server 2008 R2. I hope it will get fixed soon (or a plain Maven command would be useful as well).

          node {
              mvn 'clean package -DskipTests=true'
          }
          
          def mvn(args) {
              bat "${tool 'Maven 3.3.9'}/bin/mvn ${args}"
          }
          

          Nick Sonneveld added a comment - - edited

          I have mentioned this in other related tickets, but I just want to point out that I have seen this behaviour with a Linux master and multiple Windows agents (with 5-10 executors on each). You could try this example code. I haven't tested it, but it's similar to our Jenkinsfile, where we have branches doing chunks of a test. Cancelling the job in the middle of execution sometimes puts the agents in a weird state too.

          def branches = [:]
          
          for (int i = 0; i < 64; i++) {
          	def id = "branch-${i}"
          	branches[id] = {
          		node ('windows') {
          			for (int j = 0; j < 8; j++) {
          			    bat 'ping 127.0.0.1 -n 10' 
          			}
          		}
          	}
          }
          
          parallel branches
          

          Gijs Kuijer added a comment -

          I have the exact same issue on Windows Server 2012 R2 with a Jenkins 2.1 installation and all plugins fully updated.
          I have installed the GitHub Organization Folder plugin to scan my organization.

          My Jenkinsfile has a simple batch job that uses MSBuild to build our project and a batch job for Sonar analysis.
          The job randomly hangs after one of these two jobs.

          Is there any progress on this issue?

          maaltan natlaam added a comment -

          I have this issue also. My batch invocation is:

          bat '''
          call %BUILD_CONFIG_PATH%
          setenv.cmd
          perl <custom build manager script that typically runs for 90 minutes>
          '''

          This has broken the job completely. I cannot terminate the job, nor can I start another (this is an incremental build with a fixed workspace location, so I don't want to run concurrently). If I restart Jenkins, the job restarts and immediately hangs again.

          Is there any workaround short of a full uninstall/reinstall of Jenkins to recover this job setup?

          Wilson Tian added a comment - - edited

          I encountered this issue too. I'm running a Maven job using

          bat "${mavenHome}\\bin\\mvn clean package"

          but the job always hangs at the end and never exits.
          Is there any workaround?

          maaltan natlaam added a comment -

          I found a workaround (better than reinstalling, at least).

          1. Shut down the Jenkins service.
          2. Go to <install>/jobs/<jobname>/ and delete the <jobnumber> folder.
          3. Restart Jenkins.

          There is probably a flag somewhere in that folder you can set to prevent the job from "restarting" after restart.

          Since I've hit this bug in about 80% of the runs I've tried so far, this workaround is unusable in any kind of production environment. At least you can run the job again, though. I guess another workaround would be to allow multiple instances of the job to run and clean up the zombies once a day or something?

          Also, it seems this happens more often when I view the console output via the Jenkins UI while the job is running.

          Nick Sonneveld added a comment - - edited

          Another workaround that doesn't involve deleting jobs (but also isn't a long-term solution) relies on the fact that batch steps create two batch files in the @tmp directory (which is relative to where the batch is run, so it might be in the workspace if you've changed the directory, or just outside it): jenkins-main.bat and jenkins-wrap.bat. The main bat file contains your commands. The wrap bat file runs the main bat, pipes output to a log file, and finally writes a result file.

          The bug concerns the wrap bat file not completing, so the result file is never written. You can search for the file and run the final line manually (it looks like echo %errorlevel% > ...\jenkins-result.txt), or run the wrap batch file again if you don't mind it performing the same operation again.
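
The manual step described above can also be scripted. The following plain Groovy sketch looks for durable-* control directories under a given @tmp directory and writes a result file wherever none exists. The helper name, the example path, and the choice of exit code 0 are all assumptions. Writing 0 tells Jenkins the step succeeded whether or not it did, so this is only suitable for unblocking a build you have already inspected.

```groovy
// Hypothetical helper, not part of Jenkins or the durable-task plugin.
// Writes jenkins-result.txt into every durable-* directory that lacks one.
int writeMissingResults(File tmpDir, int exitCode = 0) {
    int written = 0
    if (!tmpDir.isDirectory()) {
        return written                          // nothing to do
    }
    tmpDir.eachDirMatch(~/durable-.*/) { File dir ->
        File result = new File(dir, 'jenkins-result.txt')
        if (!result.exists()) {
            result.text = String.valueOf(exitCode) + '\r\n'   // assumed exit code
            written++
        }
    }
    return written
}

// Example call; the path is a placeholder for your job's @tmp directory.
// println writeMissingResults(new File('C:/Jenkins/workspace/my-job@tmp'))
```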

          maaltan natlaam added a comment -

          cmd /c ""<script> > ".../jenkins-log.txt"" 2>&1
          echo %ERRORLEVEL% > "...\..@tmp\durable-cf7a3b23\jenkins-result.txt"

          Perhaps using "call" will work better. That will leverage the current cmd shell to execute the batch; I've found it more stable than launching a second cmd from a batch file. "start" is another option. That gives you a subshell that is detached from the main shell, although there are parameters that prevent that. Other bonuses of start are the ability to set process priorities, etc.

          If you know offhand which jar/class I need to hack to make this change, I'll try to give it a shot today.

          I am probably going to end up grabbing the Jenkins source at some point, but that will probably be later next week, if then.

          Nick Sonneveld added a comment -

          There is a pull request being worked on by Martin Karing that you might want to look at and comment on. The link is attached to this issue: https://github.com/jenkinsci/durable-task-plugin/pull/21

          maaltan natlaam added a comment -

          First off, I found a better workaround. Go to the console screen for the job and click the abort button in the upper right area. Scroll to the bottom and wait about 10 seconds. You will see a link allowing you to force-kill the job.

          Started At: 05-06-2016 20:00:08
          Ended At: 05-06-2016 20:02:19
          Build Lasted: 2 minutes 10 seconds
          Highest Error Code: 0
          <hang here>
          Aborted by admin
          Sending interrupt signal to process
          Click here to forcibly terminate running steps
          Terminating bat
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          Finished: ABORTED

          Unfortunately I'm seeing an almost 100% chance of hangs on my machine, so this is still pretty useless.

          -------------------------

          I acquired Karing's code and tested it. It terminates my bat build steps after about 3-5 seconds no matter what the state is. I didn't dig too much into that.

          I reverted to baseline and tried my suggestions. None of them work. In fact, I can't prove that the script call from jenkins-wrap.bat ever returns...

          Here is my current attempt at jenkins-wrap.bat:

          cmd /c ""...\jenkins-main.bat"" > "...\jenkins-log.txt" 2>&1
          :retry
          echo writing jenkins.results >> "...\jenkins-log.txt"
          echo %ERRORLEVEL% > "...\jenkins-result.txt"
          if not exist "...\jenkins-result.txt" goto retry

          It is supposed to jackhammer that results file until it is created. I see no "writing jenkins.results" in the logs, therefore the wrapper script is terminating early. The same thing happens if I replace cmd /c with call or start.
          (Note: "..." is a placeholder for my real paths, not some kind of relative path thing. Sorry for the confusion.)

          James Nord added a comment -

          I believe I have a 100% reproducible test case for this issue.

          It seems that when the parent process has been killed (e.g. the slave dies), then although the script terminates successfully and the wrapper terminates successfully (checked with Process Monitor), there is no attempt to create the result file.
          Where the parent process has not been killed, I never see this issue.

          All I needed to do to fix the issue I was observing was add @echo off as the first line of the wrapper script. Basically, I believe it is trying to echo the commands before running them, and as there is no longer anything consuming the wrapper's input/output when echoing the command, it is doomed to fail (but strangely not with an exit code that implies something died!).

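          The fix described above can be sketched as follows. This is a minimal illustration in Java, assuming a wrapper of the shape quoted earlier in the thread (a cmd /c call that redirects to the log, followed by an echo of %ERRORLEVEL% into the result file). The class and method names are hypothetical, and this is not the actual content of WindowsBatchScript.java.

```java
import java.util.List;

public class WrapperScriptSketch {
    // Hypothetical helper: returns the lines of a wrapper batch script.
    // The first line disables command echoing, so the wrapper does not try
    // to write each command to an output stream that nobody is reading.
    static List<String> wrapperLines(String mainScript, String logFile, String resultFile) {
        return List.of(
            "@echo off",
            "cmd /c \"\"" + mainScript + "\"\" > \"" + logFile + "\" 2>&1",
            "echo %ERRORLEVEL% > \"" + resultFile + "\"");
    }

    public static void main(String[] args) {
        // File names as mentioned in the thread; real paths are elided there too.
        wrapperLines("jenkins-main.bat", "jenkins-log.txt", "jenkins-result.txt")
            .forEach(System.out::println);
    }
}
```

          The only point of the sketch is the ordering: @echo off comes before anything else in the wrapper, so no command is echoed ahead of the result file being written.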

          Dmitry Vyazelenko added a comment -

          I'm also having a build hang with a simple pipeline that uses a bat script to run Gradle tasks:

          node {
              timeout(time: 10, unit: 'MINUTES') {
                  timestamps {
                      stage 'Checkout'
                      git ...
                      
                      stage 'Tests'
                      bat 'gradlew test'
                      step([$class: 'JUnitResultArchiver', testResults: 'build/test-results/*.xml'])
                  }
              }
          }

          James Nord added a comment - - edited

          For anyone observing the issue you can try installing the build from PR24 or PR21 and see if this resolves your issue. (I would start with PR24 first as it is a much smaller change, but then I am biased!)


          Christophe Carpentier added a comment -

          PR24 fixes my particular issue.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: James Nord
          Path:
          src/main/java/org/jenkinsci/plugins/durabletask/WindowsBatchScript.java
          http://jenkins-ci.org/commit/durable-task-plugin/d156ebfbcdb70666757ff48127d0597bd5891a61
          Log:
          JENKINS-34150 Fixes my observed issue.

          I have a reproducable tests case in a propratary implementation using this
          code that is 100% reproducable.
          The simple "@echo off" fixes the failing test for me.

          It seems that in the case that the parent process has been killed (e.g.
          the slave dies) then all though the script terminates successfully and the
          wrapper terminates successfully (checked with process monitor) there is no
          attempt to create the result file.
          where the parent process has not been killed I never see this issue.
          All I needed to do to fix the issue I was observing is add @echo off as
          the first line of the wrapper script. Basically I believe it is trying to
          echo the commands to be run before running the commands and as there is no
          longer anything consuming the wrappers input/output when echoing the
          command it is doomed to fail (but strangely not with an exit code that
          implies something died!??!

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          src/main/java/org/jenkinsci/plugins/durabletask/WindowsBatchScript.java
          http://jenkins-ci.org/commit/durable-task-plugin/8a2537cf28c826ad91d4ce14cd657712364c8953
          Log:
          Merge pull request #24 from jtnord/jenkins-34150

          [FIXED JENKINS-34150] Fixes my observed issue.

          Compare: https://github.com/jenkinsci/durable-task-plugin/compare/0f09bb54a1b7...8a2537cf28c8

          Sam Van Oort added a comment -

          Most important single-line-of-code change I've seen recently.


          Junichi Kimura added a comment - - edited

          The test job below used to consistently get stuck at the 8mins stage. After upgrading the Durable Task Plugin to 1.10, the job passed successfully (once so far).

          node {
             stage '1min'
             bat '''
          @ECHO OFF
          FOR /L %%A IN (0,1,60) DO (
            ECHO %%A
            PING 192.0.2.1 -n 1 -w 1000 >NUL
          )
          EXIT /B 0
          '''
             stage '5mins'
             bat '''
          @ECHO OFF
          FOR /L %%A IN (0,1,300) DO (
            ECHO %%A
            PING 192.0.2.1 -n 1 -w 1000 >NUL
          )
          EXIT /B 0
          '''
             stage '8mins'
             bat '''
          @ECHO OFF
          FOR /L %%A IN (0,1,480) DO (
            ECHO %%A
            PING 192.0.2.1 -n 1 -w 1000 >NUL
          )
          EXIT /B 0
          '''
             stage '10mins'
             bat '''
          @ECHO OFF
          FOR /L %%A IN (0,1,600) DO (
            ECHO %%A
            PING 192.0.2.1 -n 1 -w 1000 >NUL
          )
          EXIT /B 0
          '''
             stage '30mins'
             bat '''
          @ECHO OFF
          FOR /L %%A IN (0,1,1800) DO (
            ECHO %%A
            PING 192.0.2.1 -n 1 -w 1000 >NUL
          )
          EXIT /B 0
          '''
             stage '1hr'
             bat '''
          @ECHO OFF
          FOR /L %%A IN (0,1,3600) DO (
            ECHO %%A
            PING 192.0.2.1 -n 1 -w 1000 >NUL
          )
          EXIT /B 0
          '''
          }
          


          Gijs Kuijer added a comment -

          Great job! Totally solves my issues!


          Daniel Daugherty added a comment -

          Initial testing here shows the issue is resolved.
          Just getting back from vacation, so nice to see this resolved. Thanks all for the good work.

          Marc Rufer added a comment -

          Thanks guys. Great job. Updating to the newest version of the durable-task-plugin solved the issue for me as well!


          Sven Brosi added a comment -

          Hello,

          Since version 1.20 of the Durable Task plugin, we encounter the same behavior described above.

          Our worker nodes run Win2012 and Win2018 environments.

          A quick solution was to downgrade this plugin to version 1.18.

          Then everything works again in the DSL Jenkinsfiles with the bat(ch) step.

          Steven Foster added a comment -

          I'm encountering the same issue after updating to 1.20 from 1.18


          Sam Van Oort added a comment -

          stevenfoster Does it work if you set 'returnStdout: true'? If so, I have an attached hotfix for you to try. Please let us know if this resolves it. durable-task.hpi

          Sam Van Oort added a comment -

          lidl Does it work if you set 'returnStdout: true'? If so, please try the attached hotfix and let us know if this resolves it.

          Steven Foster added a comment -

          returnStdout: true has the same result. Confirmed the process is finished on the machine.

          Sam Van Oort added a comment -

          stevenfoster In the control directory in the build agent's workspace for this job (where there's a jenkins-main.bat and jenkins-wrap.bat) do you see a jenkins-result.txt file, a jenkins-result.txt.tmp, or both?

          I'm trying to figure out what actually triggered this because it did not fail any of our unit tests that should have explicitly covered this functionality.


          Steven Foster added a comment -

          just the .tmp


          Sam Van Oort added a comment -

          stevenfoster Okay, that means the "move" operation failed to rename the file, which should basically never happen. Would you be able to hop on #jenkins IRC briefly to discuss (I'm svanoort there)? It should be a quick and trivial fix but since we can't reproduce the issue in our own environment, it would be super-helpful to be able to see a case where it happened.

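          The diagnosis above rests on a write-then-rename pattern: the exit code is first written to jenkins-result.txt.tmp and then moved to jenkins-result.txt, so seeing only the .tmp file means the move did not happen. The sketch below illustrates that pattern and the "result, .tmp, or both?" check in Java. It is an assumption-laden illustration, not the plugin's implementation (the thread indicates the real rename is a "move" in the batch wrapper); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResultFileWriter {
    // Hypothetical helper: write the exit code to a temporary file first,
    // then rename it, so a reader never sees a half-written result file.
    static void writeResult(Path controlDir, int exitCode) throws IOException {
        Path tmp = controlDir.resolve("jenkins-result.txt.tmp");
        Path result = controlDir.resolve("jenkins-result.txt");
        Files.write(tmp, Integer.toString(exitCode).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, result, StandardCopyOption.REPLACE_EXISTING);
    }

    // Hypothetical helper: reports which of the two files exist in the control directory.
    static String describe(Path controlDir) {
        boolean tmp = Files.exists(controlDir.resolve("jenkins-result.txt.tmp"));
        boolean result = Files.exists(controlDir.resolve("jenkins-result.txt"));
        if (result) {
            return tmp ? "both" : "result only";
        }
        return tmp ? "tmp only (rename did not happen)" : "neither";
    }
}
```

          In this sketch, the state reported in the thread ("just the .tmp") corresponds to the write succeeding and the move step never completing.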

          Sven Brosi added a comment - - edited

          svanoort: I cannot test it again, as it is a corporate Jenkins and I cannot upgrade and downgrade the plugins on the fly.

          How can I still help you?

          Sam Van Oort added a comment -

          lidl Don't worry about it, stevenfoster was thankfully available to help debug in an environment where this is reproducible (thanks!).

          I'm attaching one more hotfix version for him to try out which should fully resolve the issue. durable-task.hpi


          Sam Van Oort added a comment -

          This issue duplicates symptoms of JENKINS-50025 but the root causes are significantly different, so that is being tracked separately.


          Sam Van Oort added a comment -

          lidl stevenfoster I'm closing THIS issue because while the result is the same this has a different cause and is resolved in JENKINS-50025.


          Sam Van Oort added a comment -

          lidl stevenfoster Released fix after review and testing as durable-task-plugin 1.21


            amuniz Antonio Muñiz
            dpd_30 Daniel Daugherty
            Votes: 18
            Watchers: 34