On our instances, pipeline jobs very often get stuck after a restart. It seems that the pipeline detects that it needs to continue and tries to resume execution where it was interrupted, but nothing is actually executed. It only looks like it is running from the UI: the computer is occupied by the job and the build is in the running state, but it appears to be in an endless waiting cycle.

      The interesting thing is that this state is probably saved persistently: I do not hit it every time, but once it occurs, I can restart Jenkins as many times as I want and it never recovers successfully (it always gets stuck).

      I believe that the problem is somewhere in program.dat, but it is not as easily human-readable as XML, so I am not sure where the difference is.

      I did the same with the previous 2 runs and they were able to recover, but the third one did not.

      I am attaching a screenshot of the successful recoveries (or unsuccessful: one ended in failure but did not get stuck, which counts as success in this case) and a screenshot of the build with the issue. Since the state seems to be persistent (as I described), I archived the Jenkins home and attached it as well, along with the Jenkins war. The archived Jenkins home also contains the versions of Pipeline.

       

      Thank you for help!

       

          [JENKINS-50892] Pipeline jobs stuck after restart

          Jesse Glick added a comment -

          Anyone consistently seeing this problem, please try installing this build (using Plugin Manager » Advanced) and let me know if it helps.


          André Leruitte added a comment -

          Thanks for the suggestion, jglick.

          I upgraded the durable-task plugin on Saturday night, and unfortunately we are still running into never-ending jobs.

          These 4 jobs will never finish and cannot be aborted:

          Our only way to clear them is to restart that (dockerized) Jenkins instance. We are also running into startup performance issues (which we still have to diagnose) that make each Jenkins restart take up to 30 minutes...

          Jesse Glick added a comment -

          andreler your issue may or may not have anything to do with the issue reported here. There are dozens of reasons why a build might hang. It is necessary to perform detailed diagnostics to confirm a particular problem.


          Jesse Glick added a comment -

          Patch merged but as yet unreleased.


          Sam Van Oort added a comment -

          Released as durable-task 2.23


          Lucie Votypkova added a comment -

          Thank you, Jesse. I did a quick test and it seems that it solved our problem (there may be other issues, but not this one).

          Robin Roth added a comment -

          The change broke usage of the official jnlp-slave image based on Alpine, since this image uses BusyBox for ps, which does not support -p.

          The image has now been fixed to include a working ps version: https://github.com/jenkinsci/docker-jnlp-slave/issues/65
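
          For anyone wanting to check an agent image before upgrading, the following is a minimal sketch of a probe for the `ps -o pid -p` form discussed in this issue. The function name `ps_supports_pid_filter` and its optional argument are hypothetical and not part of durable-task; the probe only reports whether the local ps accepts the options and lists the current shell.

```shell
#!/bin/sh
# Hypothetical helper, not part of the durable-task plugin.
# Usage: ps_supports_pid_filter [ps_command]
# Returns 0 if the given ps command (default: ps) accepts `-o pid -p <pid>`
# and its output contains the PID of the current shell.
ps_supports_pid_filter() {
    ps_cmd=${1:-ps}
    # A ps that rejects -o or -p is expected to exit non-zero.
    out=$("$ps_cmd" -o pid -p "$$" 2>/dev/null) || return 1
    # Match the PID as a whole word so that 123 does not match 1234.
    printf '%s\n' "$out" | grep -qw "$$"
}

if ps_supports_pid_filter; then
    echo "ps accepts -o pid -p"
else
    echo "ps rejects -o pid -p (as reported in this issue for BusyBox and Cygwin ps)"
fi
```

          The optional argument exists only so the probe can be pointed at a different ps binary or a stub; on a normal agent it would be called without arguments.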


          Damien Merlin added a comment - - edited

          Hi, I noticed an issue with this fix: it generates error messages on my Windows nodes (where I use Cygwin):

          [e:\jenkins\workspace\test-node-windows-ps] Running shell script
          ps: unknown option -- o
          Try `ps --help' for more information.

          My trouble is with this code:

          cmd = String.format("pid=$$; { while ps -o pid -p $pid | grep -q $pid && [ -d '%s' -a ! -f '%s' ]; do touch '%s'; sleep 3; done } & jsc=%s; %s=$jsc '%s' > '%s' 2> '%s'; echo $? > '%s.tmp'; mv '%s.tmp' '%s'; wait",

          On Cygwin I would expect ps -p $pid instead:

          $ ps -p 12588
            PID PPID PGID WINPID TTY UID STIME COMMAND
            12588 10484 12588 1836 pty1 1053975 09:58:55 /usr/bin/bash

          because the following does not work:

          $ ps -o pid -p 12588
          ps: unknown option -- o
          Try `ps --help' for more information.

          The same command seems fine on Debian jessie, for example:

          $ ps -o pid -p 102008
             PID
          102008

          That explains my error message, so the fix does not work for a Cygwin configuration.

          My concerns were addressed by https://issues.jenkins-ci.org/browse/JENKINS-52881
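
          As an illustration of why the `ps` flags matter here, the liveness test in the loop above can be written without `ps` at all. The sketch below uses `kill -0`, which sends no signal and only checks that the process exists. This is an assumption-laden example, not necessarily what JENKINS-52881 adopted; the `is_alive` name is hypothetical, and `kill -0` also fails for processes owned by another user, so it fits only the case where the watched PID belongs to the same user.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the durable-task plugin.
# is_alive PID: returns 0 if a process with that PID exists and may be
# signalled by the current user. `kill -0` performs the check without
# delivering a signal, so no ps options are involved.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Small demonstration with a background child process.
sleep 30 &
child=$!
if is_alive "$child"; then
    echo "child $child is running"
fi

kill "$child"
wait "$child" 2>/dev/null   # reap the child so its PID is released

if ! is_alive "$child"; then
    echo "child $child is gone"
fi
```

          In a heartbeat loop like the one quoted above, `is_alive $pid` would take the place of `ps -o pid -p $pid | grep -q $pid`.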

           


          Marley Kudiabor added a comment -

          svanoort, durable-task is currently at 1.26; does 2.23 refer to the Jenkins version itself?

          Jesse Glick added a comment - - edited

          No, the durable-task plugin. For most Pipeline issues, the version of Jenkins core is irrelevant. I presume svanoort meant something else, since there is no such version of this plugin.


            jglick Jesse Glick
            lvotypkova Lucie Votypkova