
durable-task plugin v1.23 kills jobs on Cygwin/MSys agents

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • durable-task-plugin
    • Jenkins master 2.135 on CentOS 6; slave agents on Windows 7/64 and 10/64bit; durable-task-plugin 1.23
    • durable-task 1.25

      Upgrading the durable-task plugin from 1.22 to 1.23 made most of our build jobs fail with

      ps: unknown option -- o
      Try `ps --help' for more information.
      
      wrapper script does not seem to be touching the log file in ...
      (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
      

      Reason for this is probably that "ps -o pid" doesn't work in either MSys or Cygwin.

      Current workaround is to downgrade to 1.22.
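The suspected cause can be checked directly on an agent. The following is a minimal sketch, not the plugin's own code: the helper name `ps_supports_o` and its injectable `run` parameter are hypothetical, and it assumes that a `ps` which rejects `-o` exits with a non-zero status, as the error output above suggests.

```python
import subprocess
from typing import Callable


def ps_supports_o(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if the `ps` found on the PATH accepts `-o pid`.

    `run` is injectable only so the check can be exercised without a real
    `ps` binary; by default it is subprocess.run.
    """
    try:
        result = run(
            ["ps", "-o", "pid"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError:
        # No usable `ps` binary could be started at all.
        return False
    # Assumption: an unsupported option makes `ps` exit non-zero.
    return result.returncode == 0
```

Running `print(ps_supports_o())` from the same shell environment the agent uses (Cygwin, MSys, or Git for Windows) would indicate whether that agent is likely to be affected.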

          [JENKINS-52881] durable-task plugin v1.23 kills jobs on Cygwin/MSys agents

          Andreas Engel created issue -

          John Forrest added a comment -

          I'm seeing this too. The Jenkins master is on CentOS 7. The slave is running on Windows Server 2016, but (among other things) I'm using the docker plugin, which in turn uses the Git shell. I have downgraded the durable-task plugin. I tried the suggestion of setting:

          -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=600

          in /etc/sysconfig/jenkins, and it made no difference. My guess is that, despite the message, it is the error from ps that is being reacted to.

          The alternative workaround for me would be to stop using the docker plugin. Originally, before I worked out how to use "sh" from Jenkins on a PC (ensure git is installed to C:\git and add C:\git\usr\bin to the Windows path), I had some code that tried to build on Windows via bat, basically doing it all myself. I'd prefer not to go back to that, since there are problem cases to handle and so on. If this does not get fixed, I probably will.


          Andreas Engel added a comment -

          As far as I understand it, for shell scripts, the durable task plugin starts a background process which regularly touches the mentioned log file. The failing "ps" command prevents this, so after HEARTBEAT_CHECK_INTERVAL seconds, the job gets killed. I initially tried setting it to 10 hours, i.e. 36000, which was sufficient for our current jobs, but that seems to be a very crude workaround.

          Git on Windows is based on MSys btw. We use that too.
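The heartbeat mechanism described in this comment can be illustrated with a minimal sketch. This is not the plugin's implementation (which is Java plus a shell wrapper); the function name `heartbeat_is_stale` and its parameters are hypothetical, and it assumes only what the comment states: the log file is touched regularly, and the job is considered dead once it has not been touched for longer than the check interval.

```python
import os
import time
from typing import Optional


def heartbeat_is_stale(log_path: str, check_interval: float,
                       now: Optional[float] = None) -> bool:
    """Return True if log_path was not modified within check_interval seconds.

    `now` can be passed explicitly to make the check deterministic;
    it defaults to the current time.
    """
    if now is None:
        now = time.time()
    try:
        last_touched = os.path.getmtime(log_path)
    except OSError:
        # A missing or unreadable log file counts as a missed heartbeat.
        return True
    return (now - last_touched) > check_interval
```

Under this model, if the wrapper's background loop never runs (for example because `ps` fails), the file's modification time stops advancing and the check eventually returns True however large the interval is, which is consistent with a longer interval only postponing the kill.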

          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-52847 [ JENKINS-52847 ]
          Jesse Glick made changes -
          Assignee New: Jesse Glick [ jglick ]
          Jesse Glick made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

          Jesse Glick added a comment -

          I think I managed to reproduce a similar problem in openjdk:10-jre-slim.

          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-50892 [ JENKINS-50892 ]
          Jesse Glick made changes -
          Remote Link New: This issue links to "durable-task PR 79 (Web Link)" [ 21280 ]
          Jesse Glick made changes -
          Status Original: In Progress [ 3 ] New: In Review [ 10005 ]

            Assignee: Jesse Glick (jglick)
            Reporter: Andreas Engel (anen)
            Votes: 1
            Watchers: 8