[JENKINS-52881] durable-task plugin v1.23 kills jobs on Cygwin/MSys agents

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: durable-task-plugin
    • Environment: Jenkins master 2.135 on CentOS 6; slave agents on Windows 7 (64-bit) and Windows 10 (64-bit); durable-task-plugin 1.23
    • Fixed in: durable-task 1.25

      Upgrading the durable-task plugin from 1.22 to 1.23 made most of our build jobs fail with

      ps: unknown option -- o
      Try `ps --help' for more information.
      
      wrapper script does not seem to be touching the log file in ...
      (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
      

      The reason is probably that "ps -o pid" does not work in either MSys or Cygwin.

      Current workaround is to downgrade to 1.22.
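To illustrate the suspected failure mode, here is a minimal Python sketch of two ways to ask "is this PID still running?". It assumes, as the reporter suspects, that the 1.23 heartbeat relies on something like "ps -o pid"; the function names are hypothetical, and the signal-0 variant is one common POSIX alternative, not necessarily what durable-task 1.25 actually does. Where `ps` rejects `-o` (as in the MSys/Cygwin log above), the first check reports a live process as dead.

```python
import os
import subprocess


def alive_via_ps(pid: int) -> bool:
    """Hypothetical liveness check that shells out to `ps -o pid= -p PID`.

    Where `ps` rejects `-o` (as in the MSys/Cygwin log above), ps exits
    non-zero, so this reports the process as dead even though it is running.
    """
    try:
        result = subprocess.run(
            ["ps", "-o", "pid=", "-p", str(pid)],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return False  # no `ps` on PATH at all
    return result.returncode == 0 and result.stdout.strip() == str(pid)


def alive_via_signal0(pid: int) -> bool:
    """POSIX-only check: signal 0 does error checking without sending a signal.

    Not a liveness probe on native Windows Python, where os.kill has
    different semantics.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    me = os.getpid()
    print("ps -o check:   ", alive_via_ps(me))
    print("signal-0 check:", alive_via_signal0(me))
```

On a Linux host with procps both checks should report the current process as alive; on an agent whose `ps` lacks `-o`, only the signal-0 check would.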

          John Forrest added a comment -

          I'm seeing this too. The Jenkins master is on CentOS 7. The slave runs on Windows Server 2016, but (among other things) I'm using the Docker plugin, which in turn uses the Git shell. I have downgraded the durable-task plugin; I also tried the suggestion of setting:

          -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=600

          in /etc/sysconfig/jenkins, and it made no difference. My guess is that, despite the message, it is the ps error that the plugin is reacting to.

          The alternative workaround for me would be to stop using the Docker plugin. Originally, before I worked out how to use "sh" from Jenkins on a PC (install Git to C:\git and add C:\git\usr\bin to the Windows PATH), I had some code that built on Windows via bat, basically doing it all myself. I'd prefer not to go back to that, since there are many problem cases to handle. If this does not get fixed, I probably will.

          Andreas Engel added a comment -

          As far as I understand it, for shell scripts the durable-task plugin starts a background process that regularly touches the log file mentioned above. The failing "ps" command prevents this, so after HEARTBEAT_CHECK_INTERVAL seconds the job gets killed. I initially tried setting it to 10 hours (36000 seconds), which was sufficient for our current jobs, but that seems a very crude workaround.
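The mechanism described in this comment has two halves: a heartbeat on the agent that touches the log file while the build is alive, and a check on the controller that treats a log untouched for longer than HEARTBEAT_CHECK_INTERVAL as a dead job. The Python sketch below models that reading only; the function names, the log file name and the representation of the liveness check as a boolean are assumptions for illustration, not the plugin's actual Java/shell implementation.

```python
import os
import tempfile
import time
from typing import Optional


def touch(path: str) -> None:
    """Create the file if needed and set its mtime to now, like `touch`."""
    with open(path, "a"):
        pass
    os.utime(path, None)


def heartbeat_step(log_path: str, watched_process_alive: bool) -> bool:
    """One iteration of a hypothetical agent-side heartbeat loop.

    The log is touched only while the liveness check succeeds; once it fails
    (for example because `ps -o` is unsupported), the loop stops for good.
    """
    if not watched_process_alive:
        return False
    touch(log_path)
    return True


def log_is_stale(log_path: str, interval_s: float, now: Optional[float] = None) -> bool:
    """Controller-side check: True if the log was not touched within interval_s."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(log_path) > interval_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "heartbeat-log.txt")  # hypothetical file name
        heartbeat_step(log, True)   # first beat succeeds
        heartbeat_step(log, False)  # liveness check fails: no more beats
        last = os.path.getmtime(log)
        # Once the heartbeat has stopped, any interval only delays the verdict.
        print(log_is_stale(log, 300, now=last + 301))      # True
        print(log_is_stale(log, 36000, now=last + 301))    # False
        print(log_is_stale(log, 36000, now=last + 36001))  # True
```

Under this model, a larger interval cannot rescue a job whose heartbeat has already stopped; it only postpones the kill. That is consistent with 600 seconds making no difference for long builds, while 36000 seconds happened to outlast the jobs in question.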

          By the way, Git for Windows is based on MSys. We use that too.

          Jesse Glick added a comment -

          I think I managed to reproduce a similar problem in openjdk:10-jre-slim.

          Devin Nusbaum added a comment -

          Should be fixed in durable-task 1.25.

          Martin Bornhold added a comment -

          Wow, that was a really fast fix; many thanks for your work on this! I will try to update our Jenkins in a while to check it out, and I will add another comment after testing it.

          Martin Bornhold added a comment -

          Everything works fine after upgrading the durable-task plugin to 1.25. Thanks!

          Jesse Glick added a comment -

          Good to know. If nothing else, between this and JENKINS-52847 we have expanded our regression tests to include a couple of widely used Linux lightweight container environments. Since automated test coverage in CI for sh relies on an Ubuntu Docker host, there is still zero coverage for non-Linux platforms (notably Mac OS X, FreeBSD, OpenSolaris offshoots, or the multitude of POSIXish subsystems for Windows), so support for sh on non-Linux agents is best-effort only and any particular software version may or may not work in a given environment. Users of Windows agents are advised to use bat or powershell.

          Daniel Beck added a comment -

          jglick, is this the same issue as JENKINS-50379?

          Jesse Glick added a comment -

          Unlikely.

            Assignee: Jesse Glick (jglick)
            Reporter: Andreas Engel (anen)
            Votes: 1
            Watchers: 8