Jenkins / JENKINS-52881

durable-task plugin v1.23 kills jobs on Cygwin/MSys agents

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: durable-task-plugin
    • Labels:
    • Environment:
      Jenkins master 2.135 on CentOS 6; slave agents on Windows 7/64 and 10/64bit; durable-task-plugin 1.23
    • Released As:
      durable-task 1.25

      Description

      Upgrading the durable-task plugin from 1.22 to 1.23 made most of our build jobs fail with

      ps: unknown option -- o
      Try `ps --help' for more information.
      
      wrapper script does not seem to be touching the log file in ...
      (JENKINS-48300: if on a laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=300)
      

      The reason is probably that "ps -o pid" doesn't work in either MSys or Cygwin.

      The current workaround is to downgrade to 1.22.
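
One way to check whether an agent's shell environment is affected is to probe the `ps` invocation named above. The following is a minimal sketch, not the plugin's own code: the helper name `ps_supports_o` is hypothetical, and it only tests whether `ps -o pid` exits successfully in the current environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of durable-task): succeeds if the given
# ps binary accepts "-o pid", fails otherwise. Defaults to "ps" on PATH.
ps_supports_o() {
    ps_bin=${1:-ps}
    "$ps_bin" -o pid >/dev/null 2>&1
}

if ps_supports_o; then
    echo "ps -o pid: supported"
else
    echo "ps -o pid: not supported"
fi
```

Running this in the same shell that Jenkins uses for `sh` steps (for example the MSys shell shipped with Git for Windows) should show whether the error from the description would be triggered there.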

            Activity

            john_fo John Forrest added a comment -

            I'm seeing this too. The Jenkins master is on CentOS 7. The slave is running on Windows Server 2016, and (among other things) I'm using the docker plugin, which in turn uses Git shell. I have downgraded the durable-task plugin. I also tried the suggestion of setting:

            -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=600

            in /etc/sysconfig/jenkins, and it made no difference. My guess is that, despite the message, it is the error from ps that is being reacted to.

            The alternative workaround for me would be to stop using the docker plugin. Originally, before I worked out how to use "sh" from Jenkins on a PC (ensure Git is installed to C:\git and add C:\git\usr\bin to the Windows PATH), I had some code that tried to build on Windows via bat, basically doing it all myself. I'd prefer not to have to go back to that, since there are problem cases to handle and so on. If this does not get fixed, I probably will.
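
For reference, on RPM-based installations the system property is usually passed through the `JENKINS_JAVA_OPTIONS` variable in /etc/sysconfig/jenkins. The fragment below is a sketch under that assumption: the `-Djava.awt.headless=true` entry stands in for whatever options are already configured, and Jenkins has to be restarted for the change to take effect.

```shell
# /etc/sysconfig/jenkins (assumed RPM-style layout; adjust to your installation)
# Append the heartbeat property to the existing JVM options of the Jenkins master.
JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=600"
```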

            anen Andreas Engel added a comment -

            As far as I understand it, for shell scripts the durable-task plugin starts a background process that regularly touches the mentioned log file. The failing "ps" command prevents this, so after HEARTBEAT_CHECK_INTERVAL seconds the job gets killed. I initially tried setting it to 10 hours, i.e. 36000, which was sufficient for our current jobs, but that seems to be a very crude workaround.

            Git on Windows is based on MSys btw. We use that too.
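
The mechanism described here can be sketched in plain sh. This is an illustration under stated assumptions, not the plugin's actual wrapper script: the function name `run_with_heartbeat` and the one-second touch interval are made up, and `kill -0` stands in for whatever liveness check the real script performs. If that check itself errors out (as "ps -o pid" reportedly does on MSys and Cygwin), the loop body never runs, the log file's timestamp goes stale, and a watcher comparing its age against HEARTBEAT_CHECK_INTERVAL would conclude that the script has died.

```shell
#!/bin/sh
# Illustrative sketch only; names and intervals are assumptions.
# Usage: run_with_heartbeat LOG_FILE COMMAND [ARGS...]
run_with_heartbeat() {
    hb_log=$1
    shift
    # Background loop: refresh the log file's mtime while this shell is alive.
    # "$$" is the PID of the main shell, even inside the subshell.
    (
        while kill -0 "$$" 2>/dev/null; do
            touch "$hb_log"
            sleep 1
        done
    ) >/dev/null 2>&1 &
    hb_pid=$!

    # Run the real command and remember its exit status.
    hb_status=0
    "$@" || hb_status=$?

    # Stop the heartbeat loop once the command has finished.
    kill "$hb_pid" 2>/dev/null || true
    wait "$hb_pid" 2>/dev/null || true
    return "$hb_status"
}
```

A call such as `run_with_heartbeat /tmp/build.log sh -c 'sleep 5'` keeps the timestamp of /tmp/build.log fresh for as long as the command runs and then returns the command's exit status.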

            jglick Jesse Glick added a comment -

            I think I managed to reproduce a similar problem in openjdk:10-jre-slim.

            dnusbaum Devin Nusbaum added a comment -

            Should be fixed in durable-task 1.25.

            mbld Martin Bornhold added a comment -

            WOW, that was a really fast fix. Many thanks for your work on this! I will try to update our Jenkins in a while to check it out and will add another comment after testing it.

            mbld Martin Bornhold added a comment -

            Everything works fine after upgrading durable-task plugin to 1.25. Thanks

            jglick Jesse Glick added a comment -

            Good to know. If nothing else, between this and JENKINS-52847 we have expanded our regression tests to include a couple of widely used Linux lightweight container environments. Since automated test coverage in CI for sh relies on an Ubuntu Docker host, there is still zero coverage for non-Linux platforms (notably Mac OS X, FreeBSD, OpenSolaris offshoots, or the multitude of POSIXish subsystems for Windows), so support for sh on non-Linux agents is best-effort only and any particular software version may or may not work in a given environment. Users of Windows agents are advised to use bat or powershell.

            danielbeck Daniel Beck added a comment -

            Jesse Glick: Is this the same issue as JENKINS-50379?

            jglick Jesse Glick added a comment -

            Unlikely.


              People

              Assignee:
              jglick Jesse Glick
              Reporter:
              anen Andreas Engel
              Votes:
              1
              Watchers:
              7
