JENKINS-38682

Pipeline plugin sh script returned exit code -1 on windows node after a short while

      Description

      I have many nodes running various Linux distributions, plus a few Windows nodes set up with SSH slaves and Cygwin (https://wiki.jenkins-ci.org/display/JENKINS/SSH+slaves+and+Cygwin).

      Using the following pipeline script:

      node('linuxHost') {
          stage('linuxHost linux') {
              sh '''sleep 30'''
          }
      }
      node('windowsHost') {
          stage('windowsHost cygwin') {
              bat 'sleep 30'
              sh '''sleep 30'''
          }
      }

      On my Windows nodes the build always fails with "ERROR: script returned exit code -1" when the script runs for longer than about 250 ms, whatever the command is (which is why this example uses a plain sleep). All of my Windows nodes behave the same way.

      Here are the execution logs:
      Started by user Administrateur Jenkins
      [Pipeline] node
      Running on linuxHost in /var/jenkins_home/workspace/Unstable/TestShInPipeline
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (linuxHost linux)
      [Pipeline] sh
      [TestShInPipeline] Running shell script
      + sleep 30
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] node
      Running on windowsHost in C:/Users/localuser/jenkins_home/workspace/Unstable/TestShInPipeline
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (windowsHost cygwin)
      [Pipeline] bat
      [C:/Users/localuser/jenkins_home/workspace/Unstable/TestShInPipeline] Running batch script
      C:\Users\localuser\jenkins_home\workspace\Unstable\TestShInPipeline>sleep 30
      [Pipeline] sh
      [TestShInPipeline] Running shell script
      + sleep 30
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE

      This may be related to JENKINS-32017.
      The same command works fine on the same node when run from an "Execute Shell" build step in a freestyle project.

            Activity

            julrich Jochen Ulrich added a comment -

            Ensure nohup is on the %PATH%. See JENKINS-33708
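
One way to check this from the agent's own environment, rather than from an interactive shell, is a short pipeline along the following lines. This is only a sketch: the `windowsHost` label is taken from the report above, the stage name is made up for illustration, and it assumes the standard Windows `where` command is available to the agent.

```groovy
node('windowsHost') {                 // label from the report; adjust to your agent
    stage('Check nohup') {            // hypothetical stage name
        // 'where' searches the Windows %PATH% as seen by the agent process.
        // returnStatus: true returns the exit code instead of failing the build.
        def rc = bat(script: 'where nohup', returnStatus: true)
        echo "where nohup returned ${rc}"
    }
}
```

A non-zero status here would suggest that the agent process does not see nohup on its %PATH%, even if an interactive cmd.exe does.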

            bgamari Ben Gamari added a comment - edited

            I am also seeing this behavior on Windows 10 running a msys sshd. I have added C:\msys64\usr\bin to the Windows %PATH% environment variable and have confirmed that I can run nohup from a standard cmd.exe instance. I can also confirm that nohup is in %PATH% by running the following test Jenkinsfile,

            node(label: 'windows') {
              stage('Hello') {
                sh 'echo $PATH'
                sh "which nohup"
                sh "nohup echo Hello world"
                sh "sleep 5; echo Hello world"
                sh "echo Hello world"
              }
            }
            

            The echo $PATH step confirms that /usr/bin is in $PATH,

            [C:\msys64\home\ben\jenkins\workspace\gitlab-test_master-GOUXDG2ZFVA4TFFLTYLXH3EXZCZTQQEMXKUO2E4QRK5ESVKIB7UQ] Running shell script
            
            + echo /usr/bin:/usr/bin:/c/ProgramData/Oracle/Java/javapath:/c/Program Files '(x86)/I...
            

            which nohup again confirms that nohup is present,

            [C:\msys64\home\ben\jenkins\workspace\gitlab-test_master-GOUXDG2ZFVA4TFFLTYLXH3EXZCZTQQEMXKUO2E4QRK5ESVKIB7UQ] Running shell script
            
            + which nohup
            
            /usr/bin/nohup
            

            Moreover, nohup echo shows that nohup is indeed functional,

            [C:\msys64\home\ben\jenkins\workspace\gitlab-test_master-GOUXDG2ZFVA4TFFLTYLXH3EXZCZTQQEMXKUO2E4QRK5ESVKIB7UQ] Running shell script
            
            + nohup echo Hello world
            
            Hello world
            

            However, sleeping in a shell script nevertheless fails,

            [C:\msys64\home\ben\jenkins\workspace\gitlab-test_master-GOUXDG2ZFVA4TFFLTYLXH3EXZCZTQQEMXKUO2E4QRK5ESVKIB7UQ] Running shell script
            
            + sleep 5
            
            script returned exit code -1
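
To narrow down where the failure starts, a probe along these lines could record the status of a few sleeps of different lengths without aborting the build. This is a sketch under assumptions: the durations and the stage name are arbitrary choices for illustration, and it assumes the agent's sleep accepts fractional seconds (as GNU coreutils sleep does).

```groovy
node('windows') {
    stage('Probe sh durations') {     // hypothetical stage name
        // Durations in seconds; values chosen arbitrarily for illustration.
        def durations = ['0.1', '0.5', '5']
        // Indexed loop, to stay on the safe side with older Pipeline versions.
        for (int i = 0; i < durations.size(); i++) {
            def d = durations[i]
            // returnStatus: true makes sh return the exit code instead of failing the build.
            def rc = sh(script: "sleep ${d}", returnStatus: true)
            echo "sleep ${d} -> exit code ${rc}"
        }
    }
}
```

If the short sleeps report 0 and the longer ones report -1, that would match the duration-dependent behaviour described in the original report.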
            
            bgamari Ben Gamari added a comment -

            I've bumped the priority of this as it renders Jenkins' sh task completely unusable on Windows.

            bgamari Ben Gamari added a comment - edited

            I believe I have worked out what is going on here. The issue was introduced by 3c10cc0cb0a4738646966494fbeaab54bca115e1, which switches from JNR to JNA for access to getpgid. Unfortunately, JNA isn't able to load libc under Windows: it looks for libc instead of msvcrt, and msys2 (the Unix-like environment I'm running under) only provides a static libc archive. As a result, ProcessLiveness fails to load on the slave. This was quite difficult to track down because of the poor error messages provided by the slave, which surface only as a vague "remote call failed" IOException. Debugging would be much easier if remoting exceptions were consistently thrown as a distinct exception type.

            Anyways, reverting 3c10cc0cb0a4738646966494fbeaab54bca115e1 appears to fix the issue. Moreover, I believe this should be safe to do since JNR has apparently seen some BSD fixes in the past months. I have confirmed that with the reverted patch my simple sh testcase works on both Windows and FreeBSD slaves.

            bgamari Ben Gamari added a comment - edited

            See this PR for what I believe should be a rather robust solution to this given in-progress upstream support in JNR.

            jglick Jesse Glick added a comment -

            If the JNR/JNA analysis is in fact correct, then JENKINS-47791 should have obsoleted this.


              People

              Assignee: Unassigned
              Reporter: bdesvages Benoît Desvages
              Votes: 2
              Watchers: 7
