
sh steps stuck indefinitely on uncommon architectures (e.g. s390x)

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: durable-task-plugin
    • Labels: None
    • Environment: Jenkins ver. 2.201 (yum installed, master node only)
      durable-task plugin v1.31

      os.arch: s390x
      os.name: Linux (RedHat)
      os.version: 3.10.0-327.el7.s390x
    • Released As: 1.33

      After upgrading to v1.31, the first sh step in a pipeline gets stuck. After a few minutes, the Console Output shows:

      [Pipeline] sh (Get email of the author of last commit)
      process apparently never started in /data/jenkins/workspace/TG2_PTG2_-_pipeline_build_master@tmp/durable-be2cf2a6
      (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      Cannot contact : java.io.FileNotFoundException: File '/data/jenkins/workspace/TG2_PTG2_-_pipeline_build_master@tmp/durable-be2cf2a6/output.txt' does not exist

       

      Eventually, I discovered that a new binary was added in the latest version of this plugin. The script compile-binaries.sh on GitHub suggests that the binary is only built for Linux and macOS.

       

      Sure enough, when I try to execute the binary myself on an architecture other than amd64, I get:

      -bash: /data/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64: cannot execute binary file

       

      Are other architectures or operating systems (Windows) not supported anymore?
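
      A quick way to confirm this kind of mismatch is to compare the host architecture with what the cached binary was built for. A minimal sketch; the cache path is the one from this report and may differ on your agent:

      ```shell
      #!/bin/sh
      # Compare the host architecture with the cached durable-task binary.
      # The path below is taken from this report; adjust it for your agent.
      BIN=/data/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64

      echo "host architecture: $(uname -m)"   # e.g. s390x, x86_64
      if [ -f "$BIN" ]; then
          # an amd64-only build will report "x86-64" here even on an s390x host
          file "$BIN"
      else
          echo "binary not present at $BIN"
      fi
      ```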


          Rahul Raj added a comment - edited

          The issue is still there with v1.33

          logs:

          https://gist.github.com/rahul-raj/ddeaa1407827f191e0d1b94966a58a0b

          Version from my Jenkins plugin list:
          https://user-images.githubusercontent.com/517415/75545150-8a721200-5a4b-11ea-8b49-02d69669184f.png

           

          Please suggest a fix for this.


          Carroll Chiou added a comment -

          rahulraj90 it appears you are using x86, not an "uncommon architecture" like this ticket is describing. I would advise setting LAUNCH_DIAGNOSTICS=true as suggested in the output log; that can tell us more. The default behavior for this plugin should be to use the original script wrappers. If you can't ascertain what is going on there, I would post this to the jenkinsci-users mailing list while we're still investigating.

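
          For reference, the diagnostics flag mentioned in the log is a JVM system property passed when starting the Jenkins controller. A sketch; the war path below is an assumption and depends on your install (yum installs typically set this through the service wrapper's JENKINS_JAVA_OPTIONS instead):

          ```shell
          # Restart Jenkins temporarily with launch diagnostics enabled.
          # /usr/share/jenkins/jenkins.war is an example path, not universal.
          java -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true \
               -jar /usr/share/jenkins/jenkins.war
          ```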

          Don L added a comment -

          I've found this also reproduces when using build agents in Kubernetes. The problem here is that Kubernetes launches two containers into a pod with a shared mount: a JNLP slave container, which Jenkins does have permission to write the cache directory in, and a build container (in my case kubectl, but could be any container without a Jenkins user) where it does not necessarily have the same permission, in which code actually runs. The plugin runs its test inside the JNLP container, enables the wrapper, and then exhibits the same hanging behavior when commands are run in the kubectl container.

          In the JNLP container:

          bash-4.4$ cd /home/jenkins/agent/caches
          bash-4.4$ ls -l
          total 0
          drwxr-xr-x    2 jenkins  jenkins          6 Mar  6 15:47 durable-task 

          In the kubectl container:

          I have no name!@<REDACTED>:/home/jenkins/agent/caches$ ls -l
          total 0
          drwxr-xr-x 2 1000 1000 6 Mar  6 15:47 durable-task
          
          I have no name!@<REDACTED>:/home/jenkins/agent/caches$ id
          uid=1001 gid=0(root) groups=0(root)
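
          The permission consequence of that listing can be sketched outside of Kubernetes: a directory created drwxr-xr-x by uid 1000 carries no write bit for a different non-root uid such as 1001. Simulated below with a temp directory rather than a real pod:

          ```shell
          #!/bin/sh
          # Simulate the cache-directory mode from the listing above: drwxr-xr-x
          # grants write access only to the owning uid, so the build container's
          # uid 1001 (not the owner, not in the owning group) cannot write it.
          CACHE=$(mktemp -d)
          chmod 755 "$CACHE"
          ls -ld "$CACHE" | cut -c1-10   # drwxr-xr-x: no write bit for group/other
          rmdir "$CACHE"
          ```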


          Carroll Chiou added a comment -

          don_code can we move this over to JENKINS-59903? That would be the relevant ticket. Could you also confirm that you are running v1.33 and not v1.31-32?


          Don L added a comment -

          Sure, I've cross-posted to that ticket.


          Matthew Pigram added a comment -

          I'm still running into this same issue while trying to run a pyinstaller docker image.

          The sh command gives me the following:

          ```

          ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument, as required by official docker images (see https://github.com/docker-library/official-images#consistency for entrypoint consistency requirements).
          Alternatively you can force image entrypoint to be disabled by adding option `--entrypoint=''`.
          [Pipeline] {
          [Pipeline] sh
          process apparently never started in /var/jenkins_home/workspace/simple-python-pyinstaller-app@tmp/durable-87eb5f90
          (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
          [Pipeline] }

          $ docker stop --time=1 0d2e194b04bb5a12016da7f4dd92019127837debf082d18ba9fdf4cbbf6abbd7
          $ docker rm -f 0d2e194b04bb5a12016da7f4dd92019127837debf082d18ba9fdf4cbbf6abbd7
          [Pipeline] // withDockerContainer
          [Pipeline] }
          [Pipeline] // withEnv
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] End of Pipeline
          ERROR: script returned exit code -2
          Finished: FAILURE

          ```

          Running latest jenkins/blueocean image and I'm currently using the following set up for my Jenkinsfile.

           

          ```
          pipeline {
              agent none
              options {
                  skipStagesAfterUnstable()
              }
              stages {
                  stage('Build') {
                      agent {
                          docker { image 'python:2-alpine' }
                      }
                      steps {
                          sh 'python -m py_compile sources/add2vals.py sources/calc.py'
                      }
                  }
                  stage('Test') {
                      agent {
                          docker { image 'qnib/pytest' }
                      }
                      steps {
                          sh 'py.test --verbose --junit-xml test-reports/results.xml sources/test_calc.py'
                      }
                      post {
                          always {
                              junit 'test-reports/results.xml'
                          }
                      }
                  }
                  stage('Deliver') {
                      agent {
                          docker {
                              image 'cdrx/pyinstaller-linux:python2'
                              args 'docker run -v "/var/jenkins_home/workspace/simple-python-pyinstaller-app/sources:/src/" --name pyinstaller --entrypoint= cdrx/pyinstaller-linux:python2'
                          }
                      }
                      steps {
                          sh 'pyinstaller --onefile sources/add2vals.py'
                      }
                      post {
                          success {
                              archiveArtifacts 'dist/add2vals'
                          }
                      }
                  }
              }
          }
          ```
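
          The first ERROR line above suggests its own workaround: disabling the image entrypoint. In a declarative pipeline that would look something like the sketch below (untested; it mirrors the 'Deliver' stage above, and note that args takes plain docker flags, not a full docker run command line):

          ```groovy
          // Sketch: pass --entrypoint='' via args, as the error message suggests,
          // instead of a complete `docker run ...` string.
          stage('Deliver') {
              agent {
                  docker {
                      image 'cdrx/pyinstaller-linux:python2'
                      args "--entrypoint=''"
                  }
              }
              steps {
                  sh 'pyinstaller --onefile sources/add2vals.py'
              }
          }
          ```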

           


          Bogomil Vasilev added a comment -

          I managed to hit this using dir(). This is clearly a bug, and it causes people to write workarounds.

          At the second stage, using dir(), it gets stuck for around 6-7 minutes and eventually it fails with "process apparently never started in /opt@tmp/durable-5a20a76a".

          pipeline {
              agent {
                  docker {
                      label '********'
                      image '**********'
                      registryUrl '************'
                      registryCredentialsId '*******'
                      args '--user root:root'
                  }
              }
              stages {
                  stage('dir-testing') {
                      
                      stages {
                          stage('without dir') {
                              steps {
                                  sh 'cd /opt && ls -l'
                              }
                          }
                          stage('with dir') {
                              steps {
                                  dir('/opt') {
                                      sh 'ls -l'
                                  }
                              }
                          }
                      }
                      post {
                          always {
                              cleanWs()
                          }
                      }
                  }
              }
          }
          
          Started by user **********
          Running in Durability level: MAX_SURVIVABILITY
          [Pipeline] Start of Pipeline
          [Pipeline] node
          Running on ************ in /var/jenkins/workspace/test-cwd-bug
          [Pipeline] {
          [Pipeline] withEnv
          [Pipeline] {
          [Pipeline] withDockerRegistry
          Using the existing docker config file. Removing blacklisted property: auths
          $ docker login -u ******** -p ******** *********
          WARNING! Using --password via the CLI is insecure. Use --password-stdin.
          Login Succeeded
          [Pipeline] {
          [Pipeline] isUnix
          [Pipeline] sh
          + docker inspect -f . **********
          
          Error: No such object: ***********
          [Pipeline] isUnix
          [Pipeline] sh
          + docker inspect -f . ****************
          .
          [Pipeline] withDockerContainer
          ************* does not seem to be running inside a container
          $ docker run -t -d -u 0:0 --user root:root -w /var/jenkins/workspace/test-cwd-bug -v /var/jenkins/workspace/test-cwd-bug:/var/jenkins/workspace/test-cwd-bug:rw,z -v /var/jenkins/workspace/test-cwd-bug@tmp:/var/jenkins/workspace/test-cwd-bug@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** ********************* cat
          $ docker top 969a08a99a24c314d5d80f2cbf77920db4e269524d1af6738c0ddc5417da3f16 -eo pid,comm
          [Pipeline] {
          [Pipeline] stage
          [Pipeline] { (dir-testing)
          [Pipeline] stage
          [Pipeline] { (without dir)
          [Pipeline] sh
          + cd /opt
          + ls -l
          total 8
          drwxr-xr-x 4 root root 4096 Jul 24 15:29 artifactory-scripts
          drwxr-xr-x 1  608  500 4096 Jul 24 15:39 cv25_linux_sdk_2.5
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] stage
          [Pipeline] { (with dir)
          [Pipeline] dir
          Running in /opt
          [Pipeline] {
          [Pipeline] sh
          process apparently never started in /opt@tmp/durable-5a20a76a
          (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
          [Pipeline] }
          [Pipeline] // dir
          [Pipeline] }
          [Pipeline] // stage
          Post stage
          [Pipeline] cleanWs
          [WS-CLEANUP] Deleting project workspace...
          [WS-CLEANUP] Deferred wipeout is used...
          [WS-CLEANUP] done
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          $ docker stop --time=1 969a08a99a24c314d5d80f2cbf77920db4e269524d1af6738c0ddc5417da3f16
          $ docker rm -f 969a08a99a24c314d5d80f2cbf77920db4e269524d1af6738c0ddc5417da3f16
          [Pipeline] // withDockerContainer
          [Pipeline] }
          [Pipeline] // withDockerRegistry
          [Pipeline] }
          [Pipeline] // withEnv
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] End of Pipeline
          ERROR: script returned exit code -2
          Finished: FAILURE
          


          Carroll Chiou added a comment - edited

          smirky can you tell me what architecture your agent is running on? The thing is, if you are using 1.33 or greater, you should be running the traditional shell-based durable-task, where there would not be any issues running on a non-Windows/non-Unix architecture.

          It appears you are using a Unix architecture so this actually might not be the right ticket you're looking for.

          This ticket was also incorrectly reopened by a user reporting an issue with an x86 architecture that was not related to this ticket. Closing this ticket again. Will reopen if we discover new issues relating to non-unix/non-windows architectures.


          Carroll Chiou added a comment -

          The issue was incorrectly reopened in the past. Closing again until we encounter a bug related to non-Windows/non-Unix architectures.


          Bogomil Vasilev added a comment - edited

          carroll - Indeed, it is x64 Ubuntu 18.04. Since you mentioned that this ticket is not suitable for my use case, I opened a new one describing the problem:
          https://issues.jenkins-ci.org/browse/JENKINS-63253


            Assignee: Carroll Chiou
            Reporter: Jakub L
            Votes: 11
            Watchers: 15

              Created:
              Updated:
              Resolved: