  Jenkins / JENKINS-59903

durable-task v1.31 breaks sh steps in pipeline when running in a Docker container


Details

    • 1.33

    Description

      A pipeline like this:

      pipeline {
          agent {
              docker {
                  label 'docker'
                  image 'busybox'
              }
          }
          stages {
              stage("Test sh script in container") {
                  steps {
                    sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                  }
              }
          }
      }
      

      Fails with this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      process apparently never started in /...
      (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      $ docker rm -f 645fd28fda5fa3c61a4b49e8a38e46e0eec331ddf6037d3f77821dd6984a185f
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      Adding the -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true parameter gives this log:

      Running in Durability level: PERFORMANCE_OPTIMIZED
      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on docker-node in /...
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] sh
      + docker inspect -f . busybox
      .
      [Pipeline] withDockerContainer
      got-legaci-3 does not seem to be running inside a container
      $ docker run -t -d -u 1002:1002 -w <<hidden>> busybox cat
      $ docker top 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e -eo pid,comm
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test sh script in container)
      [Pipeline] sh (Echo "Hello World...)
      OCI runtime exec failed: exec failed: container_linux.go:346: starting container process caused "exec: \"/var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64\": stat /var/jenkins/caches/durable-task/durable_task_monitor_1.31_unix_64: no such file or directory": unknown
      process apparently never started in /...
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      $ docker stop --time=1 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      $ docker rm -f 31b7474756f8ff5b1f0d12d0df952347e584b47113108d1f965adeeb0ee78e5e
      [Pipeline] // withDockerContainer
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -2
      Finished: FAILURE
      

      Tested on three different Jenkins masters with similar, but not identical, configurations.

      Reverting to Durable Task Plugin v. 1.30 "solves" the problem.

      Attachments

        Issue Links

          Activity

            njesper Jesper Andersson added a comment - - edited

            A different workaround is adding args '-v /var/jenkins-legaci-lab/caches:/var/jenkins-legaci-lab/caches' to the docker{...} declaration in the pipeline.
            Like this:

            pipeline {
                agent {
                    docker {
                        label 'docker'
                        image 'busybox'
                        args '-v /var/jenkins/caches:/var/jenkins/caches'
                    }
                }
                stages {
                    stage("Test sh script in container") {
                        steps {
                          sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                        }
                    }
                }
            }
            
            • Perhaps this should be solved in some declarative pipeline component?
            zerkms Ivan Kurnosov added a comment -

            I confirm it affects me as well:

            Jenkins ver. 2.190.1, docker, kubernetes and other plugins: latest stable version.

            Jenkins runs on linux (inside a docker container)


            reinholdfuereder Reinhold Füreder added a comment -

            Also a problem when running Jenkins not in Docker and executing the pipeline on the Jenkins master (=> psst!); the problem is definitely caused by the new "Durable Task" plugin v1.31 bug:

            • Latest version of Jenkins core (v2.201) running on Ubuntu 16.04
            • The logs seem to indicate the problem already appears when trying to start the Docker container in the pipeline, but maybe the logs are just mangled?
              • See the "// !!!" comments in the build log below...
            • Build log (proprietary):
              ...
              [Pipeline] // stage
              [Pipeline] stage
              [Pipeline] { (linkchecker)
              [Pipeline] script
              [Pipeline] {
              [Pipeline] withEnv
              [Pipeline] {
              [Pipeline] withDockerRegistry
              [Pipeline] {
              [Pipeline] isUnix
              [Pipeline] sh
              06:37:20  + docker inspect -f . ACME/linkchecker:5
              06:37:20  
              06:37:20  Error: No such object: ACME/linkchecker:5
              06:37:20  06:37:20.408218 durable_task_monitor.go:63: exit status 1              // !!!
              [Pipeline] isUnix
              [Pipeline] sh
              06:37:20  + docker inspect -f . dockerregistry.ACME.com/ACME/linkchecker:5
              06:37:20  .
              [Pipeline] withDockerContainer
              06:37:20  Jenkins does not seem to be running inside a container
              06:37:20  $ docker run -t -d -u 10112:10005 -w /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline -v /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline:/var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline:rw,z -v /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline@tmp:/var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** dockerregistry.ACME.com/ACME/linkchecker:5 cat
              06:37:21  $ docker top 39f784ea27cbf6593fd40c1faaf04948daae94e97eb8ba42517f7c2f5e40c21e -eo pid,comm
              [Pipeline] {
              [Pipeline] sh
              06:42:27  process apparently never started in /var/lib/jenkins/workspace/Sandbox/ACME.linkCheckerPipeline@tmp/durable-aed939a9      // !!!
              06:42:27  (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
              [Pipeline] }
              ...
              ERROR: script returned exit code -2
              Finished: FAILURE
              
            • Pipeline code (in shared Jenkins pipeline library):
              ...
                void execute(Closure configBody) {
                  LinkCheckerRunDSL config = calcConfiguration(configBody)
              
                  // Then build, based on the configuration provided:
                  script.docker.withRegistry(Constants.ACME_DOCKER_REGISTRY_URL) {
                    script.docker.image(config.dockerImage).inside() { c ->
                      script.sh 'linkchecker --version'
              ...
              

            cosbug Constantin Bugneac added a comment -

            Having the same issue with the 1.31 version of the durable-task plugin. Had to roll back to 1.30.

            gehrmanator Eric Gehrman added a comment -

            Also having the same issue with the 1.31 version of durable-task plugin. Also fixed by rolling back to 1.30


            pedersen Björn Pedersen added a comment -

            Here we are also affected (rolled back to 1.30) for docker.inside steps.

            The problem is that the new wrapper binary is at a location that is (on purpose) not exposed inside the Docker container.

            Only the workspace and auxiliary workspace (workspace@tmp) are currently mapped into the container by default.


            nitrogear Oleksii Grinko added a comment -

            Faced the same issue on Jenkins 2.201. Downgrading to durable-task plugin v1.30 resolved the issue.

            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

            The same; we face it with docker.inside after upgrading to 2.201. The worst thing is that you do not see the error until you enable `org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS`.

            pipeline {
                agent {label 'linux && immutable'}
                stages {
                    stage("Test sh script in container") {
                        steps {
                          script {
                            docker.image('node:12').inside(){
                              echo "Docker inside"
                              sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                            }
                          }
                        }
                    }
                }
            }
             

            I'll share the script to change the property from the Jenkins console:

            import static org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS
            
            println("LAUNCH_DIAGNOSTICS=" + LAUNCH_DIAGNOSTICS)
            LAUNCH_DIAGNOSTICS = true
            println("LAUNCH_DIAGNOSTICS=" + LAUNCH_DIAGNOSTICS)
            
            chizcw Chisel Wright added a comment -

            We hit the same issue after upgrading this morning. Downgrading to 1.30 resolved the issue for us too.

            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

            In our case it seems related to docker.inside; if we run a similar docker command directly, it works:

            pipeline {
                agent {label 'linux && immutable'}
                stages {
                    stage("Test sh script in container") {
                        steps {
                            sh label: 'This works', script: """
                              docker run -t -v ${env.WORKSPACE}:${env.WORKSPACE} -u \$(id -u):\$(id -g) -w ${env.WORKSPACE} -e HOME=${env.WORKSPACE} node:12 echo 'Hello World!'
                            """
                            script {
                              docker.image('node:12').inside(){
                                echo "Docker inside"
                                sh label: 'Im gonna fail', script: 'echo "Hello World!"'
                              }
                            }
                        }
                    }
                }
            }
            
            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

            I found a workaround, but it is horrible. For some reason the durable task is looking for the Jenkins cache inside the Docker container, where it obviously is not present, so if you mount the cache folder you resolve the issue. But this means I have to change every docker.inside call. I think we could go back to 1.29, before the latest changes to the way the sh step is managed.

            pipeline {
                agent {label 'linux && immutable'}
                stages {
                    stage("Test sh script in container") {
                        steps {
                            script {
                              docker.image('node:12').inside("-v /var/lib/jenkins/caches/durable-task:/var/lib/jenkins/caches/durable-task"){
                                echo "Docker inside"
                                sh label: 'Im gonna fail', script: 'echo "Hello World!"'
                              }
                            }
                        }
                    }
                }
            }
            
            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

            I have found the cause: the changes in this commit https://github.com/jenkinsci/durable-task-plugin/commit/1f59c5229b9ff83709add3e202f8e49ff463106c are related to a new binary launcher.

            I can confirm it: if you disable the new binary launcher, it works. You can disable it at runtime by executing this script in the Jenkins console:

            import static org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER
            
            println("FORCE_SHELL_WRAPPER=" + FORCE_SHELL_WRAPPER)
            FORCE_SHELL_WRAPPER = true
            println("FORCE_SHELL_WRAPPER=" + FORCE_SHELL_WRAPPER)
            

            pzozobrado Philip Zozobrado added a comment -

            I can confirm downgrading "Durable Task Plugin" to v1.30 fixed the issue for us as well. We're running Jenkins 2.176.2.

            pzozobrado Philip Zozobrado added a comment - - edited

            We're running Alpine Linux builds. I briefly saw a failure about not being able to run `ps` – it could be related to this block: https://github.com/jenkinsci/durable-task-plugin/commit/1f59c5229b9ff83709add3e202f8e49ff463106c#diff-b7cdd655e1fb1fd95154b2fbcb20e8e3R525

             switch (platform) {
                 case SLIM:
                     // (See JENKINS-58656) Running in a container with no init process is guaranteed to leave a zombie. Just let this test pass.
                     // Debian slim does not have ps
                     // [...]
             }

             do {
                 // [...]
             } while (psString.contains(exitString));
            
            aaaustin10 Austin Stewart added a comment - - edited

            I feel that I should mention that I also have the issue with 1.31 (downgrade to 1.30 fixes it) without using Docker.

            Borrowing from a comment above:

            pipeline {
                agent {
                    label 'raspberry-build'
                }
                stages {
                    stage("Test sh script in container") {
                        steps {
                          sh label: 'Echo "Hello World...', script: 'echo "Hello World!"'
                        }
                    }
                }
            }
            

             

             

            coolson Corey Olson added a comment -

            I just ran into this issue today too.  Anyone know where I can get the .hpi file in order to downgrade to v 1.30?

            chizcw Chisel Wright added a comment -

            Usually you can just downgrade from Manage Plugins > Installed.

            If that doesn't work for you, try this:

            http://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/durable-task/1.30/durable-task-1.30.hpi

            Via: https://javalibs.com/artifact/org.jenkins-ci.plugins/durable-task


            njesper Jesper Andersson added a comment -

            aaaustin10 From the looks of the label in your example, you might be having the JENKINS-59907 problem, where the new wrapper doesn't work on all platforms.

            pzozobrado I suspect you might be having some similar problem, if you are not using the agent{docker{...}} or docker.inside() pipeline instructions.

            coolson Corey Olson added a comment -

            chizcw this was a fresh install today, so I didn't have that downgrade option.  Thanks for the link; that worked.


            njesper Jesper Andersson added a comment -

            Also confirming that 

            org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER = true

            is a working workaround, and that it can be set when Jenkins starts, just as LAUNCH_DIAGNOSTICS.
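
            For completeness, here is a minimal sketch of applying the same setting at Jenkins startup via a Groovy init hook so it survives restarts; the file name is illustrative, and this assumes the standard $JENKINS_HOME/init.groovy.d hook directory:

            // Illustrative sketch: save as $JENKINS_HOME/init.groovy.d/force-shell-wrapper.groovy
            // Scripts in init.groovy.d run when Jenkins starts, so the flag is re-applied
            // after every restart (unlike a one-off change in the script console).
            import static org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER

            FORCE_SHELL_WRAPPER = true
            println("FORCE_SHELL_WRAPPER=" + FORCE_SHELL_WRAPPER)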

            It would be great if the durable-task plugin could detect that it's running inside a container started by Jenkins, and only disable the wrapper in those steps, if that is to be the solution. "durable_task_monitor_1.31_unix_64" probably contains something of value, so disabling it system-wide doesn't feel like a solution.

            albers Harald Albers added a comment -

            I also have a use case where no docker.inside is involved.

            • Jenkins master is the official docker image jenkins/jenkins:2.190.1-alpine
            • Agent is based on the adoptopenjdk/openjdk11:x86_64-ubuntu-jdk-11.0.4_11 image and connects to the master via the swarm plugin
            • Master and agent running on Docker 19.03.4 in swarm mode. The hosts are Ubuntu 18.04 LTS on VMware.

            This pipeline code:

            node('jdk11') {
                stage('test') {
                    sh 'echo hi.'
                }
            }
            • works on both master and agent with durable-task-plugin 1.30.
            • works on the master with durable-task-plugin 1.31.
            • fails on the agent when durable-task-plugin 1.31 is installed:
            [Pipeline] Start of Pipeline
            [Pipeline] node
            Running on build_agent-java11-docker4 in /workspace/hugo
            [Pipeline] {
            [Pipeline] stage
            [Pipeline] { (test)
            [Pipeline] sh
            [Pipeline] }
            [Pipeline] // stage
            [Pipeline] }
            [Pipeline] // node
            [Pipeline] End of Pipeline
            Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from 192.168.0.6/192.168.0.6:35540
            		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1743)
            		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
            		at hudson.remoting.Channel.call(Channel.java:957)
            		at hudson.FilePath.act(FilePath.java:1072)
            		at hudson.FilePath.act(FilePath.java:1061)
            		at org.jenkinsci.plugins.durabletask.BourneShellScript.launchWithCookie(BourneShellScript.java:169)
            		at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:99)
            		at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:317)
            		at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:286)
            		at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:179)
            		at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
            		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            		at java.lang.reflect.Method.invoke(Method.java:498)
            		at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
            		at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
            		at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
            		at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
            		at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
            		at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
            		at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
            		at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:160)
            		at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
            		at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:157)
            		at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:158)
            		at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:162)
            		at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:132)
            		at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:132)
            		at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
            		at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:84)
            		at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
            		at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
            		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            		at java.lang.reflect.Method.invoke(Method.java:498)
            		at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
            		at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
            		at com.cloudbees.groovy.cps.Next.step(Next.java:83)
            		at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
            		at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
            		at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
            		at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
            		at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
            		at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
            		at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
            		at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:186)
            		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:370)
            		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$200(CpsThreadGroup.java:93)
            		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:282)
            		at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:270)
            		at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:66)
            		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            		at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:131)
            		at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
            		at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
            		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            		at java.lang.Thread.run(Thread.java:748)
            java.nio.file.AccessDeniedException: /caches
            	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
            	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
            	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
            	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:385)
            	at java.nio.file.Files.createDirectory(Files.java:689)
            	at java.nio.file.Files.createAndCheckIsDirectory(Files.java:796)
            	at java.nio.file.Files.createDirectories(Files.java:782)
            	at org.jenkinsci.plugins.durabletask.BourneShellScript$GetAgentInfo.invoke(BourneShellScript.java:473)
            	at org.jenkinsci.plugins.durabletask.BourneShellScript$GetAgentInfo.invoke(BourneShellScript.java:440)
            	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3052)
            	at hudson.remoting.UserRequest.perform(UserRequest.java:212)
            	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
            	at hudson.remoting.Request$2.run(Request.java:369)
            	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
            	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:93)
            	at java.lang.Thread.run(Thread.java:834)
            Finished: FAILURE
            

            Looks like a problem accessing some cache: java.nio.file.AccessDeniedException: /caches.


            pzozobrado Philip Zozobrado added a comment -

            njesper

            > Philip Zozobrado I suspect you might be having some similar problem, if you are not using the agent{docker{...}} or docker.inside() pipeline instructions.

            Yes. This is what we're using:

            withDockerContainer([image: "php:latest", args: "-v ${WORKSPACE}:/project"]) {
                sh "echo 'started a container'"
            }
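
            A hedged sketch of the same cache-mount workaround shown in earlier comments, applied to this withDockerContainer form; the /var/lib/jenkins/caches path is illustrative and depends on where the agent keeps its durable-task cache:

            // Hedged sketch: mount the agent's durable-task cache into the container,
            // mirroring the docker.inside workaround above. The cache path is illustrative.
            withDockerContainer([image: "php:latest",
                                 args: "-v ${WORKSPACE}:/project " +
                                       "-v /var/lib/jenkins/caches/durable-task:/var/lib/jenkins/caches/durable-task"]) {
                sh "echo 'started a container'"
            }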
            
            lifeofguenter Günter Grodotzki added a comment - - edited

            I don't want to sound awful in the comments section, and I highly appreciate the endless efforts of (mostly?) volunteer developers. However, it is a really bad experience when a minor version introduces breaking changes.

            Why not just bump the major version in such cases?

            BTW, a more portable fix, piggybacking on njesper's solution:

            args '--user=root --privileged -v ${HOME}/caches:${WORKSPACE}/../../caches'

            This should fix it no matter what your configuration is, because it uses the same logic as implemented in the plugin: https://github.com/jenkinsci/durable-task-plugin/pull/106/files#diff-b7cdd655e1fb1fd95154b2fbcb20e8e3R485
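
            For illustration, a sketch of where that args line would sit in a declarative agent block, following the earlier examples in this thread; the label and image are placeholders, and --user=root/--privileged may be more than a locked-down environment should allow:

            pipeline {
                agent {
                    docker {
                        label 'docker'
                        image 'busybox'
                        // args as suggested above; note the security trade-off of
                        // --user=root and --privileged
                        args '--user=root --privileged -v ${HOME}/caches:${WORKSPACE}/../../caches'
                    }
                }
                stages {
                    stage("Test sh script in container") {
                        steps {
                            sh 'echo "Hello World!"'
                        }
                    }
                }
            }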

            pyssling Nils Carlson added a comment -

            I noted on the merge request for this that the approach isn't that great either – they are shipping a statically compiled Go binary to use as an execution wrapper. This is bad, as it breaks Jenkins on architectures other than x86, such as ARM and PPC.

            jaapcrezee Jaap Crezee added a comment - - edited

            > Reverting to Durable Task Plugin v. 1.30 "solves" the problem.

            This + restart (Jenkins) works for me (for now).

            carroll Carroll Chiou added a comment -

            So, apologies for this taking so long to address. There is currently a fix in the works for this issue and for JENKINS-59907 as well. I will also update the changelog that is currently being migrated to GitHub. Caching will be disabled when the cache directory is unavailable to the agent.

            The PR can be found here: https://github.com/jenkinsci/durable-task-plugin/pull/114
            ci.jenkins.io is quite unstable right now. Hopefully things will get better soon.

            ifernandezcalvo Ivan Fernandez Calvo added a comment - - edited

            IMHO the introduction of the new binary is a mistake; I do not see a reason not to implement the same behavior in Java. Also, it copies a binary into the workspace that is not mentioned anywhere and that comes from outside the workspace; I see potential security issues in that behavior. wfollonier WDYT?

            The compatibility of the libc against which the binary was compiled could be a potential issue on different Linux distributions.

            If the stream is not open, launcherCmd would be "". What happens with the script in that case? I think it will not execute anything and will not show any error:
            https://github.com/jenkinsci/durable-task-plugin/blob/b122d6b0f924c533b0a26d99a71779bbafc3c543/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L207-L222

            The interpreter would be launched with the '-xe' options; this could leak commands that you don't want to show:
            https://github.com/jenkinsci/durable-task-plugin/blob/b122d6b0f924c533b0a26d99a71779bbafc3c543/src/main/go/org/jenkinsci/plugins/durabletask/durable_task_monitor.go#L86

            I don't know how this command behaves in Cygwin, for example:
            https://github.com/jenkinsci/durable-task-plugin/blob/b122d6b0f924c533b0a26d99a71779bbafc3c543/src/main/go/org/jenkinsci/plugins/durabletask/durable_task_monitor.go#L109


            njesper Jesper Andersson added a comment -

            Great news, carroll!

            Apart from this issue ("cache folder not available") and JENKINS-59907 ("binary can't run"), I've seen some comments that indicate there might be a third kind of problem (which might not have its own issue yet): odd/rare distros or special configurations/installations, where the new binary wrapper can't find all the libs/tools it needs. Judging from the description of PR 114, this third type of problem might not be addressed.

            If this use of a cache folder on the node follows Jenkins design guidelines, I think it would be a good idea to bring up with the relevant Docker integration plugin(s) the question of why the cache folder isn't mounted when Jenkins starts the container. I guess the new wrapper brings some kind of value, so it should be made to work in containers as well, if relevant.


            whittlec William Whittle added a comment -

            We've also seen essentially the same issue on Solaris, AIX, and IBM i (AS/400). It was OK on Linux AMD64 and Windows x64. Reverting to Durable Task Plugin 1.30 resolved the issues.

            haridsv Hari Dara added a comment -

            I got this issue resolved by reverting to the previous version, but I am wondering why the console or Jenkins log had no information on what the underlying issue is. Isn't there a lack of sufficient logging and perhaps some sort of error handling here?

            jonathanb1 Jonathan B added a comment - - edited

            Since upgrading durable-task 1.30 to 1.31, we're seeing a lot of intermittent

            Cannot run program "/home/ubuntu/caches/durable-task/durable_task_monitor_1.31_unix_64" (in directory "/home/ubuntu/workspace/path/to/job"): error=26, Text file busy
            

            and, less often,

            Cannot run program "/home/ubuntu/caches/durable-task/durable_task_monitor_1.31_unix_64" (in directory "/home/ubuntu/workspace/path/to/job"): error=13, Permission denied
            

            This is all running on standard amd64 Ubuntu (no exotic OS or architecture) and not in Docker agents. Should I file a separate issue?

            dnusbaum Devin Nusbaum added a comment -

            Hi everyone, sorry for the issues. I filed https://github.com/jenkins-infra/update-center2/pull/305 to suspend durable-task 1.31 from distribution for now. As a workaround, you can roll back to 1.30 or add org.jenkinsci.plugins.durabletask.BourneShellScript.FORCE_SHELL_WRAPPER=true as a system property to the JVM running Jenkins (or set the same Groovy variable to true dynamically via the script console, though that will be unset if you restart Jenkins).

            For some context, the new binary was intended to improve some long-standing robustness issues with the existing shell wrapper by being able to use utilities like setsid, and to make it more maintainable going forward. The code that detects whether to use the binary or the existing shell wrapper obviously needs to handle additional cases, and we need to add some more testing for other platforms where possible, in particular the Docker-based workflows that were broken by the change. Ideally, changes to implementation details like this would be transparent to users and wouldn't cause breaking changes, but this plugin handles a lot of subtly different platforms at the same time and can only test on some of them, so changes always seem to cause problems.

            carroll Carroll Chiou added a comment -

            jonathanb1 I ran into this issue the very first time I was testing the fix and running tests on it. It was solved immediately by running mvn clean install, so unfortunately I was not able to investigate more deeply, as I still can't reproduce it. I think this issue will be solved by reverting to 1.30 and installing again. If that does not solve it, it warrants a separate issue.

            njesper I don't think there has been anything official on using caches on the agent. Not many plugins use caching, but I think it is something we should explore further, since most people want to reduce the workload on masters.

            haridsv Yes, there should be more error handling involved; I am looking to add that. The tricky part is that the script is supposed to be launched as fire-and-forget, and that includes the original shell wrapper as well. But of course, it's one thing if your shell fails to launch versus this binary.

            theandrewlane Andrew Lane added a comment -

            Ubuntu 18.04, Jenkins 2.202 - I can confirm that downgrading durable-task-plugin resolves this issue.


            ifernandezcalvo Ivan Fernandez Calvo added a comment -

            carroll If all of this is about reducing the load on the master, I think there are simpler and better ways to do it. First, the basics: avoid insanely verbose console output. Having 5-10 GB of console logs is pointless; if you ever need to review such a file, you will go crazy just trying to open it. So reducing console output is key: if you need verbose output from some command, redirect it to files and archive them on Jenkins at the end of the job. If after all that you still think this cache is needed, build it with something standard and do not reinvent the wheel: named pipes work on every Unix implementation, there is also an implementation for Windows, and they are plain files that are easy to manage from Java, so you would avoid a ton of platform-related problems.

            Because durable-task-plugin is something you cannot get rid of if you use pipelines, it is a critical component. Maybe this cache could go in another plugin and durable-task could stay as it is, or users could be allowed to drop durable-task completely; it causes more issues than benefits if you do not want to restart pipelines from an arbitrary point after a failure, which IMHO is an antipattern anyway: a pipeline should pass in one run, and if it does not, it is not well designed and you should split it.

            carroll Carroll Chiou added a comment -

            So a new release 1.32 is out. Until we have a fix out resolving this ticket and, at least, JENKINS-59907, the binary will be disabled by default.

            ifernandezcalvo Hi Ivan, actually the caching was added as a way to reduce the number of times the master transmits the binary over to the agent. What was not taken into account was that the chosen cache directory may not be accessible to the job. A fix is in the works.

            The binary wrapper itself was added to make the original shell wrapper script more maintainable rather than mystical. There was also an attempt to reduce the issues where the script itself was being terminated for unknown reasons. One of the ways to do this is to use setsid instead of nohup (see JENKINS-25503). The launched script's output is redirected to a file so that it can be transmitted to the master and displayed.
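
            Roughly speaking, the kind of detached launch and output capture involved looks like this (a simplified sketch only, not the plugin's actual wrapper script; the file names are the ones that show up in the launch diagnostics):

            # sketch: run the user's script detached so it outlives the launching process, capturing output and exit code
            nohup sh -c 'sh your-script.sh > jenkins-log.txt 2>&1; echo $? > jenkins-result.txt' &
            # the new binary wrapper detaches with setsid instead of nohup, but the log/result capture idea is the same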

            albers Harald Albers added a comment -

            carroll 1.32 does not resolve the issue for me.

            When running a sh step remotely on a dockerized agent as described above, I still get java.nio.file.AccessDeniedException: /caches, see details above.


            njesper Jesper Andersson added a comment -

            albers How are you running your container?

            I'm guessing wildly here, but it looks to me like your node config sets "Remote root directory" to /. I'm also guessing that you run the container as a specific user, e.g. '-u jenkins:jenkins', probably mount the workspace with something like '-v /home/jenkins/workspace:/workspace', and then start the agent inside the container.

            With such a setup the Jenkins agent will probably not have enough permissions to create '/caches', which the plugin perhaps still tries to do even when it's set not to use the new wrapper.

            Try adding e.g. '-v /home/jenkins/caches:/caches' (adapted to your config) or pre-creating a /caches folder in your image that is owned by 'jenkins:jenkins' (the user you run the container as).
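
            As a sketch of both approaches (the image name is a placeholder and the directory name follows the /caches path from the error above; verify the user and paths against your own setup):

            # Option 1: mount a host directory the jenkins user can write to
            docker run -u jenkins:jenkins \
              -v /home/jenkins/caches:/caches \
              -v /home/jenkins/workspace:/workspace \
              my-agent-image

            # Option 2: pre-create the directory in the agent image (Dockerfile snippet,
            # assuming the base image already has a jenkins user)
            FROM my-agent-image
            USER root
            RUN mkdir -p /caches && chown jenkins:jenkins /caches
            USER jenkins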

            albers Harald Albers added a comment -

            njesper Your questions pointed me to a solution, thanks a lot.

            But first the answers:

            The Docker image of the agent runs as the user jenkins. The swarm client plugin sets the "Remote root directory" to "/" when connecting to the master and dynamically creating an agent. The image has an existing /workspace directory that is writable for the user jenkins. The user jenkins obviously does not have sufficient permissions to create a directory in /.

            The swarm client can be configured to use a specific root directory. If I set it to a directory where the user jenkins has write permission, the build will successfully create a directory caches alongside the workspace directory.

            Another solution would be to pre-create the /caches directory in the image as well.

            I'm fine with this solution.

            But the bottom line is that we need documentation that the user who performs the build must have sufficient permissions to create directories in the build root, or that specific directories need to exist with appropriate permissions.
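
            For reference, pointing the swarm client at a writable root directory looks roughly like this (flags vary by swarm client version, and the URL, name and path here are illustrative):

            # illustrative only - give the agent a remote root the jenkins user can write to
            java -jar swarm-client.jar \
              -master https://jenkins.example.com/ \
              -name my-docker-agent \
              -fsroot /home/jenkins/agent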

            carroll Carroll Chiou added a comment -

            I apologize; what 1.32 did was disable the binary wrapper by default, but it did not resolve the caching issue because the plugin still tries to create the cache dir. I am in the process of merging my current fix (https://github.com/jenkinsci/durable-task-plugin/pull/114) into master.

            albers once the fix gets through, those users who do not have permissions to create directories in the build root will have caching disabled.

            carroll Carroll Chiou added a comment -

            So version 1.33 has now been released. This includes the fix for disabling cache when there are insufficient permissions to access the cache dir. The binary is still disabled by default.

            albers Harald Albers added a comment -

            carroll 1.33 works for my use case (build root in /, user without permission to create the /caches directory).

            don_code Don L added a comment -

            I've found this also reproduces when using build agents in Kubernetes, not just Docker. The problem here is that Kubernetes launches two containers into a pod with a shared mount: a JNLP slave container, in which Jenkins does have permission to write the cache directory, and a build container in which the code actually runs (in my case kubectl, but it could be any container without a Jenkins user) and where it does not necessarily have the same permission. The plugin runs its test inside the JNLP container, enables the wrapper, and then exhibits the same hanging behavior when commands are run in the kubectl container.

            Tests were run on the latest version (1.33) of durable-task.

            Logs with LAUNCH_DIAGNOSTICS set:

            sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt: Permission denied
            sh: 1: cannot create /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp: Permission denied
            touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
            mv: cannot stat '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-result.txt.tmp': No such file or directory
            touch: cannot touch '/home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47/jenkins-log.txt': Permission denied
            [ last line repeated ~100 times ]                                                                                          
            process apparently never started in /home/jenkins/agent/workspace/REDACTED_PR-5140@tmp/durable-cca9ec47  

            In the JNLP container:

            bash-4.4$ cd /home/jenkins/agent/caches
            bash-4.4$ ls -l
            total 0
            drwxr-xr-x    2 jenkins  jenkins          6 Mar  6 15:47 durable-task 

            In the kubectl container:

            I have no name!@<REDACTED>:/home/jenkins/agent/caches$ ls -l
            total 0
            drwxr-xr-x 2 1000 1000 6 Mar  6 15:47 durable-task
            I have no name!@<REDACTED>:/home/jenkins/agent/caches$ id
            uid=1001 gid=0(root) groups=0(root) 

            I've had some success today working around this by adding a security context to my pods, forcing a run as Jenkins's UID (which for me is 1000 - YMMV depending on how Jenkins is running), e.g.:

            kind: Pod                                                                                                                  
            metadata:                                                                                                                  
              name: kubectl                                                                                                            
            spec:                                                                                                                      
              containers:                                                                                                              
              - command:                                                                                                               
                - cat                                                                                                                  
                image: bitnami/kubectl:1.14                                     
                imagePullPolicy: Always                                                                                                
                name: kubectl                                                                                                          
                tty: true                                                                                                              
              securityContext:                                                                                                         
                runAsUser: 1000 
            dnusbaum Devin Nusbaum added a comment -

            don_code Please open a new issue instead of commenting here. In durable-task 1.33, the caches directory is not actually used by default, so I think you can ignore it. The problem in your case looks like permissions on the control directory for the script, and I think that you would run into the same problems on durable-task 1.30 or older, so I would check for similar bugs reported against Durable Task Plugin and/or Kubernetes Plugin.

            komalb08 Komal Bardia added a comment -

            After upgrading to version 1.33, all my Jenkins jobs failed with this error. Please suggest a solution. Before the upgrade there were no issues.

            [Pipeline] sh
            process apparently never started in XYZ/durable-xyz
            (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
            [Pipeline] }

            dnusbaum Devin Nusbaum added a comment - - edited

            komalb08 The code that caused the issue described in this ticket is now disabled by default. Please open a new ticket and fully describe the issue you are seeing, and run Jenkins with the system property mentioned in the error message to get more details on the specific problem.


            People

              carroll Carroll Chiou
              njesper Jesper Andersson

              Votes: 34
              Watchers: 47