JENKINS-59668

Run wrapper process in the background fails with the latest changes

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • Component: durable-task-plugin
    • Environment: Jenkins 2.197
      durable-task 1.30
      Docker version 19.03.1, build 74b1e89e8a

      Some erratic errors started to happen as a consequence of https://issues.jenkins-ci.org/browse/JENKINS-58290 

       

      [2019-09-30T15:00:13.698Z] process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-3a70569b
      [2019-09-30T15:00:13.698Z] (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
      script returned exit code -2  

       

      Unfortunately, for some reason the error shows up in just one particular PR rather than affecting the whole CI!

      https://github.com/elastic/apm-agent-nodejs/pull/1393

      Apparently it happens when running a Docker container inside a worker.

       

      Besides, I'd expect the LAUNCH_DIAGNOSTICS behavior to be the default for backward compatibility, rather than the other way around.
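
      For reference, a sketch of enabling that flag from the Script Console without restarting Jenkins (this assigns the plugin's static field directly; I'm assuming the field is still public and writable in durable-task 1.30):

            // Script Console sketch: flip the durable-task diagnostic flag at runtime.
            // Equivalent to starting Jenkins with
            // -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true
            org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS = true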

       

      Please let me know if you need further details. My one big concern is why this fails in only one particular PR of a multibranch pipeline (MBP) rather than in all of them... that's really weird.

       

       

          [JENKINS-59668] Run wrapper process in the background fails with the latest changes

          Victor Martinez created issue -

          I've just got an interesting stack trace when running the snippet below:

           

                post {
                  always {
                    sh label: 'Docker ps', script: 'docker ps -a || true'
                  }
                } 

           

          12:25:19  [Pipeline] sh (Docker ps)
          12:30:26  process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-5ddcccd6
          12:30:26  (running Jenkins temporarily with -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true might make the problem clearer)
          12:30:26  Error when executing always post condition:
          12:30:26  hudson.AbortException: script returned exit code -2
          12:30:26  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:658)
          12:30:26  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:604)
          12:30:26  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:548)
          12:30:26  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          12:30:26  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          12:30:26  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          12:30:26  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          12:30:26  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          12:30:26  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          12:30:26  	at java.lang.Thread.run(Thread.java:748)
          12:30:26   

          Victor Martinez made changes -
          Link New: This issue is caused by JENKINS-58290

          Victor Martinez added a comment - edited

          Not sure if it helps, but in a nutshell the pseudo-declarative pipeline below represents the workflow that triggers this particular issue.

          pipeline {
             agent { label 'linux' }
             stages {
                stage('foo') {
                   steps {
                      sh 'docker run .....'
                      script {
                         node('windows') {
                            bat '....'
                         }
                      }
                   }
                   post {
                      always {
                         sh 'docker ps -a || true'
                      }
                   }
                }
             }
          }


          Victor Martinez added a comment -

          When enabling the flag LAUNCH_DIAGNOSTICS:

          16:19:29  Post stage
          16:19:29  [Pipeline] sh (Docker ps)
          16:19:29  nohup: failed to run command 'sh': No such file or directory
          16:24:35  process apparently never started in /var/lib/jenkins/workspace/ejs_apm-agent-nodejs-mbp_PR-1393@tmp/durable-5dd00fe7
          16:24:35  Error when executing always post condition:
          16:24:35  hudson.AbortException: script returned exit code -2
          16:24:35  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.handleExit(DurableTaskStep.java:658)
          16:24:35  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:604)
          16:24:35  	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:548)
          16:24:35  	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          16:24:35  	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          16:24:35  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          16:24:35  	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          16:24:35  	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          16:24:35  	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          16:24:35  	at java.lang.Thread.run(Thread.java:748)
          16:24:35  
          16:24:35  [Pipeline] }
           

           

          The error seems to be related to running a post step within the stage context, which uses the top-level agent defined at the very beginning of the declarative pipeline.

          Devin Nusbaum made changes -
          Labels New: pipeline

          Devin Nusbaum added a comment -

          IIRC, using LAUNCH_DIAGNOSTICS restores the previous behavior, so if enabling it does not fix the problem, I don't think JENKINS-58290 is the cause. The other change in that release was PR 95, which changes how the default shell is selected. I would check how the default shell is configured on the agents in your examples: "nohup: failed to run command 'sh': No such file or directory" definitely suggests the problem is related to the shell being used. I believe that previously defaulted to /bin/sh and now defaults to sh, so maybe your agents need sh on PATH?
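
          A quick way to inspect that from the Script Console (a sketch using Jenkins core API; an empty result just means no global shell override is configured):

                // Script Console sketch: print the globally configured shell executable
                // ("Manage Jenkins » Configure System » Shell executable"). Empty/null
                // means durable-task falls back to its own default, which is where the
                // /bin/sh vs. plain sh change matters.
                import hudson.tasks.Shell
                import jenkins.model.Jenkins
                println Jenkins.get().getDescriptorByType(Shell.DescriptorImpl).shell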


          Jesse Glick added a comment -

          I presume the issue is trying to run sh on Windows, where it does not exist. BTW, the full Pipeline script does not seem to make much sense: you are grabbing a Linux node, then holding an executor lock while also grabbing a Windows node?
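
          For what it's worth, a sketch of how that workflow could be restructured so each node is acquired in its own stage instead of nesting node('windows') inside the Linux executor (keeping the reporter's '.....' placeholders; stage names are made up here):

          pipeline {
             agent none
             stages {
                stage('linux work') {
                   agent { label 'linux' }
                   steps {
                      sh 'docker run .....'
                   }
                   post {
                      always {
                         sh 'docker ps -a || true'
                      }
                   }
                }
                stage('windows work') {
                   agent { label 'windows' }
                   steps {
                      bat '....'
                   }
                }
             }
          }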


            Assignee: Unassigned
            Reporter: Victor Martinez
            Votes: 0
            Watchers: 4