• Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Component: kubernetes-plugin
    • Labels: None
    • Environment: kubernetes plugin v0.12 or current master
      Jenkins 2.62
      container step running in a _debian_ container

      podTemplate(name: "mypod", label: "label", containers: [
                              containerTemplate(name: 'debian',
                                      image: 'debian',
                                      ttyEnabled: true,
                                      command: 'cat',
                              )
      ]) {
          node("label") {
              container('debian') {
                  sh 'for i in $(seq 1 1000); do echo $i; sleep 0.3; done'
              }
          }
      }
      

      leads to

      [Pipeline] podTemplate
      [Pipeline] {
      [Pipeline] node
      Running on horst-nwmsn-32h5h in /home/jenkins/workspace/full
      [Pipeline] {
      [Pipeline] container
      [Pipeline] {
      [Pipeline] sh
      [full] Running shell script
      + seq 1 1000
      + echo 1
      1
      + sleep 0.3
      + echo 2
      2
      + sleep 0.3
      [Pipeline] }
      [Pipeline] // container
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // podTemplate
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE
      

      Sometimes it fails a bit faster.

      Might be related to the script being started with "nohup" now.

          [JENKINS-46651] container step "script returned exit code -1"

          Martin Sander added a comment -

          Note:

          Example works with busybox container.


          Martin Sander added a comment -

          Also works in ubuntu.


          Martin Sander added a comment -

          Seems that the debian:latest container does not have ps by default, and durable-task uses it to check for the process.

          Should we document this?

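          The dependency on ps can be checked by hand. Below is a simplified sketch (an approximation, not the plugin's exact code) of the liveness probe that durable-task runs inside the container. If the image ships without ps, the probe always fails, the wrapped script is considered dead, and the step ends with exit code -1.

```shell
# Simplified sketch of the liveness probe: later logs in this thread show
# durable-task invoking `ps -o pid= <pid>` inside the container. If ps is
# absent, this check can never succeed.
is_alive() {
    ps -o pid= "$1" > /dev/null 2>&1
}

if is_alive "$$"; then
    echo "process $$ is alive"
fi
if ! is_alive 999999; then
    echo "no such process (or ps is missing)"
fi
```

          Running this in a stock debian container fails at the `ps` call itself, which is consistent with the behaviour reported above.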

          Martin Sander added a comment -

          By the way, a workaround is to derive your own image from "debian" and install the procps package in it.

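          That workaround amounts to a two-line Dockerfile; a minimal sketch (the apt cleanup line is optional):

```dockerfile
# Derive from debian and add procps, so `ps` is available for the
# durable-task liveness probe.
FROM debian
RUN apt-get update \
 && apt-get install -y --no-install-recommends procps \
 && rm -rf /var/lib/apt/lists/*
```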

          Max k added a comment - edited

          I have a similar problem with the exit -1 error, but I do have the procps package installed, and it only happens with 10+ concurrent builds of the same project.

          kubernetes plugin v1.0
          Jenkins 2.37
          Here is my pipeline:

          tag = env.BUILD_TAG
          podTemplate(label: tag,
              volumes: [
              hostPathVolume(mountPath: '/var/lib/docker', hostPath: '/tmp/'+tag)
              ],
              containers: [
                  containerTemplate(name: 'jnlp', image: 'localreg:5000/jenkinsci/jnlp-slave', args: '${computer.jnlpmac} ${computer.name}'),
                  containerTemplate(name: 'java', image: 'localreg:5000/base/java', ttyEnabled: true, command: 'cat'),
                  containerTemplate(name: 'dind', image: 'localreg:5000/base-builders/dind:1.0.0', ttyEnabled: true, privileged: true, alwaysPullImage: true)
            ]) {
              node(tag) {
                  try {
                      container('jnlp') {
                          stage('Preparing') {
                              checkout scm
                              setGitEnvironmentVariables()
                              setSimpleEnvironmentVariables()
                          }
                      }
                      container('java') {
                          stage('Build a Maven project') {
                              sh 'env'
                          }
                      }
                      container('dind') {
                          stage('Build Docker') {
                              sh "docker build . || echo real bad exit code !"
                          }
                      }
                  } catch (error) {
                      throw error
                  }
              }
          }
          

          Sometimes i receive this error while 10+ concurrent builds are running:

          /home/jenkins/workspace/docker_builds_master-EHQCE25ZSFSE4QALBNHGET3HINBLMWZKHOXT7335PWTFXXMWNJDQ
          [Pipeline] sh
          [docker_builds_master-EHQCE25ZSFSE4QALBNHGET3HINBLMWZKHOXT7335PWTFXXMWNJDQ] Running shell script
          + docker build .
          Sending build context to Docker daemon 74.24 kB
          
          Step 1 : FROM localreg:5000/base/ubuntu-14.04
          latest: Pulling from base/ubuntu-14.04
          96c6a1f3c3b0: Pulling fs layer
          e8c945afff4f: Pulling fs layer
          b46adc58e13f: Pulling fs layer
          e8c945afff4f: Verifying Checksum
          e8c945afff4f: Download complete
          b46adc58e13f: Download complete
          96c6a1f3c3b0: Verifying Checksum
          96c6a1f3c3b0: Download complete
          [Pipeline] }
          [Pipeline] // stage
          [Pipeline] }
          [Pipeline] // container
          [Pipeline] }
          [Pipeline] // node
          [Pipeline] }
          [Pipeline] // podTemplate
          [Pipeline] End of Pipeline
          ERROR: script returned exit code -1
          Finished: FAILURE
          

          Any suggestions as to what this could be?


          Martin Sander added a comment -

          omegam: Can you try setting org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT to something higher than the default 15 seconds, and see if that helps?


          Max k added a comment -

          0x89: Thank you for the suggestion, but it doesn't help 8(
          I tried setting the parameter to 30, 60, and 120, with the same result.
          Here is my Jenkins master launch command:

          /usr/bin/java -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT=120 -Djava.awt.headless=true -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85 -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
          


          Eljas Alakulppi added a comment -

          I'm having similar issues. I haven't managed to figure out the exact cause, but, for example, running "bundler install" in a Ruby container triggers it fairly often while installing gems that have native extensions like json or ffi. A somewhat common pattern seems to be that it happens in cases where quite a few processes are being launched in the container.


          Martin Sander added a comment -

          Somewhat common pattern seems to be that it happens in cases where there might quite a few processes being launched in the container.

          Good idea. Could you try to verify that assumption with a minimal pipeline?


          Scott Hebert added a comment -

          We are also seeing this more and more. I also tried with a snapshot of durable-task to make use of https://github.com/jenkinsci/durable-task-plugin/pull/46, but it still happens frequently.


          Martin Sander added a comment -

          scoheb: https://github.com/jenkinsci/durable-task-plugin/pull/46 won't help here; that is another timeout. The timeout that is (probably) causing this problem can be found here:

          https://github.com/jenkinsci/durable-task-plugin/blob/d740d4624ad81f2bf75cdf4351fe66b4378bb76c/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L57
          https://github.com/jenkinsci/durable-task-plugin/blob/d740d4624ad81f2bf75cdf4351fe66b4378bb76c/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L207-L209

          Scott Hebert added a comment -

          Thanks 0x89

          We actually switched to using a SNAPSHOT (132c66c3) of the plugin based on master...we were using 1.0.

          We will see how it behaves.


          Scott Hebert added a comment -

          0x89: Still seeing this problem even with master.

          Seeing this in my logs:

          sh-4.3# "ps" "-o" "pid=" "9999" ^M
          9999^M
          sh-4.3# printf "EXITCODE %3d" $?; exit^M
          EXITCODE 0exit^M
          Sep 28, 2017 3:18:49 PM org.jenkinsci.plugins.durabletask.ProcessLiveness isAlive
          WARNING: org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1@165c05c1; decorates hudson.Launcher$RemoteLauncher@7ad4a600 on hudson.remoting.Channel@37819af9:JNLP4-connect connection from 10.128.2.1/10.128.2.1:40456 does not seem able to determine whether processes are alive or not


          Scott Hebert added a comment -

          also this:

          WARNING: Error getting exit code
          java.lang.InterruptedException
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
          at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:72)
          at hudson.Proc.joinWithTimeout(Proc.java:170)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:89)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:73)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:198)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:310)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:279)
          at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)


          Scott Hebert added a comment -

          I am now trying -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT=120


          Scott Hebert added a comment -

          0x89

          So it looks like I may have found something...

          One theory I have has to do with the logic for detecting whether a process is still running. The plugin executes a "ps" inside the container while the main script is running; the result of the ps call is harvested from its output and used to verify that the main script is alive. BUT it also exports all the env vars BEFORE running the "ps" command, and some env vars may not be correctly escaped, causing stream corruption between output and error. The EXITCODE is then not harvested, so the process is assumed not to be running, hence the "-1".

          I found these traces in my logs:

          WARNING: Unable to find "EXITCODE" in a valid identifier
          WARNING: Unable to find "EXITCODE" in r kill -l [sigspec]

          I took the combination of https://github.com/jenkinsci/kubernetes-plugin/pull/232 and https://github.com/jenkinsci/kubernetes-plugin/pull/218 to produce a new SNAPSHOT.

          Have not had a "-1" since.

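          The escaping theory can be illustrated in isolation. The snippet below is hypothetical (FOO and its value are invented, and this is not the plugin's actual wrapper code), but it shows how a value containing shell metacharacters, spliced verbatim into an export statement, runs an embedded command whose output lands in front of the EXITCODE marker, much like the `kill -l` usage text in the warnings above.

```shell
# Hypothetical illustration: FOO and its value are invented, not taken from
# the plugin. Splicing the value into an export statement verbatim lets the
# shell run the embedded `kill -l`, whose output pollutes the stream that
# the EXITCODE marker is read from.
value='bar; kill -l'

unsafe=$(sh -c "export FOO=$value; printf 'EXITCODE %3d' 0")
safe=$(sh -c "export FOO='$value'; printf 'EXITCODE %3d' 0")

echo "unsafe: $unsafe"   # signal list, then the EXITCODE marker
echo "safe:   $safe"     # the EXITCODE marker only
```

          Quoting the value (or stripping problematic variables, as the linked pull requests do) keeps the wrapper's output parseable.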

          Martin Sander added a comment -

          scoheb: good catch!

          I will try to get the two pull requests merged quickly.


          Scott Hebert added a comment -

          0x89

          Still getting this from time to time:

          WARNING: Error getting exit code
          java.lang.InterruptedException
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
          at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:72)
          at hudson.Proc.joinWithTimeout(Proc.java:170)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:89)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:73)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:198)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:322)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:289)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

          This one has nothing to do with the EXITCODE and invalid env vars...


          Jesse Glick added a comment -

          Possibly solved by JENKINS-47791.


            Assignee: Martin Sander (0x89)
            Reporter: Martin Sander (0x89)
            Votes: 1
            Watchers: 5