• Bug
    • Resolution: Fixed
    • Minor
    • kubernetes-plugin
    • None
    • kubernetes plugin v0.12 or current master
      Jenkins 2.62
      container step running in a _debian_ container

      podTemplate(name: "mypod", label: "label", containers: [
                              containerTemplate(name: 'debian',
                                      image: 'debian',
                                      ttyEnabled: true,
                                      command: 'cat',
                              )
      ]) {
          node("label") {
              container('debian') {
                  sh 'for i in $(seq 1 1000); do echo $i; sleep 0.3; done'
              }
          }
      }
      

      leads to

      [Pipeline] podTemplate
      [Pipeline] {
      [Pipeline] node
      Running on horst-nwmsn-32h5h in /home/jenkins/workspace/full
      [Pipeline] {
      [Pipeline] container
      [Pipeline] {
      [Pipeline] sh
      [full] Running shell script
      + seq 1 1000
      + echo 1
      1
      + sleep 0.3
      + echo 2
      2
      + sleep 0.3
      [Pipeline] }
      [Pipeline] // container
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] }
      [Pipeline] // podTemplate
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE
      

      Sometimes it fails a bit faster.

      Might be related to the script being started with "nohup" now.

          [JENKINS-46651] container step "script returned exit code -1"

          Scott Hebert added a comment -

          We are also seeing this more and more. I also tried with a snapshot of durable-task to make use of https://github.com/jenkinsci/durable-task-plugin/pull/46, but it still happens frequently.

          Martin Sander added a comment -

          scoheb: https://github.com/jenkinsci/durable-task-plugin/pull/46 won't help here; that is another timeout.

          The timeout that is (probably) causing this problem can be found here:

          https://github.com/jenkinsci/durable-task-plugin/blob/d740d4624ad81f2bf75cdf4351fe66b4378bb76c/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L57
          https://github.com/jenkinsci/durable-task-plugin/blob/d740d4624ad81f2bf75cdf4351fe66b4378bb76c/src/main/java/org/jenkinsci/plugins/durabletask/BourneShellScript.java#L207-L209

          Scott Hebert added a comment -

          Thanks 0x89

          We actually switched to using a SNAPSHOT (132c66c3) of the plugin based on master; we were using 1.0 before.

          We will see how it behaves.

          Scott Hebert added a comment -

          0x89 Still seeing this problem even with master.

          I am seeing this in my logs:

          sh-4.3# "ps" "-o" "pid=" "9999" ^M
          9999^M
          sh-4.3# printf "EXITCODE %3d" $?; exit^M
          EXITCODE 0exit^M
          Sep 28, 2017 3:18:49 PM org.jenkinsci.plugins.durabletask.ProcessLiveness isAlive
          WARNING: org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1@165c05c1; decorates hudson.Launcher$RemoteLauncher@7ad4a600 on hudson.remoting.Channel@37819af9:JNLP4-connect connection from 10.128.2.1/10.128.2.1:40456 does not seem able to determine whether processes are alive or not
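
          The transcript above shows the shape of the probe: a "ps" call for one PID, followed by a printf of an EXITCODE marker carrying the exit status of "ps". As a rough way to try this by hand, here is a minimal sketch. The helper names probe and parse_exitcode are made up for illustration and are not code from either plugin, and the sketch passes the PID with -p for portability, which the transcript does not.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical, not code from either plugin.

# Run the same kind of probe seen in the transcript: ask ps for the PID,
# then print the marker followed by the exit status of ps.
probe() {
    ps -o pid= -p "$1" 2>/dev/null
    printf 'EXITCODE %3d\n' "$?"
}

# Extract the number after the last EXITCODE marker. Prints nothing when the
# marker is missing, i.e. when the exit status cannot be determined.
parse_exitcode() {
    printf '%s\n' "$1" | sed -n 's/.*EXITCODE *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

out=$(probe "$$")    # $$ is this shell itself, so it should be alive
code=$(parse_exitcode "$out")
echo "probe exit code: ${code:-<marker not found>}"
```

          If the output of the probe is mixed with unrelated text and the marker is lost, parse_exitcode prints nothing, so the caller cannot tell whether the process is alive.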

          Scott Hebert added a comment -

          also this:

          WARNING: Error getting exit code
          java.lang.InterruptedException
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
          at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:72)
          at hudson.Proc.joinWithTimeout(Proc.java:170)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:89)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:73)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:198)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:310)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:279)
          at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

          Scott Hebert added a comment -

          I am now trying -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT=120
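
          For anyone wanting to try the same setting: a Java system property such as this one goes on the command line of the JVM that runs Jenkins, before -jar. A minimal sketch, assuming Jenkins is started directly from a jenkins.war in the current directory (packaged installs and container images usually take JVM options through their own configuration instead). The variable name JENKINS_JVM_OPTS is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes Jenkins is launched from jenkins.war in the current directory.
# JENKINS_JVM_OPTS is a hypothetical variable name used for illustration.
JENKINS_JVM_OPTS="-Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT=120"

# Print the command instead of running it, so it can be checked first.
launch_cmd="java $JENKINS_JVM_OPTS -jar jenkins.war"
echo "$launch_cmd"
```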

          Scott Hebert added a comment -

          0x89

          so it looks like I may have found something...

          One theory I have concerns the logic for detecting whether a process is still running. The plugin executes a "ps" inside the container while the main script is running. The result of the "ps" call is harvested from the output of the call and used to verify that the main script is alive. BUT the plugin also exports all the env vars BEFORE running the "ps" command, and some env vars may not be correctly escaped. That could corrupt the stream between output and error, so the EXITCODE is not harvested and the plugin thinks the process is not running, hence the "-1".

          I found these traces in my logs:

          WARNING: Unable to find "EXITCODE" in a valid identifier
          WARNING: Unable to find "EXITCODE" in r kill -l [sigspec]

          I took the combination of https://github.com/jenkinsci/kubernetes-plugin/pull/232 and https://github.com/jenkinsci/kubernetes-plugin/pull/218 to produce a new SNAPSHOT.

          Have not had a "-1" since.
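
          To illustrate the escaping theory: when a value is pasted unquoted into an export line, the shell splits it into words and tries to export or run the pieces (bash, for instance, reports "not a valid identifier" for a word that is not a legal variable name). The following is a minimal sketch of quoting a value so that it survives as one word, assuming a POSIX shell. The names quote_for_export and MY_VAR are hypothetical, and this is not the code from the linked pull requests.

```shell
#!/bin/sh
# Sketch only: quote_for_export and MY_VAR are hypothetical names.

# Wrap a value in single quotes, turning each embedded ' into '\'' so the
# shell reads the whole value as one word.
# Note: command substitution drops trailing newlines from the value.
quote_for_export() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

value="it's got spaces; and \$pecial chars"
line="export MY_VAR=$(quote_for_export "$value")"

eval "$line"    # stands in for the shell inside the container
[ "$MY_VAR" = "$value" ] && echo "round trip ok"
```

          With the value quoted this way, the export line produces no stray output that could be mistaken for, or hide, the EXITCODE marker.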

          Martin Sander added a comment -

          scoheb: good catch!

          I will try to get the two pull requests merged quickly.

          Scott Hebert added a comment -

          0x89

          Still getting this from time to time:

          WARNING: Error getting exit code
          java.lang.InterruptedException
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
          at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
          at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:72)
          at hudson.Proc.joinWithTimeout(Proc.java:170)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:89)
          at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:73)
          at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:198)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:322)
          at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:289)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

          This one has nothing to do with the EXITCODE and invalid env vars...

          Jesse Glick added a comment -

          Possibly solved by JENKINS-47791.

            0x89 Martin Sander
            0x89 Martin Sander