Resolution: Fixed
kubernetes plugin v0.12 or current master
Jenkins 2.62
container step running in a _debian_ container
podTemplate(name: "mypod", label: "label", containers: [
    containerTemplate(name: 'debian', image: 'debian', ttyEnabled: true, command: 'cat', )
]) {
    node("label") {
        container('debian') {
            sh 'for i in $(seq 1 1000); do echo $i; sleep 0.3; done'
        }
    }
}
leads to
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Running on horst-nwmsn-32h5h in /home/jenkins/workspace/full
[Pipeline] {
[Pipeline] container
[Pipeline] {
[Pipeline] sh
[full] Running shell script
+ seq 1 1000
+ echo 1
1
+ sleep 0.3
+ echo 2
2
+ sleep 0.3
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
ERROR: script returned exit code -1
Finished: FAILURE
Sometimes it fails a bit faster.
Might be related to the script being started with "nohup" now.
[JENKINS-46651] container step "script returned exit code -1"
Seems that the debian:latest container does not have ps by default, and durable-task uses it to check for the process.
Should we document this?
By the way, a workaround is to derive your own image from "debian" and install the procps package in it.
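To confirm whether a given image is affected, a small check can be run inside the container. This is only a sketch: the helper names `has_ps` and `is_alive` are hypothetical, and the probe simply mirrors the `ps -o pid= <pid>` invocation that shows up in the logs later in this thread, not the plugin's actual code.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a `ps` binary is on PATH.
has_ps() {
    command -v ps >/dev/null 2>&1
}

# Hypothetical helper: mirrors the probe seen in the logs (ps -o pid= <pid>).
# Succeeds only if ps runs and reports the given pid.
is_alive() {
    ps -o pid= "$1" >/dev/null 2>&1
}

if has_ps; then
    if is_alive "$$"; then
        echo "ps is available and reports the current shell ($$) as alive"
    else
        echo "ps is available but did not report pid $$"
    fi
else
    echo "ps is missing; on Debian-based images, install the procps package"
fi
```

If the last message is printed, the image most likely needs procps added before the `sh` step can be tracked reliably.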
I have a similar problem with the exit -1 error, but I do have the procps package installed, and it happens only with 10+ concurrent builds of the same project.
kubernetes plugin v1.0
Jenkins 2.37
Here is my pipeline:
tag = env.BUILD_TAG
podTemplate(label: tag, volumes: [
    hostPathVolume(mountPath: '/var/lib/docker', hostPath: '/tmp/'+tag)
], containers: [
    containerTemplate(name: 'jnlp', image: 'localreg:5000/jenkinsci/jnlp-slave', args: '${computer.jnlpmac} ${computer.name}'),
    containerTemplate(name: 'java', image: 'localreg:5000/base/java', ttyEnabled: true, command: 'cat'),
    containerTemplate(name: 'dind', image: 'localreg:5000/base-builders/dind:1.0.0', ttyEnabled: true, privileged: true, alwaysPullImage: true)
]) {
    node(tag) {
        try {
            container('jnlp') {
                stage('Preparing') {
                    checkout scm
                    setGitEnvironmentVariables()
                    setSimpleEnvironmentVariables()
                }
            }
            container('java') {
                stage('Build a Maven project') {
                    sh 'env'
                }
            }
            container('dind') {
                stage('Build Docker') {
                    sh "docker build . || echo real bad exit code !"
                }
            }
        } catch (error) {
            throw error
        }
    }
}
Sometimes I receive this error while 10+ concurrent builds are running:
/home/jenkins/workspace/docker_builds_master-EHQCE25ZSFSE4QALBNHGET3HINBLMWZKHOXT7335PWTFXXMWNJDQ
[Pipeline] sh
[docker_builds_master-EHQCE25ZSFSE4QALBNHGET3HINBLMWZKHOXT7335PWTFXXMWNJDQ] Running shell script
+ docker build .
Sending build context to Docker daemon 74.24 kB
Step 1 : FROM localreg:5000/base/ubuntu-14.04
latest: Pulling from base/ubuntu-14.04
96c6a1f3c3b0: Pulling fs layer
e8c945afff4f: Pulling fs layer
b46adc58e13f: Pulling fs layer
e8c945afff4f: Verifying Checksum
e8c945afff4f: Download complete
b46adc58e13f: Download complete
96c6a1f3c3b0: Verifying Checksum
96c6a1f3c3b0: Download complete
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // container
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
ERROR: script returned exit code -1
Finished: FAILURE
Any suggestions as to what it could be?
omegam: Can you try setting org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT to something higher than the default 15 seconds, and see if that helps?
0x89: Thank you for the suggestion, but it doesn't help 8(
I tried setting the parameter to 30, 60, and 120, with the same result.
Here is my Jenkins master launcher string:
/usr/bin/java -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT=120 -Djava.awt.headless=true -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85 -jar /usr/share/jenkins/jenkins.war --webroot=/var/cache/jenkins/war --httpPort=8080
I'm having similar issues. I haven't managed to figure out the exact cause, but for example running "bundler install" in a Ruby container triggers it fairly often while installing gems that have native extensions, like json or ffi. A somewhat common pattern seems to be that it happens in cases where there might be quite a few processes being launched in the container.
Somewhat common pattern seems to be that it happens in cases where there might be quite a few processes being launched in the container.
Good idea. Could you try to verify that assumption with a minimal pipeline?
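One possible starting point for such a minimal pipeline is a script body that does nothing except launch many short-lived processes, which could then be passed to the `sh` step inside the `container` block. This is a sketch under assumptions: the function name `spawn_many` and the count of 200 are hypothetical, and nothing in this thread confirms that this load is enough to reproduce the failure.

```shell
#!/bin/sh
# Hypothetical load generator: fork N short-lived background subshells,
# then wait for all of them to finish.
spawn_many() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        ( : ) &          # each background subshell is a separate process
        i=$((i + 1))
    done
    wait                 # block until every background job has exited
    echo "spawned $n processes"
}

# 200 is an arbitrary illustration value, not a known threshold.
spawn_many 200
```

If the "-1" shows up more often as the count grows, that would support the process-count theory; if not, the cause likely lies elsewhere.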
We are also seeing this more and more. I also tried with a snapshot of durable-task to make use of https://github.com/jenkinsci/durable-task-plugin/pull/46, but it still happens frequently.
scoheb: https://github.com/jenkinsci/durable-task-plugin/pull/46 won't help here, that is another timeout.
The timeout that is (probably) causing this problem can be found here:
Thanks 0x89
We actually switched to using a SNAPSHOT (132c66c3) of the plugin based on master...we were using 1.0.
We will see how it behaves.
0x89 Still seeing this problem even with master
seeing this in my logs:
sh-4.3# "ps" "-o" "pid=" "9999" ^M
sh-4.3# printf "EXITCODE %3d" $?; exit^M
Sep 28, 2017 3:18:49 PM org.jenkinsci.plugins.durabletask.ProcessLiveness isAlive
WARNING: org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1@165c05c1; decorates hudson.Launcher$RemoteLauncher@7ad4a600 on hudson.remoting.Channel@37819af9:JNLP4-connect connection from does not seem able to determine whether processes are alive or not
also this:
WARNING: Error getting exit code
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:72)
at hudson.Proc.joinWithTimeout(Proc.java:170)
at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:89)
at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:73)
at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:198)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:310)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:279)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I am now trying -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_FAILURE_TIMEOUT=120
so it looks like I may have found something...
One theory I have concerns the logic for detecting whether a process is still running. The plugin executes "ps" inside the container while the main script is running, and harvests the result from the output of that call to verify that the main script is alive. BUT it also exports all the env vars BEFORE running the "ps" command, and some env vars may not be correctly escaped. That would cause some stream corruption between output and error, so the EXITCODE is not harvested and the plugin thinks the process is not running, hence the "-1".
I found these traces in my logs:
WARNING: Unable to find "EXITCODE" in a valid identifier
WARNING: Unable to find "EXITCODE" in r kill -l [sigspec]
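The theory above can be illustrated with a simplified shell sketch. It is not the plugin's implementation (which is Java); the function name `harvest_exit_code` is hypothetical, and the only details taken from this thread are the `printf "EXITCODE %3d" $?` probe suffix, the corrupted fragment from the warning, and the "-1" fallback.

```shell
#!/bin/sh
# Hypothetical helper: pull the numeric status that follows the EXITCODE
# marker out of captured probe output; print -1 when the marker is absent.
harvest_exit_code() {
    code=$(printf '%s\n' "$1" \
        | sed -n 's/.*EXITCODE *\([0-9][0-9]*\).*/\1/p' \
        | tail -n 1)
    if [ -n "$code" ]; then
        echo "$code"
    else
        echo "-1"
    fi
}

# Clean probe output: the marker survives, so the status is harvested.
clean=$(sh -c 'true; printf "EXITCODE %3d" $?')
harvest_exit_code "$clean"                   # prints 0

# Corrupted output, like the fragment in the warning above: marker lost.
harvest_exit_code "r kill -l [sigspec]"      # prints -1
```

The second call shows how a still-running script could be reported as failed: the probe itself ran, but its output no longer contains the marker the caller is looking for.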
I took the combination of https://github.com/jenkinsci/kubernetes-plugin/pull/232 and https://github.com/jenkinsci/kubernetes-plugin/pull/218 to produce a new SNAPSHOT.
Have not had a "-1" since.
scoheb: good catch!
I will try to get the two pull requests merged quickly.
Still getting this from time to time:
WARNING: Error getting exit code
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:72)
at hudson.Proc.joinWithTimeout(Proc.java:170)
at org.jenkinsci.plugins.durabletask.ProcessLiveness._isAlive(ProcessLiveness.java:89)
at org.jenkinsci.plugins.durabletask.ProcessLiveness.isAlive(ProcessLiveness.java:73)
at org.jenkinsci.plugins.durabletask.BourneShellScript$ShellController.exitStatus(BourneShellScript.java:198)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.check(DurableTaskStep.java:322)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.run(DurableTaskStep.java:289)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
This one has nothing to do with the EXITCODE and invalid env vars...
Example works with busybox container.