Type: Improvement
Resolution: Unresolved
Priority: Minor
Environment: Jenkins 2.104 on Docker 17.10.0-ce on CentOS 7.4.1708 (Kernel 3.10.0-693.2.2.el7.x86_64)
When using a declarative Jenkins pipeline with a stage that uses a Docker agent, I get a confusing error message in the Jenkins log:
$ docker top 08e1c013e07083492ad0f03285f1a7d30063fb15e0cf39be7b55af6d1a03c829
ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument. See https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#entrypoint for entrypoint best practices.
The build continues normally and the cat command is actually running inside the container, so everything is fine except that the error message occurs although it shouldn't.
Comparing the code in listProcess in https://github.com/jenkinsci/docker-workflow-plugin/blob/master/src/main/java/org/jenkinsci/plugins/docker/workflow/client/DockerClient.java with the output of docker top shows the likely cause of that error:
docker top prints the following fields:
UID PID PPID C STIME TTY TIME CMD
build 19799 19784 0 22:23 pts/0 00:00:00 cat
However, the Java client assumes that only PID, USER, TIME and COMMAND are printed. I suggest determining the process list with an explicit format specifier like
docker container top ${CONTAINER_ID} -eo pid,comm
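The mismatch described above can be illustrated with a small sketch. It is not the plugin's code (the plugin is written in Java). The helper names `parse_top_output` and `command_names` are hypothetical. The sketch looks columns up by their header name instead of by a fixed position, and it assumes that only the last column may contain spaces.

```python
def parse_top_output(output):
    """Parse `docker top` style output into one dict per process, keyed by header name."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split into at most len(header) fields so that a command line
        # containing spaces stays together in the last column.
        fields = line.split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def command_names(rows):
    """Return the executable name of each process, whatever the command column is called."""
    names = []
    for row in rows:
        # ps labels the column CMD with -ef style output and COMMAND with -eo pid,comm.
        cmd = row.get("CMD") or row.get("COMMAND") or ""
        if cmd:
            names.append(cmd.split()[0])
    return names
```

With the default output shown in the description, `command_names` yields `cat` from the `CMD` column even though it is the eighth field. With `-eo pid,comm` it reads the same name from the `COMMAND` column.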
- duplicates: JENKINS-39748 Since 37987 images that use ENTRYPOINT for a reason cannot be used in testing (In Review)
- is blocked by: JENKINS-49385 containers exit early in docker-workflow 1.15 (Resolved)
- is duplicated by: JENKINS-49446 Regression with 1.15 and WithContainerStep (Closed)
- relates to: JENKINS-49385 containers exit early in docker-workflow 1.15 (Resolved)
[JENKINS-49278] cat command in docker agents not detected correctly
I can confirm that this breaks builds for us, too. Any known workaround?
I believe the fix went in after 1.15 was released:
https://github.com/jenkinsci/docker-workflow-plugin/commit/f53608309af64f70471683377c248789c64f400c
I guess 1.16 will be the "Fixed Version" of this issue.
Hey ndeloof long time no see
This one also hit us over the weekend when we upgraded to 1.15. Can you confirm that 1.16 will fix it, and what the expected release date is?
Let's try to resolve the confusion a little bit: I opened this ticket because I saw the error message mentioned above. Then I fixed it myself, but the pull request was merged after the release of 1.15. So this fix will be part of the next release. I run a custom-built 1.16 snapshot (from master) in my Jenkins now and everything works fine.
However, I didn't see any breaking builds – this was just about the error message. If something is breaking and you think that it might be related to this, please provide some logs and have a look at the other changes that were introduced with 1.15.
I can confirm that I have the exact same error message as in the issue desc going from 1.14 to 1.15, so I downgraded. I noticed a change in 'docker run' no longer issuing '--entrypoint'
hendrikhalkow, I have posted logs similar to the ones originally on the ticket, as well as at https://github.com/jenkinsci/docker-workflow-plugin/pull/116#issuecomment-362396110
What's posted there is just the exact error message I posted here. However, this doesn't break anything. It is exactly as docwhat already said: This error message doesn't break your build, it is just confusing and annoying. And I can confirm that the bugfix I implemented will make the error message go away.
docker host os: centos 7, docker container os: ubuntu 16
my logs:
[...] ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** corpjenkins/java cat
$ docker top fdbc72a5e9492e32640695c9e0f4320331aa8707860a015079fa86f330789172
ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument. See https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#entrypoint for entrypoint best practices.
[...]
java.io.IOException: failed to run ps
    at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Decorator$1.kill(WithContainerStep.java:292)
    at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.stop(FileMonitoringTask.java:183)
    at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.stop(DurableTaskStep.java:253)
    at org.jenkinsci.plugins.workflow.cps.CpsThread.stop(CpsThread.java:296)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1083)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$6.onSuccess(CpsFlowExecution.java:1072)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$4$1.run(CpsFlowExecution.java:861)
    at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:35)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Finished: ABORTED
I just started experiencing this issue. I'm confused why it just began happening. One of my colleagues may have updated Jenkins or plugins.. or the pipeline plugin updated itself? Anyway, this issue does break our builds that use the docker.inside functionality. Is there a workaround?
$ docker top 70d7992c206f2313ad58b53d6ff67f2b4b5d3535537320612c7244649a966d08
ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument. See https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#entrypoint for entrypoint best practices.
[Pipeline] {
[Pipeline] sh
[fdp-system-tests] Running shell script
[Pipeline] }
$ docker stop --time=1 70d7992c206f2313ad58b53d6ff67f2b4b5d3535537320612c7244649a966d08
$ docker rm -f 70d7992c206f2313ad58b53d6ff67f2b4b5d3535537320612c7244649a966d08
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // ansiColor
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
ERROR: script returned exit code -2
Finished: FAILURE
OK, I see. The first part of jayv's log is about this issue here. This will go away. The second part of the log, and what seglo reports, is something different. Looks like <https://issues.jenkins-ci.org/browse/JENKINS-41316>. Please continue the discussion there.
hendrikhalkow the ticket you reported links back to this one. Is there a way to downgrade the docker pipeline plugin to work around this issue in the meantime? What are my options?
jayv seglo marbon myoung34 marcphilipp To help you with your issue: You probably use images with custom entrypoints. Remove your custom entry point and your builds should be fine. Then wait for JENKINS-41316 to be fixed.
But this is all out of scope of this issue here.
Edit: The reason why JENKINS-41316 is linked here is this comment where this issue here is described, too: https://issues.jenkins-ci.org/browse/JENKINS-41316?focusedCommentId=327178&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-327178
When upgrading from docker-workflow-plugin 1.14 to 1.15, we not only face the error message, but long-running Docker containers are also stopped without any reason.
E.g. we have a Gradle Docker image with a "gradle" entrypoint within the image itself and a long-running build process. Since the upgrade, the Docker container gets stopped after approx. 1 minute.
Downgraded to 1.14 and everything works fine again.
The Log Difference is as follows:
docker-worlflow-plugin 1.14: ---------------------------------- docker run -t -d -u 1000:1000 --name myContainer -w /var/jenkins_home/workspace/testjob --volumes-from 89f6d948fe0f285948be4705a73bbd1996db6e19ec88a4710761f0aa598b837b -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** --entrypoint cat mynexus:8083/mygradle:3.5 docker-worlflow-plugin 1.15: ---------------------------------- docker run -t -d -u 1000:1000 --name myContainer -w /var/jenkins_home/workspace/testjob --volumes-from 846c20a2160d3460a6a864e0e626ed3433b157f4e36d9c4747dc7c30ac331b63 -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** mynexus:8083/mygradle:3.5 cat docker top 8119cb13cc27981b4e285704288342131bd855e2c9cf3e0d24e7af717b94f16d ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument. See https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#entrypoint for entrypoint best practices.
Looks like this issue needs to be reopened?
I am facing the same error message, and my builds intermittently die for no reason.
I am using sles12sp2 images + docker 17.4
I suspect JENKINS-49385 is a duplicate issue, but it is not resolved with the 1.15.1 update. The cat command still instantly fails with an argument delimited entrypoint in the Dockerfile (see comments therein).
Note that with an argument delimited entrypoint, the command given (either via CMD or docker run command) will be appended after the entrypoint.
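To make the note above concrete, here is a minimal sketch of how Docker combines ENTRYPOINT with the command given via CMD or `docker run`. The function `effective_argv` is hypothetical and only models the documented rules: an exec-form (argument delimited) entrypoint gets the command appended, while a shell-form entrypoint runs under `/bin/sh -c` and ignores the command.

```python
def effective_argv(entrypoint, cmd):
    """Model the process a container starts.

    entrypoint: list of str (exec form), str (shell form), or None (no entrypoint).
    cmd: list of str, taken from CMD or from the arguments after the image name.
    """
    if entrypoint is None:
        # No entrypoint: the command itself is the container process.
        return list(cmd)
    if isinstance(entrypoint, str):
        # Shell form: wrapped in /bin/sh -c, the command is not used at all.
        return ["/bin/sh", "-c", entrypoint]
    # Exec form: the command is appended as arguments to the entrypoint.
    return list(entrypoint) + list(cmd)


# An image with ENTRYPOINT ["gradle"] started as `docker run <image> cat`
print(effective_argv(["gradle"], ["cat"]))
# The same image started with `--entrypoint cat` and no further arguments
print(effective_argv(["cat"], []))
```

Under these rules, the 1.15 style invocation (`<image> cat`) on an image with a `gradle` exec-form entrypoint, like the one mentioned earlier in this thread, ends up running `gradle cat` rather than `cat`. The 1.14 style invocation with `--entrypoint cat` replaced the entrypoint entirely.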
I was about to type a comment, and weird things happened with focus and Jenkins shortcuts; I ended up (accidentally) assigning this to myself - apologies for that. I've set to "Unassigned".
We're (still) seeing failures after updating to 1.15.1:
$ docker run -t -d -u 497:495 -w /tmp/jenkins-prime/agent/workspace/er-G2PEKDXUWD77U -v /tmp/jenkins-prime/agent/workspace/er-G2PEKDXUWD77U:/tmp/jenkins-prime/agent/workspace/er-G2PEKDXUWD77U:rw,z -v /tmp/jenkins-prime/agent/workspace/er-G2PEKDXUWD77U@tmp:/tmp/jenkins-prime/agent/workspace/er-G2PEKDXUWD77U@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** docker.endpoint/servicectl:9.6.0 cat $ docker top b6170c275fb28d292a179ecec4d40c2452091b44074360dec0280ba08f65c3dc -eo pid,comm ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument, as required by official docker images (see https://github.com/docker-library/official-images#consistency for entrypoint consistency requirements). Alternatively you can force image entrypoint to be disabled by adding option `--entrypoint=''`.
I'm also still seeing this issue after upgrading to 1.15.1:
$ docker top d720fd88f4be924edf76e1d55bc058d82dce45f8cb4a34eb2db492b380fe1ab5 -eo pid,comm
ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument, as required by official docker images (see https://github.com/docker-library/official-images#consistency for entrypoint consistency requirements). Alternatively you can force image entrypoint to be disabled by adding option `--entrypoint=''`.
Here's the result of `docker top d720fd88f4be924edf76e1d55bc058d82dce45f8cb4a34eb2db492b380fe1ab5 -eo pid,comm`
$ docker top d720fd88f4be924edf76e1d55bc058d82dce45f8cb4a34eb2db492b380fe1ab5 -eo pid,comm
PID COMMAND
123094 init
123133 cat
123212 sh
123218 sh
123219 script.sh
123221 test
123252 npm
123431 sleep
`cat` is clearly in the list.
I also tried removing the `--init` option passed to the container in case `cat` needed to be first in the list, but it didn't resolve the issue.
Please let me know if I can offer any other diagnostic info.
ndeloof: After upgrading to 1.15.1 I get this error in declarative pipeline:
java.io.IOException: Failed to run top '788913bdd127e9689e54cf2c699b0d99cc17e7469575bd4d541b4c61394c5ce4'. Error: Error response from daemon: Unexpected pid '111969kworker/u256:2': strconv.Atoi: parsing "111969kworker/u256:2": invalid syntax
    at org.jenkinsci.plugins.docker.workflow.client.DockerClient.listProcess(DockerClient.java:140)
    at org.jenkinsci.plugins.docker.workflow.WithContainerStep$Execution.start(WithContainerStep.java:185)
This happens on Docker for Windows 17.12.0-ce (Git commit c97c6d6).
thxmasj sounds like a docker bug, maybe watch https://github.com/moby/moby/issues/34282
ndeloof: Yep, I upgraded Docker for Windows to 18.02, and the problem is gone. Thanks!
18.02?
https://download.docker.com/linux/static/stable/x86_64/ has:
Index of /linux/static/stable/x86_64/
../
docker-17.03.0-ce.tgz 2017-03-01 11:11 27M
docker-17.03.1-ce.tgz 2017-03-28 04:46 27M
docker-17.03.2-ce.tgz 2017-06-28 03:35 27M
docker-17.06.0-ce.tgz 2017-06-28 05:17 29M
docker-17.06.1-ce.tgz 2017-08-18 02:35 29M
docker-17.06.2-ce.tgz 2017-09-05 10:39 29M
docker-17.09.0-ce.tgz 2017-09-27 01:47 29M
docker-17.09.1-ce.tgz 2017-12-08 12:22 29M
docker-17.12.0-ce.tgz 2017-12-27 09:52 33M
chizcw: It's from the edge channel (https://download.docker.com/linux/static/edge/x86_64/).
Does this mean the docker-workflow-plugin has been updated to (silently?) require non-Stable versions of Docker?
chizcw: The problem seems to be isolated to Docker for Windows. I don't have the problem on 17.12 on Ubuntu.
chizcw no, this just means the docker runtime has some issues, and some are fixed in recent releases. docker-workflow-plugin uses the docker CLI and expects it to behave as described in the reference documentation.
For the record; I also get the invalid syntax error on Docker for Windows 18.02 now. It is obviously related to the large pids bug.
Release notes ought to mention that 1.15.1 works for docker.image('maven:3.5.0').inside but not docker.image('maven:3.3.9').inside, for example. I presume that is a result of this commit by csanchez.
I'm still facing the same issue with Jenkins version 2.161.
I have searched through many issues, both open and closed, without finding a proper resolution to this problem. Please provide a suitable resolution.
We've frozen our version of this plugin at 1.14 because of the behavioural change that broke our pipelines.
(by frozen I mean, we just have to remember to deselect it when updating all the other plugins)
Hello chizcw,
Can you share the complete name of the plugin? Sorry, I'm new to Jenkins and Docker and am just working on a PoC for learning purposes.
Thanks chizcw,
I had just downgraded the same plugin before your reply and my builds worked fine.
I suspect that this is the same as JENKINS-54389, and also JENKINS-39748.
Any longer-running entrypoint will cause the job to fail because the cat command doesn't run until the end of the entrypoint.
Additionally, requiring an image to run cat could break the image. For example, if I wish to start an image whose CMD is apache-foreground then I expect apache to be running, not cat. I'm not going to serve any web responses with cat.
I think the docker-workflow plugin either needs to:
- respect HEALTHCHECKs
- allow for either no CMD override, or a specified CMD override to be provided when running the image
- allow for a wait period before checking that the expected command is running
Basically, don't run cat by default, instead respect the healthcheck.
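The third suggestion in the list above, a wait period before checking for the expected command, could look roughly like the sketch below. It is purely illustrative and not something the plugin offers. `wait_for_command` and its parameters are hypothetical, and `list_commands` stands in for whatever returns the command names currently seen by `docker top`.

```python
import time


def wait_for_command(list_commands, expected="cat", timeout=30.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll the container's process list until `expected` shows up or `timeout` seconds pass.

    Returns True if the command was seen, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if expected in list_commands():
            return True
        if clock() >= deadline:
            return False
        # Give a slow entrypoint more time before looking again.
        sleep(interval)


# Simulated process lists: the entrypoint script is still running on the
# first two polls and has handed over to cat on the third.
polls = iter([["init", "docker-entrypoint.sh"],
              ["init", "docker-entrypoint.sh"],
              ["init", "cat"]])
print(wait_for_command(lambda: next(polls), timeout=5, interval=0))
```

A check like this only tolerates a slow entrypoint. It does not help with images whose entrypoint never executes the passed command.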
Any further behavioral changes in this area are likely to be incompatible and unlikely to be entertained. withDockerContainer was designed for “tool-only” images with no particular designated entrypoint (typically falling back to bash or something), since its entire purpose is to provide a passive container—basically a predefined filesystem layout—in which you can run sh steps. Images which expect to be running a specific process like Apache are not supported. You can use withRun, or (better) just run whatever docker commands you like directly.
What about providing an additional switch like "useImageCMD" (default false) which ensures that the CMD from the image is used and is not overwritten with "cat"?
jglick It is very understandable that `withDockerContainer` is intended for tool-only images, and as a matter of fact that is exactly what our organization uses it for.
Yet, such a tool-only container can very easily have an `ENTRYPOINT` which needs to run to completion before the actual sh steps are executed. In our case, we have a Docker container that we use as a Conan (https://conan.io/) tool. This container has an entrypoint which runs the usual `docker-entrypoint.sh`, which is notably responsible for setting up the different remote repositories and logging in to use them.
As a result, we observe spurious failures because of a race condition: sometimes the entrypoint has had time to complete before the sh steps are executed, sometimes it has not.
The problem is clearly addressed by the Docker guidelines: `ENTRYPOINT` is for the container steps that should not be replaced by the client, and `CMD` is exactly for client override. It does not seem reasonable for Docker to execute commands while racing with `ENTRYPOINT`, and it is becoming such a major problem for us that we will have to walk away from this solution if the situation remains as it currently is.
For specialized use cases like this you should not use withDockerContainer. Just run whatever docker commands you need directly from a sh (or indirectly via some script).
Well, we are using the Docker agent syntax:
agent {
    docker {
        image 'ag/ubuntu-conan'
        args '''-v $DOCKERCONFIG_FOLDER/ag/ubuntu-conan.env:/dockerconfig.env'''
    }
}
stage('Use the tool') {
    steps {
        sh 'conan install whatever'
    }
}
Isn't that exactly the intended "tool-only" use case you were mentioning above? If not, what is the expected use case for this kind of agent then?
agent docker is just sugar for withDockerContainer. The expected use case is anything that happens to work the first time you try it. There is really no further guarantee than that.
For specialized use cases like this you should not use withDockerContainer. Just run whatever docker commands you need directly from a sh (or indirectly via some script).
We are revisiting our use of docker agents in our CI pipeline. jglick we are considering following your advice above, thus removing the docker agents and instead running `docker` commands in `sh` steps directly on the master.
Yet, our current setup is that we have a stage with many steps running on the agent (with the different steps showing nicely in the Jenkins UI). How could we get the same result by executing the docker command directly? (i.e. executing discrete Jenkins `steps` in the same docker container, without restarting the container, since it would lose its state).
adnn that is indeed a missing feature in Pipeline. I have hijacked JENKINS-44847 to discuss this.
I encountered a similar problem with the cdrx/pyinstaller-linux:python2 container (described in WEBSITE-726, see corresponding logfile).
However, in my case the container process was not detected at all by docker top.
I fixed the Python tutorial error reported by ganainm by using `sh 'docker run ...'` to replace the docker image reference from the Declarative Pipeline. See the pull request for more details.
In our use case, we use a custom Docker image as an agent that runs telegraf as a service in the background and sends metrics to InfluxDB. The benefit of this approach over having telegraf on the host running the Docker agent is that I can add default tags from the ENV variables to the metrics, like JOB ID, BRANCH, repo, etc.
I struggled for a bit to make it work because of the whole ENTRYPOINT situation described here, but found a way around.
I'm using S6 Overlay (https://github.com/just-containers/s6-overlay) so I can have a proper process manager with services running in the background. Our agents should use a dedicated Jenkins user (say UID 2000), but S6 doesn't work if you start the container with a non-root user, so I adapted the ENTRYPOINT on the image to:
ENTRYPOINT ["/init", "/bin/execlineb", "-s0", "-c", "export HOME /home/jenkins s6-setuidgid jenkins $@"]
And the args in the Jenkinsfile look like: `args -u 0:0 -v /home/jenkins:/home/jenkins`. This way, the container actually starts as root, but the entrypoint makes it run as the jenkins user. I still need to iron out a few things, but hopefully this helps other people. Might write a blog post about it.
yorch I also have this kind of setup (s6-overlay with services) but I solved it properly. Take a look: https://github.com/felipecrs/jenkins-agent-dind/pull/11
myoung34
I don't think this is breaking anything. At least, I can see the error everywhere but it isn't breaking anything.