[JENKINS-49278] cat command in docker agents not detected correctly

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: docker-workflow-plugin
    • Labels: None
    • Environment: Jenkins 2.104 on Docker 17.10.0-ce on CentOS 7.4.1708 (Kernel 3.10.0-693.2.2.el7.x86_64)

      When using a declarative Jenkins pipeline with a stage that uses a Docker agent, I get a confusing error message in the Jenkins log:

      $ docker top 08e1c013e07083492ad0f03285f1a7d30063fb15e0cf39be7b55af6d1a03c829
      ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument. See https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#entrypoint for entrypoint best practices.
      

      The build continues normally and the cat command is actually running inside the container, so everything works except that the error message is printed even though it should not be.

      Comparing the code in listProcess in https://github.com/jenkinsci/docker-workflow-plugin/blob/master/src/main/java/org/jenkinsci/plugins/docker/workflow/client/DockerClient.java with the output of docker top reveals the likely cause of the error:

      docker top prints the following fields:

      UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
      build               19799               19784               0                   22:23               pts/0               00:00:00            cat
      

      However, the Java client assumes that only PID, USER, TIME and COMMAND are printed. I suggest determining the process list with an explicit format specifier, for example:

      docker container top ${CONTAINER_ID} -eo pid,comm
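      The column problem can be illustrated with a small Java sketch. The class and method names are hypothetical and this is not the plugin's actual listProcess code. Instead of assuming fixed columns, it reads the header line that docker top prints and takes the last column as the command. It assumes the command is always the last column, which holds for the default layout shown above and for an explicit `-eo pid,comm` format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser (not the plugin's actual DockerClient.listProcess code).
class DockerTopParser {
    // Returns the command of every process listed by `docker top`. The column set
    // depends on the ps options in use, so the header is read instead of assuming
    // fixed positions. Assumes the command is the last column, as in the default
    // output and with `-eo pid,comm`.
    static List<String> commands(String topOutput) {
        String[] lines = topOutput.trim().split("\\R");
        String[] header = lines[0].trim().split("\\s+");
        String last = header[header.length - 1];
        if (!last.equals("CMD") && !last.equals("COMMAND")) {
            throw new IllegalArgumentException("unexpected last column: " + last);
        }
        List<String> result = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            // Limit the split to the number of header columns so that a command
            // containing spaces (e.g. "sh -c sleep 5") stays in one piece.
            String[] fields = line.split("\\s+", header.length);
            result.add(fields[fields.length - 1]);
        }
        return result;
    }

    // True if some process in the container runs exactly `command` (e.g. "cat").
    static boolean anyCommandIs(String topOutput, String command) {
        return commands(topOutput).contains(command);
    }
}
```

      Applied to the docker top output quoted above, commands(...) yields a single entry, cat, so a check based on the header finds the process regardless of how many columns ps prints.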
      

          Ad added a comment -

          jglick It is very understandable that `withDockerContainer` is intended for tool-only images, and in fact that is exactly what our organization uses it for.

          Yet such a tool-only container can easily have an `ENTRYPOINT` that must run to completion before the actual sh steps are executed. In our case, we use a Docker container as a Conan (https://conan.io/) tool, and its entrypoint runs the usual `docker-entrypoint.sh`, which, among other things, sets up the various remote repositories and logs in to them.

          As a result, we observe spurious failures caused by a race condition: sometimes the entrypoint has completed before the sh steps are executed, sometimes it has not.

          The Docker guidelines address this clearly: `ENTRYPOINT` is for container steps that should not be replaced by the client, and `CMD` is exactly what the client is meant to override. It does not seem reasonable for Docker to execute commands while racing with `ENTRYPOINT`, and this is becoming such a major problem for us that we will have to abandon this approach if the situation does not change.
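          The ENTRYPOINT/CMD rule referred to here can be written down as a tiny Java sketch. The names are hypothetical, and only exec-form (JSON array) ENTRYPOINT is modelled; a shell-form ENTRYPOINT ignores both CMD and docker run arguments. Arguments given to docker run replace CMD and are appended to ENTRYPOINT:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how Docker assembles the main process's argv when the
// image's ENTRYPOINT is in exec (JSON array) form.
class EffectiveCommand {
    static List<String> argv(List<String> entrypoint, List<String> cmd, List<String> runArgs) {
        List<String> result = new ArrayList<>(entrypoint);
        // Arguments passed to `docker run` override CMD entirely; otherwise the
        // image's CMD supplies the default arguments.
        result.addAll(runArgs.isEmpty() ? cmd : runArgs);
        return result;
    }
}
```

          For example, with ENTRYPOINT ["/docker-entrypoint.sh"], a made-up CMD ["bash"] and cat passed as the docker run argument, the result is [/docker-entrypoint.sh, cat]. The container's first process is therefore the entrypoint script, and cat only shows up as its own process if and when the script hands over to it (commonly via exec "$@"). That is one plausible reason why a check made too early sees no cat process, which fits the race described in this comment.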

          Jesse Glick added a comment -

          For specialized use cases like this you should not use withDockerContainer. Just run whatever docker commands you need directly from a sh (or indirectly via some script).

          Ad added a comment (edited) -

          Well, we are using the Docker agent syntax:

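          pipeline {
            agent {
              docker {
                image 'ag/ubuntu-conan'
                // Mount the tool's configuration file into the container
                args '''-v $DOCKERCONFIG_FOLDER/ag/ubuntu-conan.env:/dockerconfig.env'''
              }
            }
            stages {
              stage('Use the tool') {
                steps {
                  // Runs inside the ag/ubuntu-conan container
                  sh 'conan install whatever'
                }
              }
            }
          }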

          Isn't that exactly the "tool-only" use case you mentioned above? If not, what is the intended use case for this kind of agent?

          Jesse Glick added a comment -

          agent docker is just sugar for withDockerContainer. The expected use case is anything that happens to work the first time you try it. There is really no further guarantee than that.

          Ad added a comment -

          "For specialized use cases like this you should not use withDockerContainer. Just run whatever docker commands you need directly from a sh (or indirectly via some script)."

          We are revisiting our use of docker agents in our CI pipeline. jglick, we are considering following your advice above: removing the docker agents and instead running `docker` commands in `sh` steps directly on the master.

          However, our current setup has a stage with many steps running on the agent (with the individual steps shown nicely in the Jenkins UI). How could we get the same result by executing the docker command directly? That is, how could we execute discrete Jenkins `steps` in the same docker container without restarting it, since restarting would lose its state?

          Jesse Glick added a comment -

          adnn that is indeed a missing feature in Pipeline. I have hijacked JENKINS-44847 to discuss this.
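          Outside of Pipeline's own steps, the lifecycle Ad asks about can be approximated by hand: start one long-lived container, run each command in it with docker exec, and remove it at the end. The Java sketch below only builds the docker command lines. The class name, the containerName parameter and the use of cat as a keep-alive process (borrowed from the cat command discussed in this issue) are illustrative assumptions, not an API of the plugin.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds docker CLI argument lists for keeping one
// container alive across several separate commands.
class DockerCommands {
    // Start a detached container whose main process is `cat` on a TTY,
    // so it keeps running until the container is removed.
    static List<String> runDetached(String image, String containerName) {
        return Arrays.asList("docker", "run", "-d", "-t", "--name", containerName, image, "cat");
    }

    // Run one shell command inside the already-running container; files and
    // other state left by earlier commands are still there.
    static List<String> exec(String containerName, String shellCommand) {
        return Arrays.asList("docker", "exec", containerName, "sh", "-c", shellCommand);
    }

    // Force-remove the container (stopping it first) once all steps are done.
    static List<String> remove(String containerName) {
        return Arrays.asList("docker", "rm", "-f", containerName);
    }
}
```

          Each list could be passed to new ProcessBuilder(list).inheritIO().start() and awaited with waitFor(), or the same commands could be written into separate sh steps. If the image defines an ENTRYPOINT, cat is handed to it as an argument, so the entrypoint caveats discussed in this thread still apply.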

          Gan Ainm added a comment (edited) -

          I encountered a similar problem with the cdrx/pyinstaller-linux:python2 container (described in WEBSITE-726; see the corresponding log file).

          However, in my case the container process was not detected at all by docker top.

          Mark Waite added a comment -

          I fixed the Python tutorial error reported by ganainm by using `sh 'docker run ...'` to replace the docker image reference from the Declarative Pipeline. See the pull request for more details.

          Jorge Barnaby added a comment -

          In our use case, we use a custom Docker image as the agent; it runs telegraf as a background service that sends metrics to InfluxDB. The benefit of this approach over running telegraf on the host that runs the Docker agent is that I can add default tags from the environment variables to the metrics, such as JOB ID, BRANCH, repo, etc.

          It took me a while to make it work because of the whole ENTRYPOINT situation described here, but I found a workaround.

          I'm using S6 Overlay (https://github.com/just-containers/s6-overlay) so that I have a proper process manager with services running in the background. Our agents should use a dedicated Jenkins user (say UID 2000), but S6 doesn't work if you start the container as a non-root user, so I adapted the ENTRYPOINT of the image to:

          ENTRYPOINT ["/init", "/bin/execlineb", "-s0", "-c", "export HOME /home/jenkins s6-setuidgid jenkins $@"]
          

          And the args in the Jenkinsfile look like `args -u 0:0 -v /home/jenkins:/home/jenkins`. This way, the container actually starts as root, but the entrypoint makes the command run as the jenkins user. I'm still ironing out a few things, but hopefully this helps other people. I might write a blog post about it.

          Felipe Santos added a comment -

          yorch, I also have this kind of setup (s6-overlay with services), but I solved it properly. Take a look: https://github.com/felipecrs/jenkins-agent-dind/pull/11

            Assignee: Unassigned
            Reporter: Hendrik Halkow (hendrikhalkow)
            Votes: 4
            Watchers: 36