
image.inside { ... } does not create a fully configured user account

      The inside command shares the underlying workspace filesystem with the container by running docker run with the --user option, so the processes started inside run under the same uid/gid that owns their working directory. This causes problems for system tools that do not expect to run as a uid/gid that does not exist in the system and whose environment is not fully configured. The result is hard-to-debug failures, since few tools expect that the current user cannot be looked up in /etc/passwd or that the process has no HOME variable set. Reportedly, it is not even POSIX compliant.

      It does not take long to find a container/tool combination that will blow up with an artificial uid/gid:

      $ docker run -i fedora dnf update
        ... # OK
      $ docker run -u 4242:4242 -i fedora dnf update
      Traceback (most recent call last):
        File "/usr/bin/dnf", line 36, in <module>
          main.user_main(sys.argv[1:], exit_code=True)
        File "/usr/lib/python2.7/site-packages/dnf/cli/main.py", line 185, in user_main
          errcode = main(args)
        File "/usr/lib/python2.7/site-packages/dnf/cli/main.py", line 84, in main
          return _main(base, args)
        File "/usr/lib/python2.7/site-packages/dnf/cli/main.py", line 115, in _main
          cli.configure(map(ucd, args))
        File "/usr/lib/python2.7/site-packages/dnf/cli/cli.py", line 932, in configure
          self.read_conf_file(opts.conffile, root, releasever, overrides)
        File "/usr/lib/python2.7/site-packages/dnf/cli/cli.py", line 1043, in read_conf_file
          conf.prepend_installroot(opt)
        File "/usr/lib/python2.7/site-packages/dnf/yum/config.py", line 718, in prepend_installroot
          path = path.lstrip('/')
      AttributeError: 'NoneType' object has no attribute 'lstrip'
      

      The message does not indicate the real cause at all.
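
      For illustration, the same class of failure shows up inside a Pipeline whenever image.inside runs a tool that needs a real user account; a minimal hypothetical sketch (image name and commands are only examples):

      node {
          docker.image('fedora').inside {
              // The build uid/gid exist on the host but not inside this container,
              // so user lookups fail and $HOME is not usefully set.
              sh 'id'                                        // prints a bare uid/gid with no names
              sh 'whoami || echo "uid has no passwd entry"'  // whoami typically fails here
          }
      }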

          [JENKINS-38438] image.inside { ... } does not create a fully configured user account

          Nicolas De Loof added a comment - see https://github.com/Dockins/docker-slaves-plugin/blob/master/src/main/java/it/dockins/dockerslaves/drivers/CliDockerDriver.java#L232 as a possible way to support this

          Jesse Glick added a comment -

          Cf. PR 57, which would not help in this case based on my command-line testing:

          cid=$(docker run -d fedora sleep infinity); docker exec -u $(id -u):$(id -g) $cid dnf update; docker rm -f $cid
          

          It is unfortunate that the error is so unhelpful but that seems like a bug in the Fedora tool that could be encountered by anyone running without sudo.

          Really I am not sure how this could be expected to work. Typically the Jenkins agent will be running as a jenkins user or similar, and so files in the workspace need to be owned by that user so they can be read and written by other steps (or even deleted by the next build, etc.). Since the --volume option offers no apparent UID mapping feature, that means that the active processes in the container must use the same UID/GID as the agent.

          In general that will mean that the steps run inside the container should not expect to run as root. But for the intended usage of inside, this is not much of a restriction because it is expected that there are two kinds of commands: tool/environment setup, which should be done using build with a Dockerfile, thus cached and able to run as any USER; and actual build steps, which are focused on manipulating workspace files and should run without privilege. If your build is really “about” creating an image or otherwise configuring things at the OS level, inside is inappropriate—you probably want to use only build and maybe run/withRun. Probably this usage distinction needs to be more clearly documented.

          Regarding the value of $HOME, arguably that should be set (via -e) to the workspace, in lieu of anything better.
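
          Until something like that is implemented, a similar effect can be had from the Pipeline itself by passing -e explicitly; a minimal sketch, assuming the tools in the image only need a writable HOME rather than a real passwd entry:

          node {
              docker.image('fedora').inside("-e HOME=${env.WORKSPACE}") {
                  // HOME now points at the host-owned, writable workspace,
                  // so tools that merely want a home directory can proceed.
                  sh 'echo "HOME is $HOME"'
              }
          }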


          Oliver Gondža added a comment - - edited

          I used dnf update as an example of a command that relies on the environment - not a wise choice. What actually failed badly was something in the rpmbuild tooling (cross-building native packages sounds like a reasonable use-case for inside), because it could not get information about a current user the OS knew nothing about.

          I am not saying this is something this plugin should address. Perhaps this is something docker run itself should take care of.


          Jacob Blain Christen added a comment - - edited

          I was running into this problem when using git commands that were trying to connect to remotes via the ssh scheme. This is because ssh seems not to honor $HOME and must be able to resolve the current user from /etc/passwd and /etc/group. My work-around was to bind-mount /etc/passwd, /etc/group, and /etc/shadow as read-only volumes, e.g.:

              def dockerSocketGID = getOutput("stat -c '%g' /var/run/docker.sock")
              def dockerArgs = "--env HOME=${pwd()}"
                  dockerArgs = "${dockerArgs} --group-add ${dockerSocketGID}"
                  dockerArgs = "${dockerArgs} --volume /etc/group:/etc/group:ro"
                  dockerArgs = "${dockerArgs} --volume /etc/passwd:/etc/passwd:ro"
                  dockerArgs = "${dockerArgs} --volume /etc/shadow:/etc/shadow:ro"
                  dockerArgs = "${dockerArgs} --volume /tmp:/tmp:rw"
                  dockerArgs = "${dockerArgs} --volume /var/lib/jenkins/.ssh:/home/ubuntu/.ssh:ro"
                  dockerArgs = "${dockerArgs} --volume /var/lib/jenkins/tools:/var/lib/jenkins/tools:ro"
                  dockerArgs = "${dockerArgs} --volume /var/run/docker.sock:/var/run/docker.sock:rw"
          
              docker.withRegistry(dockerRegistryUrl, dockerRegistryCredentialsId) {
                  pullDockerImage(dockerImg).inside(dockerArgs) {
                      // do something useful
                  }
              }
          
          def pullDockerImage(imageName) {
              def img = docker.image(imageName)
              /* make sure we have the up-to-date image */
              img.pull()
              /* dance around https://issues.jenkins-ci.org/browse/JENKINS-34276 */
              return docker.image(img.imageName())
          }
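
          Note: the getOutput helper above is not defined by the plugin; presumably it wraps the sh step. A minimal sketch of what it might look like (an assumption, not the commenter's actual code):

          def getOutput(command) {
              // Run the command on the agent and return its trimmed standard output.
              return sh(script: command, returnStdout: true).trim()
          }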
          

          P.S. For added complexity with regard to bind mounts, per the article https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/, I run my Jenkins slaves as containers that bind-mount their host's /var/run/docker.sock. This means that I am pulling in the host's /etc/passwd, /etc/group, and /etc/shadow, which is why this particular bind mount, /var/lib/jenkins/.ssh:/home/ubuntu/.ssh, looks the way it does (i.e. host uid=1000 is ubuntu whereas container uid=1000 is jenkins). Additionally, to cope with ssh-agent Unix sockets, which are normally created in the project's @tmp directory but whose paths have a limit of 108 characters and thus fall back to /tmp, I had to add --volume /tmp:/tmp:rw.

          P.P.S. All of these bind-mount gymnastics in the workflow/pipeline scripts predate docker.inside automagically augmenting its docker run command with --volumes-from <slave-container-id>.


          Oliver Gondža added a comment -

          I just noticed this is the reason why WorkflowPluginTest#sshGitInsideDocker fails in the ATH:

          [Pipeline] sh
          [junit7299999626358238985] Running shell script
          + mkdir //.ssh
          mkdir: cannot create directory '//.ssh': Permission denied
          

          The actual script is sh 'mkdir ~/.ssh && echo StrictHostKeyChecking no > ~/.ssh/config'


          Mike Kobit added a comment -

          Some of the restrictions here could possibly be worked around by power users if the entrypoint could be overridden with a custom one (at https://github.com/jenkinsci/docker-workflow-plugin/blob/master/src/main/java/org/jenkinsci/plugins/docker/workflow/WithContainerStep.java#L175). I haven't thought too much about the side effects, but this would allow for more customizability, such as creating the user/group mapping at startup.

          Another thing is that our developers already have tooling and automation built around docker, so they are not inclined to use docker.build with the Jenkinsfile. They would prefer to use the existing build plugins and tools inside of the Jenkinsfile, so being able to provide some way to mount the host Docker socket in (possible today) while also creating a user/group to allow the current -u user/group mapping would be useful.
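
          Mounting the host Docker socket into the inside container is indeed possible today by passing extra run arguments; a hypothetical sketch (the image is only an example and must already contain a docker client, and the build user may also need access to the socket's group, e.g. via --group-add as in the earlier comment):

          node {
              def args = '-v /var/run/docker.sock:/var/run/docker.sock'
              docker.image('docker:latest').inside(args) {
                  // Talks to the host daemon through the bind-mounted socket.
                  sh 'docker version'
              }
          }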


          Sylvain Rollinet added a comment -

          I have the same problem with the dotnet tooling.

          It can be reproduced with this dummy example:

          node {
              docker.image('microsoft/aspnetcore-build:1.0.1').inside {
                  sh 'dotnet new'
                  sh 'dotnet restore'
              }
          }
          

          dotnet restore tries to create a .nuget folder in the home directory, which is / since the user doesn't exist, so it results in a permission-denied error.
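
          A possible workaround, assuming the dotnet tooling only needs a writable HOME and honors the NUGET_PACKAGES variable (both assumptions, not verified against this image):

          node {
              // Assumption: no real passwd entry is needed, only writable directories.
              def args = "-e HOME=${env.WORKSPACE} -e NUGET_PACKAGES=${env.WORKSPACE}/.nuget/packages"
              docker.image('microsoft/aspnetcore-build:1.0.1').inside(args) {
                  sh 'dotnet new'
                  sh 'dotnet restore'
              }
          }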


          Jesse Glick added a comment -

          Seems related to JENKINS-39748?


          Sylvain Rollinet added a comment -

          It doesn't seem related. For me it is the same problem as described: "dotnet restore" expects to have a valid $HOME folder.


          Matthew Mitchell added a comment -

          Right, this is just because -u is passed. dotnet restore wants a NuGet cache. I think that this plugin should not pass -u by default; inside(...) can be used to pass it if necessary.

          In our case we have additional infrastructure around our containers so we're not using docker.build.


          Oliver Gondža added a comment - - edited

          mmitche, but otherwise you might not have sufficient permissions in the current folder, which is likely to cause problems even more often. EDIT: when I think of it, the user will likely default to root, creating files that cannot be manipulated once you leave the inside step.


          Matthew Mitchell added a comment -

          olivergondza We can make sure we clean up the container before exiting. The issue here is that these are containers shared among different systems and teams, not built on the fly with docker.build, so I don't have control over the available users in the container. The default for docker is to run as root within the container, so we're going to stick with that for now, unless you have another suggestion for how we should be managing use cases like this.

          I did find that you can pass -u 0:0 to the inside() step to get the desired behavior.
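
          For example, a sketch of that workaround (root-owned files created in the workspace may need to be cleaned up afterwards, as discussed above):

          node {
              docker.image('fedora').inside('-u 0:0') {
                  // The trailing -u 0:0 overrides the uid:gid the plugin adds by default,
                  // so commands run as root inside the container.
                  sh 'id'
              }
          }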


          Jesse Glick added a comment -

          Do not plan to change the current behavior here.


          Jason MCDev added a comment -

          I am really struggling with this. We have existing builds that run inside of Docker.  We set all of our environment variables in the Dockerfile - LD_LIBRARY_PATH, PATH, etc.

          When I run this image in Docker, I have none of those variables. I have been trying to work around this for days - what is the expected workflow for this situation? Please help?


          Jesse Glick added a comment -

          jsnmcdev do not use Image.inside for such cases. Write a per-project Dockerfile performing your build stuff and run it directly with docker commands.


          Eric Tan added a comment -

          Observations:

          uid 1000 and gid 1000 are not associated with actual names like jenkins, so 'whoami' will not work.

          Also running "pip install --user module " will fail since pip writes some objects in the user's own home directory (<user_home>/pip-stuff). Since there is no name associated with uid 1000, pip will write to the root directory instead (/pip-stuff), which it does not have permissions.

          Jason is right to suggest doing all build stuff in the Dockerfile.


          Rodrigo Carvalho Silva added a comment -

          Hi!

          Is this issue planned to be resolved? I was stuck with the following error on a Maven build and I had no clue what it was:

          The specified user settings file does not exist: /path/on/jenkins/host/?/.m2/settings.xml

          I think it's a very important improvement. Other CI/CD tools do this when working with Docker to avoid such problems...

          Maybe I can try to send you a pull request with some guidance. In which class should I implement this?
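
          As a workaround in the meantime, Maven can be pointed away from the unresolved home directory explicitly; a sketch, with the image name and settings path being only examples:

          node {
              docker.image('maven:3-jdk-8').inside {
                  // -s bypasses the ${user.home}/.m2/settings.xml lookup and
                  // maven.repo.local keeps the local repository inside the workspace.
                  sh 'mvn -B -s settings.xml -Dmaven.repo.local=$WORKSPACE/.m2/repository clean verify'
              }
          }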


          Cristian added a comment - - edited

          Really I am not sure how this could be expected to work. Typically the Jenkins agent will be running as a jenkins user or similar, and so files in the workspace need to be owned by that user so they can be read and written by other steps (or even deleted by the next build, etc.). Since the --volume option offers no apparent UID mapping feature, that means that the active processes in the container must use the same UID/GID as the agent.

          Fast-forward to 2021 and nowadays we have user namespaces, especially with podman.

          it is expected that there are two kinds of commands: tool/environment setup, which should be done using build with a Dockerfile, thus cached and able to run as any USER; and actual build steps, which are focused on manipulating workspace files and should run without privilege. If your build is really “about” creating an image or otherwise configuring things at the OS level, inside is inappropriate—you probably want to use only build and maybe run/withRun. Probably this usage distinction needs to be more clearly documented.

          I don't think the documentation was ever changed. In case it helps somebody:

          I have a Dockerfile like this

          ARG BASE_IMAGE
          FROM ${BASE_IMAGE}
          
          ARG USER_ID
          ARG GROUP_ID
          RUN groupadd -g ${GROUP_ID} builder
          RUN useradd -u ${USER_ID} -g ${GROUP_ID} builder
          

          And my Jenkinsfile says

          agent {
            dockerfile {
              additionalBuildArgs "--build-arg BASE_IMAGE=<the_image_I_want_to_use> --build-arg USER_ID=\$(id -u) --build-arg GROUP_ID=\$(id -g)"
            }
          }
          

          when it used to say

          agent {
            docker {
              image "<the_image_I_want_to_use>"
            }
          }
          


            Assignee: Unassigned
            Reporter: Oliver Gondža (olivergondza)
            Votes: 11
            Watchers: 24