Type: Improvement
Resolution: Unresolved
Priority: Major
The inside command shares the underlying filesystem with the workspace and runs docker run with the --user option, so the processes started inside the container run with the same uid/gid as their working directory. This causes problems for system tools that do not expect to run under a uid/gid that does not exist in the container and whose environment is not fully configured. The result is hard-to-debug failures, since few tools expect that the current user cannot be looked up in /etc/passwd or that the process has no HOME variable set. Reportedly, such an environment is not even POSIX compliant.
It does not take long to find a container/tool combination that will blow up with an artificial uid/gid:
$ docker run -i fedora dnf update
... # OK
$ docker run -u 4242:4242 -i fedora dnf update
Traceback (most recent call last):
  File "/usr/bin/dnf", line 36, in <module>
    main.user_main(sys.argv[1:], exit_code=True)
  File "/usr/lib/python2.7/site-packages/dnf/cli/main.py", line 185, in user_main
    errcode = main(args)
  File "/usr/lib/python2.7/site-packages/dnf/cli/main.py", line 84, in main
    return _main(base, args)
  File "/usr/lib/python2.7/site-packages/dnf/cli/main.py", line 115, in _main
    cli.configure(map(ucd, args))
  File "/usr/lib/python2.7/site-packages/dnf/cli/cli.py", line 932, in configure
    self.read_conf_file(opts.conffile, root, releasever, overrides)
  File "/usr/lib/python2.7/site-packages/dnf/cli/cli.py", line 1043, in read_conf_file
    conf.prepend_installroot(opt)
  File "/usr/lib/python2.7/site-packages/dnf/yum/config.py", line 718, in prepend_installroot
    path = path.lstrip('/')
AttributeError: 'NoneType' object has no attribute 'lstrip'
The message does not indicate the real cause at all.
relates to:
- JENKINS-39748 Since 37987 images that use ENTRYPOINT for a reason cannot be used in testing (In Review)
- JENKINS-42695 Wrong UID in docker (Resolved)
[JENKINS-38438] image.inside { ... } does not create full configured user account
Cf. PR 57, which would not help in this case based on my command-line testing:
cid=$(docker run -d fedora sleep infinity); docker exec -u $(id -u):$(id -g) $cid dnf update; docker rm -f $cid
It is unfortunate that the error is so unhelpful but that seems like a bug in the Fedora tool that could be encountered by anyone running without sudo.
Really I am not sure how this could be expected to work. Typically the Jenkins agent will be running as a jenkins user or similar, and so files in the workspace need to be owned by that user so they can be read and written by other steps (or even deleted by the next build, etc.). Since the --volume option offers no apparent UID mapping feature, that means that the active processes in the container must use the same UID/GID as the agent.
In general that will mean that the steps run inside the container should not expect to run as root. But for the intended usage of inside, this is not much of a restriction because it is expected that there are two kinds of commands: tool/environment setup, which should be done using build with a Dockerfile, thus cached and able to run as any USER; and actual build steps, which are focused on manipulating workspace files and should run without privilege. If your build is really “about” creating an image or otherwise configuring things at the OS level, inside is inappropriate—you probably want to use only build and maybe run/withRun. Probably this usage distinction needs to be more clearly documented.
Regarding the value of $HOME, arguably that should be set (via -e) to the workspace, in lieu of anything better.
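A minimal sketch of that split as a scripted Pipeline (editorial illustration; the image name, the Dockerfile contents, and the make targets are placeholders), with HOME pointed at the workspace via -e as suggested above:

node {
    checkout scm
    // Environment/tool setup lives in a Dockerfile in the repository; it can run
    // as any USER and the resulting layers are cached between builds.
    def buildEnv = docker.build('myorg/build-env')
    // Actual build steps run unprivileged, as the agent's uid/gid, and only
    // manipulate workspace files.
    buildEnv.inside("-e HOME=${env.WORKSPACE}") {
        sh 'make all test'
    }
}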
I used dnf update as an example of a command that relies on the environment - not a wise choice. Certainly something in the rpmbuild tooling (cross-building native packages sounds like a reasonable use case for inside) failed badly, as it could not look up information about a current user the OS knew nothing about.
I am not saying this is something this plugin should address. Perhaps this is something docker run should take care of.
I was running into this problem when using git commands that were trying to connect to remotes via the ssh scheme. This is because ssh does not seem to honor $HOME and must be able to resolve the current user from /etc/passwd and /etc/group. My work-around was to bind mount /etc/passwd, /etc/group, and /etc/shadow as read-only volumes, e.g.:
def dockerSocketGID = getOutput("stat -c '%g' /var/run/docker.sock")
def dockerArgs = "--env HOME=${pwd()}"
dockerArgs = "${dockerArgs} --group-add ${dockerSocketGID}"
dockerArgs = "${dockerArgs} --volume /etc/group:/etc/group:ro"
dockerArgs = "${dockerArgs} --volume /etc/passwd:/etc/passwd:ro"
dockerArgs = "${dockerArgs} --volume /etc/shadow:/etc/shadow:ro"
dockerArgs = "${dockerArgs} --volume /tmp:/tmp:rw"
dockerArgs = "${dockerArgs} --volume /var/lib/jenkins/.ssh:/home/ubuntu/.ssh:ro"
dockerArgs = "${dockerArgs} --volume /var/lib/jenkins/tools:/var/lib/jenkins/tools:ro"
dockerArgs = "${dockerArgs} --volume /var/run/docker.sock:/var/run/docker.sock:rw"

docker.withRegistry(dockerRegistryUrl, dockerRegistryCredentialsId) {
    pullDockerImage(dockerImg).inside(dockerArgs) {
        // do something useful
    }
}

def pullDockerImage(imageName) {
    def img = docker.image(imageName)
    /* make sure we have the up-to-date image */
    img.pull()
    /* dance around https://issues.jenkins-ci.org/browse/JENKINS-34276 */
    return docker.image(img.imageName())
}
—
P.S. For added complexity with regard to bind mounts, per this article https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/, I run my Jenkins slaves as containers that bind mount their host's /var/run/docker.sock. This means that I am pulling in the host's /etc/passwd, /etc/group, and /etc/shadow, which is why this particular bind mount, /var/lib/jenkins/.ssh:/home/ubuntu/.ssh, looks the way it does (i.e. host uid=1000 is ubuntu whereas container uid=1000 is jenkins). Additionally, to cope with ssh-agent unix sockets that are normally created in the project's @tmp directory but have a limit of 108 characters and thus fall back to using /tmp, I had to add --volume /tmp:/tmp:rw.
—
P.P.S. All of these bind-mount gymnastics in the workflow/pipeline scripts were prior to the advent of docker.inside automagically augmenting its docker run command with --volumes-from <slave-container-id>.
I just noticed this is the reason why WorkflowPluginTest#sshGitInsideDocker fails in ATH:
[Pipeline] sh
[junit7299999626358238985] Running shell script
+ mkdir //.ssh
mkdir: cannot create directory '//.ssh': Permission denied
The actual script is sh 'mkdir ~/.ssh && echo StrictHostKeyChecking no > ~/.ssh/config'
Some of the restrictions here could possibly be worked around by power users if a custom entrypoint were allowed to be specified (at https://github.com/jenkinsci/docker-workflow-plugin/blob/master/src/main/java/org/jenkinsci/plugins/docker/workflow/WithContainerStep.java#L175). I haven't thought too much about the side effects, but this would allow for more customizability, such as performing the user and group creation/mapping at startup.
Another thing is that our developers already have tooling and automation built around docker, so they are not inclined to use docker.build with the Jenkinsfile. They would prefer to use the existing build plugins and tools inside of the Jenkinsfile, so being able to provide some way to mount the host Docker socket in (possible today) while also creating a user/group to allow the current -u user/group mapping would be useful.
I have the same problem with dotnet tooling.
It can be reproduced using this dummy example
node {
    docker.image('microsoft/aspnetcore-build:1.0.1').inside {
        sh 'dotnet new'
        sh 'dotnet restore'
    }
}
dotnet restore tries to create a .nuget folder in the home directory, which is / because the user doesn't exist, so it results in a permission denied error.
It doesn't seem related. For me it is the same problem as described. "dotnet restore" expects to have a valid $HOME folder
Right, this is just because -u is passed. dotnet restore wants a nuget cache. I think that this plugin should not pass -u by default. inside(...) can be used to pass it if necessary.
In our case we have additional infrastructure around our containers so we're not using docker.build.
mmitche, but otherwise you might not have sufficient permissions in the current folder, which is likely to cause problems even more often. EDIT: When I think of it, the user will likely default to root, creating files that cannot be manipulated once you leave the inside step.
olivergondza We can make sure we clean up the container before exiting. The issue here is that these are containers shared among different systems and teams, not built on the fly with docker.build. So I don't have control over available users in the container. The default for docker is to run as root within the container, so we're going to stick with that for now, unless you have another suggestion for how we should be managing use cases like this.
I did find that you can pass -u 0:0 to the inside() step to get the desired behavior.
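For reference, a minimal sketch of that override, reusing the aspnetcore image from the example above. Keep in mind that anything a root-owned process writes into the workspace may be hard for later steps (or the next build) to clean up, as noted above:

node {
    // With '-u 0:0' the last -u on the generated docker run command line wins,
    // so the steps inside the block run as root in the container.
    docker.image('microsoft/aspnetcore-build:1.0.1').inside('-u 0:0') {
        sh 'dotnet restore'
    }
}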
I am really struggling with this. We have existing builds that run inside of Docker. We set all of our environment variables in the Dockerfile - LD_LIBRARY_PATH, PATH, etc.
When I run this image in docker, I have none of those variables. I have been trying to work around this for days - what is the expected workflow for this situation? Please help?
jsnmcdev do not use Image.inside for such cases. Write a per-project Dockerfile performing your build stuff and run it directly with docker commands.
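A rough sketch of what "run it directly with docker commands" could look like from a Pipeline, assuming a per-project Dockerfile; the image tag and the run-tests.sh script are hypothetical. Because the image is run with a plain docker run, the ENV and ENTRYPOINT/CMD baked into the Dockerfile are honored:

node {
    checkout scm
    // The per-project Dockerfile performs the build setup, so ENV values
    // declared there (PATH, LD_LIBRARY_PATH, ...) apply when the image runs.
    def img = docker.build("myproject-build:${env.BUILD_NUMBER}")
    // Run the resulting image directly instead of via inside {}; no -u or
    // workspace volume is injected here.
    sh "docker run --rm ${img.imageName()} ./run-tests.sh"
}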
Observations:
uid 1000 and gid 1000 are not associated with actual names like jenkins, so 'whoami' will not work.
Also, running "pip install --user module" will fail, since pip writes some objects into the user's own home directory (<user_home>/pip-stuff). Since there is no name associated with uid 1000, pip will write to the root directory instead (/pip-stuff), for which it does not have permissions.
Jason is right to suggest doing all build stuff in the Dockerfile.
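If baking everything into the Dockerfile is not an option, one hedged workaround for the pip --user case above is to point HOME at the workspace so pip has a writable home directory; the python:2.7 image and the requests package are just placeholders:

node {
    docker.image('python:2.7').inside("-e HOME=${env.WORKSPACE}") {
        // With a writable HOME, pip's per-user install location and cache end up
        // under the workspace instead of under /.
        sh 'pip install --user requests'
    }
}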
Hi!
Is this issue planned to be resolved? I was stuck with the following error on a Maven build and I had no clue what it was:
The specified user settings file does not exist: /path/on/jenkins/host/?/.m2/settings.xml
I think it's a very important improvement. Other CI/CD tools do this when working with Docker to avoid such problems...
Maybe I can try to send you a pull request with some guidance. In which class should I implement this?
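In the meantime, a hedged workaround for the Maven error above (the "?" apparently comes from the JVM being unable to resolve a home directory for the unknown uid) is to avoid anything derived from ~ and give Maven explicit paths instead. This is an editorial sketch; the maven:3-jdk-8 image and a settings.xml checked into the repository are assumptions:

node {
    docker.image('maven:3-jdk-8').inside {
        // Use an explicit settings file and keep the local repository inside the
        // workspace rather than relying on the unresolved home directory.
        sh "mvn -B -s settings.xml -Dmaven.repo.local='${env.WORKSPACE}/.m2/repository' clean verify"
    }
}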
Really I am not sure how this could be expected to work. Typically the Jenkins agent will be running as a jenkins user or similar, and so files in the workspace need to be owned by that user so they can be read and written by other steps (or even deleted by the next build, etc.). Since the --volume option offers no apparent UID mapping feature, that means that the active processes in the container must use the same UID/GID as the agent.
Fast-forward to 2021 and nowadays we have user namespaces, specially with podman.
it is expected that there are two kinds of commands: tool/environment setup, which should be done using build with a Dockerfile, thus cached and able to run as any USER; and actual build steps, which are focused on manipulating workspace files and should run without privilege. If your build is really “about” creating an image or otherwise configuring things at the OS level, inside is inappropriate—you probably want to use only build and maybe run/withRun. Probably this usage distinction needs to be more clearly documented.
I don't think the documentation was ever changed. In case it helps somebody:
I have a Dockerfile like this
ARG BASE_IMAGE
FROM ${BASE_IMAGE}
ARG USER_ID
ARG GROUP_ID
RUN groupadd -g ${GROUP_ID} builder
RUN useradd -u ${USER_ID} -g ${GROUP_ID} builder
And my Jenkinsfile says
agent {
dockerfile {
additionalBuildArgs "--build-arg BASE_IMAGE=<the_image_I_want_to_use> --build-arg USER_ID=\$(id -u) --build-arg GROUP_ID=\$(id -g)"
}
}
when it used to say
agent {
docker {
image "<the_image_I_want_to_use>"
}
}
see https://github.com/Dockins/docker-slaves-plugin/blob/master/src/main/java/it/dockins/dockerslaves/drivers/CliDockerDriver.java#L232 as a possible way to support this