JENKINS-73561

docker pipeline hangs when container has problems starting

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: docker-workflow-plugin
    • None

      The following pipeline:

      pipeline {
          agent {
              docker {
                  image 'quay.io/condaforge/linux-anvil-cos7-x86_64'
              }
          }
          stages {
              stage('Test') {
                  steps {
                      sh 'echo Hello World'
                  }
              }
          }
      }

      ...runs successfully the first time, but every subsequent build hangs at the `sh` step:

      [Pipeline] Start of Pipeline
      [Pipeline] node
      Running on Jenkins in /var/lib/jenkins/workspace/test
      [Pipeline] {
      [Pipeline] isUnix
      [Pipeline] withEnv
      [Pipeline] {
      [Pipeline] sh
      + docker inspect -f . quay.io/condaforge/linux-anvil-cos7-x86_64
      .
      [Pipeline] }
      [Pipeline] // withEnv
      [Pipeline] withDockerContainer
      Jenkins does not seem to be running inside a container
      $ docker run -t -d -u 115:121 -w /var/lib/jenkins/workspace/test -v /var/lib/jenkins/workspace/test:/var/lib/jenkins/workspace/test:rw,z -v /var/lib/jenkins/workspace/test@tmp:/var/lib/jenkins/workspace/test@tmp:rw,z -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** quay.io/condaforge/linux-anvil-cos7-x86_64 cat
      $ docker top 82b5ffd16f2046efca559bb68c910a2230e4f6fedf3bdbf61b6d9e5763ae673b -eo pid,comm
      ERROR: The container started but didn't run the expected command. Please double check your ENTRYPOINT does execute the command passed as docker run argument, as required by official docker images (see https://github.com/docker-library/official-images#consistency for entrypoint consistency requirements).
      Alternatively you can force image entrypoint to be disabled by adding option `--entrypoint=''`.
      [Pipeline] {
      [Pipeline] stage
      [Pipeline] { (Test)
      [Pipeline] sh
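
      The error message in the log suggests disabling the image entrypoint. The following is a minimal sketch for trying that outside Jenkins, mirroring the `docker run ... cat` call from the log. The function name `start_without_entrypoint` and the `DOCKER` override variable are hypothetical, and whether this avoids the hang reported here is untested. In a declarative pipeline, extra run options like this are normally passed through the `args` parameter of the `docker` agent.

```shell
# Sketch: start a long-lived container the way the build log does,
# but with the image entrypoint disabled as the error message suggests.
# DOCKER is a hypothetical override so the docker CLI can be stubbed.
start_without_entrypoint() {
    image="$1"
    # Same shape as the logged command: detached, with a TTY, running `cat`
    # so the container stays alive until it is stopped.
    "${DOCKER:-docker}" run -t -d --entrypoint='' \
        -u "$(id -u):$(id -g)" \
        -w "$PWD" -v "$PWD:$PWD:rw,z" \
        "$image" cat
}
```

      Usage: `cid="$(start_without_entrypoint quay.io/condaforge/linux-anvil-cos7-x86_64)"`, then `docker top "$cid" -eo pid,comm` to confirm that `cat` is running.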

      Based on my testing, a full system reboot seems to be the only way to "reset." Removing Docker containers and restarting Jenkins did not help on my system. Syslog shows more information. (Note that this may not be the syslog for the same build as the log above; I have run this pipeline many times while diagnosing, and the log files are getting hard to keep track of.)

      Aug  2 18:40:19 networkd-dispatcher[826]: WARNING:Unknown index 5 seen, reloading interface list
      Aug  2 18:40:19 systemd-udevd[1763]: Using default interface naming scheme 'v249'.
      Aug  2 18:40:19 systemd-udevd[1764]: Using default interface naming scheme 'v249'.
      Aug  2 18:40:19 systemd-networkd[807]: veth40bd37e: Link UP
      Aug  2 18:40:19 kernel: [  179.207950] docker0: port 1(veth40bd37e) entered blocking state
      Aug  2 18:40:19 kernel: [  179.207957] docker0: port 1(veth40bd37e) entered disabled state
      Aug  2 18:40:19 kernel: [  179.208044] device veth40bd37e entered promiscuous mode
      Aug  2 18:40:20 containerd[902]: time="2024-08-02T18:40:20.502900639Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
      Aug  2 18:40:20 containerd[902]: time="2024-08-02T18:40:20.502958073Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
      Aug  2 18:40:20 containerd[902]: time="2024-08-02T18:40:20.502970759Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
      Aug  2 18:40:20 containerd[902]: time="2024-08-02T18:40:20.503572662Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
      Aug  2 18:40:20 systemd[1]: Started libcontainer container 7ede9e443ad022290032082637ded80263a23c6764de6dbe49143577c0cd8a5d.
      Aug  2 18:40:20 kernel: [  180.420318] eth0: renamed from veth0dae4b5
      Aug  2 18:40:20 systemd-networkd[807]: veth40bd37e: Gained carrier
      Aug  2 18:40:20 systemd-networkd[807]: docker0: Gained carrier
      Aug  2 18:40:20 kernel: [  180.436377] IPv6: ADDRCONF(NETDEV_CHANGE): veth40bd37e: link becomes ready
      Aug  2 18:40:20 kernel: [  180.436419] docker0: port 1(veth40bd37e) entered blocking state
      Aug  2 18:40:20 kernel: [  180.436423] docker0: port 1(veth40bd37e) entered forwarding state
      Aug  2 18:40:20 kernel: [  180.436467] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
      Aug  2 18:40:21 systemd-networkd[807]: veth40bd37e: Gained IPv6LL
      Aug  2 18:40:22 systemd-networkd[807]: docker0: Gained IPv6LL
      Aug  2 18:40:26 dockerd[1206]: time="2024-08-02T18:40:26.276620431Z" level=info msg="Container failed to exit within 1s of signal 15 - using the force" container=7ede9e443ad022290032082637ded80263a23c6764de6dbe49143577c0cd8a5d
      Aug  2 18:40:26 systemd[1]: docker-7ede9e443ad022290032082637ded80263a23c6764de6dbe49143577c0cd8a5d.scope: Deactivated successfully.
      Aug  2 18:40:26 dockerd[1206]: time="2024-08-02T18:40:26.488470032Z" level=info msg="ignoring event" container=7ede9e443ad022290032082637ded80263a23c6764de6dbe49143577c0cd8a5d module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
      Aug  2 18:40:26 containerd[902]: time="2024-08-02T18:40:26.488395723Z" level=info msg="shim disconnected" id=7ede9e443ad022290032082637ded80263a23c6764de6dbe49143577c0cd8a5d namespace=moby
      Aug  2 18:40:26 containerd[902]: time="2024-08-02T18:40:26.488513952Z" level=warning msg="cleaning up after shim disconnected" id=7ede9e443ad022290032082637ded80263a23c6764de6dbe49143577c0cd8a5d namespace=moby
      Aug  2 18:40:26 containerd[902]: time="2024-08-02T18:40:26.488528019Z" level=info msg="cleaning up dead shim" namespace=moby
      Aug  2 18:40:26 systemd-networkd[807]: veth40bd37e: Lost carrier
      Aug  2 18:40:26 kernel: [  186.335638] docker0: port 1(veth40bd37e) entered disabled state
      Aug  2 18:40:26 kernel: [  186.335823] veth0dae4b5: renamed from eth0
      Aug  2 18:40:26 networkd-dispatcher[826]: WARNING:Unknown index 5 seen, reloading interface list
      Aug  2 18:40:26 systemd-udevd[1959]: Using default interface naming scheme 'v249'.
      Aug  2 18:40:26 systemd-networkd[807]: veth40bd37e: Link DOWN
      Aug  2 18:40:26 kernel: [  186.452434] docker0: port 1(veth40bd37e) entered disabled state
      Aug  2 18:40:26 kernel: [  186.453120] device veth40bd37e left promiscuous mode
      Aug  2 18:40:26 kernel: [  186.453128] docker0: port 1(veth40bd37e) entered disabled state
      Aug  2 18:40:26 systemd[1]: run-docker-netns-0d5cf01af203.mount: Deactivated successfully.
      Aug  2 18:40:26 systemd[1]: var-lib-docker-overlay2-4f22a701401eb1d93cbe1ffceb4903e84fe6fa305863eaea025f3daeaed537a2-merged.mount: Deactivated successfully.
      Aug  2 18:40:27 systemd-networkd[807]: docker0: Lost carrier
      Aug  2 18:40:46 kernel: [  206.524406] [NON-UBC BLOCK] IN=enp0s25 OUT= MAC=ff:ff:ff:ff:ff:ff:a4:fc:14:30:b5:b7:08:00 SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x00 PREC=0x00 TTL=255 ID=38311 PROTO=UDP SPT=68 DPT=67 LEN=308 
      Aug  2 18:40:47 kernel: [  207.648367] [NON-UBC BLOCK] IN=enp0s25 OUT= MAC=ff:ff:ff:ff:ff:ff:a4:fc:14:30:b5:b7:08:00 SRC=0.0.0.0 DST=255.255.255.255 LEN=328 TOS=0x00 PREC=0x00 TTL=255 ID=38312 PROTO=UDP SPT=68 DPT=67 LEN=308 
      Aug  2 18:40:52 systemd[1]: var-lib-docker-overlay2-19e12f681843e54a84fcc10229c768334f2f4a214828ef13108eb45e8057671c\x2dinit-merged.mount: Deactivated successfully.
      Aug  2 18:40:53 systemd[1]: var-lib-docker-overlay2-19e12f681843e54a84fcc10229c768334f2f4a214828ef13108eb45e8057671c-merged.mount: Deactivated successfully.
      Aug  2 18:40:53 systemd-udevd[2034]: Using default interface naming scheme 'v249'.
      Aug  2 18:40:53 systemd-udevd[2035]: Using default interface naming scheme 'v249'.
      Aug  2 18:40:53 networkd-dispatcher[826]: WARNING:Unknown index 7 seen, reloading interface list
      Aug  2 18:40:53 kernel: [  213.510757] docker0: port 1(veth5bea384) entered blocking state
      Aug  2 18:40:53 kernel: [  213.510764] docker0: port 1(veth5bea384) entered disabled state
      Aug  2 18:40:53 systemd-networkd[807]: veth5bea384: Link UP
      Aug  2 18:40:53 kernel: [  213.511917] device veth5bea384 entered promiscuous mode
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.148781255Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.148834946Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.148848600Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.148972580Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
      Aug  2 18:40:54 systemd[1]: Started libcontainer container 0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776.
      Aug  2 18:40:54 kernel: [  213.983562] eth0: renamed from vetha27c4b5
      Aug  2 18:40:54 systemd-networkd[807]: veth5bea384: Gained carrier
      Aug  2 18:40:54 systemd-networkd[807]: docker0: Gained carrier
      Aug  2 18:40:54 kernel: [  214.003356] IPv6: ADDRCONF(NETDEV_CHANGE): veth5bea384: link becomes ready
      Aug  2 18:40:54 kernel: [  214.003395] docker0: port 1(veth5bea384) entered blocking state
      Aug  2 18:40:54 kernel: [  214.003399] docker0: port 1(veth5bea384) entered forwarding state
      Aug  2 18:40:54 systemd[1]: docker-0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776.scope: Deactivated successfully.
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.878260626Z" level=info msg="shim disconnected" id=0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776 namespace=moby
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.878330353Z" level=warning msg="cleaning up after shim disconnected" id=0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776 namespace=moby
      Aug  2 18:40:54 containerd[902]: time="2024-08-02T18:40:54.878349162Z" level=info msg="cleaning up dead shim" namespace=moby
      Aug  2 18:40:54 dockerd[1206]: time="2024-08-02T18:40:54.878307520Z" level=info msg="ignoring event" container=0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
      Aug  2 18:40:55 systemd-networkd[807]: veth5bea384: Lost carrier
      Aug  2 18:40:55 kernel: [  214.779301] docker0: port 1(veth5bea384) entered disabled state
      Aug  2 18:40:55 kernel: [  214.779561] vetha27c4b5: renamed from eth0
      Aug  2 18:40:55 networkd-dispatcher[826]: WARNING:Unknown index 7 seen, reloading interface list
      Aug  2 18:40:55 systemd-udevd[2048]: Using default interface naming scheme 'v249'.
      Aug  2 18:40:55 systemd-networkd[807]: veth5bea384: Link DOWN
      Aug  2 18:40:55 kernel: [  214.958992] docker0: port 1(veth5bea384) entered disabled state
      Aug  2 18:40:55 kernel: [  214.959835] device veth5bea384 left promiscuous mode
      Aug  2 18:40:55 kernel: [  214.959843] docker0: port 1(veth5bea384) entered disabled state
      Aug  2 18:40:55 systemd[1]: run-docker-netns-e80d8c1be79b.mount: Deactivated successfully.
      Aug  2 18:40:55 systemd[1]: var-lib-docker-overlay2-19e12f681843e54a84fcc10229c768334f2f4a214828ef13108eb45e8057671c-merged.mount: Deactivated successfully.
      Aug  2 18:40:56 systemd-networkd[807]: docker0: Lost carrier
      Aug  2 18:40:56 dockerd[1206]: time="2024-08-02T18:40:56.307380517Z" level=error msg="Error setting up exec command in container 0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776: Container 0d092ae3db416822aab25da076b02c5ebbc2a6453c10cd23760b29c1f6d69776 is not running"

      The last line is key: the container is not running. `docker logs` shows why:

      /usr/bin/id: cannot find name for user ID 115
      bash: unalias: cp: not found
      useradd: Permission denied.
      useradd: cannot lock /etc/passwd; try again later.
      chown: invalid user: 'conda:conda'
      cp: cannot create directory '/home/conda': Permission denied
      rm: cannot remove '/opt/conda/.condarc': Permission denied
      cp: cannot stat '/root/.condarc': Permission denied
      bash: cd: /home/conda: No such file or directory
      su-exec: setgroups(121): Operation not permitted

      This appears to be a failed execution of a startup script in the conda-forge image. In short, the image takes a nonstandard approach: it dynamically creates the user at container startup to try to match the host UID. Hence, the bulk of this issue stems from nonstandard Docker practices in the image. I think the issues in Jenkins itself are limited to:

      1. The build log does not repeat the `Container is not running` message, and Jenkins appears to have no record of the nonzero exit code of the corresponding command. The build really should fail at this step rather than hang.
      2. Behavior is inconsistent between builds. If a container behaves in a nonstandard way, every build should show the same behavior (rather than the first build working and subsequent builds failing).
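
      The first point can be illustrated outside Jenkins. This is only a sketch of a fail-fast check, not the plugin's code: it asks `docker inspect` whether the container is still running and returns an error if it is not. The function name `require_running` and the `DOCKER` override variable are hypothetical.

```shell
# Sketch: report a stopped container as a failure instead of waiting on it.
# DOCKER is a hypothetical override so the docker CLI can be stubbed.
require_running() {
    cid="$1"
    # `docker inspect -f '{{.State.Running}}'` prints "true" or "false";
    # treat a failed inspect (for example, an unknown container) as not running.
    running="$("${DOCKER:-docker}" inspect -f '{{.State.Running}}' "$cid" 2>/dev/null)" || running=false
    if [ "$running" != "true" ]; then
        echo "Container $cid is not running" >&2
        return 1
    fi
}
```

      Usage: `require_running "$cid" || { docker logs "$cid"; exit 1; }` prints the container's output and fails the calling step.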

      The only workaround is to invoke Docker commands manually. I'm labeling this bug as minor because it appears limited to certain containers that will not work in Jenkins anyway.
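
      The exact manual commands are not given above. One possible shape, as a sketch under that caveat, is a throwaway foreground container started from an ordinary `sh` step. `docker run` returns the exit status of the command it ran, so a container that fails during startup should fail the step instead of leaving it waiting. The function name `run_in_image` and the `DOCKER` override variable are hypothetical, and the `-u` option is deliberately left out here so that the image can set up its own user.

```shell
# Sketch: run one command in a throwaway foreground container.
# DOCKER is a hypothetical override so the docker CLI can be stubbed.
run_in_image() {
    image="$1"
    shift
    # --rm removes the container on exit; the workspace is mounted at the
    # same path and used as the working directory.
    "${DOCKER:-docker}" run --rm -v "$PWD:$PWD" -w "$PWD" "$image" "$@"
}
```

      Usage: `run_in_image quay.io/condaforge/linux-anvil-cos7-x86_64 sh -c 'echo Hello World'`. Files written to the workspace this way may end up owned by a different user than the Jenkins user.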

            Assignee: Unassigned
            Reporter: Nathan (kb1rd)