Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-55066

Docker plugin erroneously terminates containers shortly after they start

    XMLWordPrintable

Details

    Description

      We are seeing an issue where Pipeline jobs using Docker agents (with the Docker plugin, as opposed to Docker containers on regular agents using Pipeline's Docker support) intermittently fail right at the start, during the initial git checkout, with a "FATAL: java.io.IOException: Unexpected termination of the channel" exception. Having enabled debug logging for the Docker plugin, it appears that the plugin is erroneously killing the container because it thinks it is no longer needed.

      Job log:

      [First few lines redacted, this is the Jenkinsfile checkout]
      
      Checking out Revision 0b45f687992585a470e5faf003309b215e3f74f1 (refs/remotes/origin/master)
       > git config core.sparsecheckout # timeout=10
       > git checkout -f 0b45f687992585a470e5faf003309b215e3f74f1
      FATAL: java.io.IOException: Unexpected termination of the channel
      java.io.EOFException
              at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2679)
              at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3154)
              at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
              at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
              at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
              at hudson.remoting.Command.readFrom(Command.java:140)
              at hudson.remoting.Command.readFrom(Command.java:126)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      Caused: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
      Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to docker-2ae12755b75761
                      at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
                      at hudson.remoting.Request.call(Request.java:202)
                      at hudson.remoting.Channel.call(Channel.java:954)
                      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
                      at com.sun.proxy.$Proxy118.withRepository(Unknown Source)
                      at org.jenkinsci.plugins.gitclient.RemoteGitImpl.withRepository(RemoteGitImpl.java:235)
                      at hudson.plugins.git.GitSCM.printCommitMessageToLog(GitSCM.java:1271)
                      at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1244)
                      at hudson.scm.SCM.checkout(SCM.java:504)
                      at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
                      at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
                      at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
                      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
                      at hudson.model.Run.execute(Run.java:1815)
                      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
                      at hudson.model.ResourceController.execute(ResourceController.java:97)
                      at hudson.model.Executor.run(Executor.java:429)
      Caused: hudson.remoting.RequestAbortedException
              at hudson.remoting.Request.abort(Request.java:340)
              at hudson.remoting.Channel.terminate(Channel.java:1038)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:96)
      Finished: FAILURE

      Docker plugin debug log:

      2018-11-15 13:59:56.444+0000 [id=25]    FINE    c.n.j.p.d.s.DockerOnceRetentionStrategy#done: terminating docker-2ae12755b75761 since PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=CompileTest#22494,label=docker-2ae12755b75761,context=CpsStepContext[4:node]:Owner[CompileTest/22494:CompileTest #22494],cookie=561ba1da-fd51-4ee6-9bc3-5d4bb75a9fd0,auth=null} seems to be finished
      2018-11-15 13:59:56.446+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Disconnected computer for slave 'docker-2ae12755b75761'.
      2018-11-15 13:59:56.448+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Removed Node for slave 'docker-2ae12755b75761'. 

      Jenkins log:

      2018-11-15 13:59:56.445+0000 [id=2063144]       SEVERE  h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel docker-2ae12755b75761
      java.net.SocketException: Socket closed
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              at java.net.SocketInputStream.read(SocketInputStream.java:171)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at java.net.SocketInputStream.read(SocketInputStream.java:127)
              at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:41)
              at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:25)
              at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
              at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
              at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
              at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      2018-11-15 13:59:56.446+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Disconnected computer for slave 'docker-2ae12755b75761'.
      2018-11-15 13:59:56.448+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Removed Node for slave 'docker-2ae12755b75761'.
      

       

      The timestamps of the logs seem to indicate that the Docker plugin erroneously thinks that the job, or at least a step in the job, has completed and so the container should be terminated. This happens a couple of times a day on the same job, but most builds do not fail.

      Attachments

        Activity

          fernando_rosado Fernando Rosado Altamirano added a comment - - edited

          We have the similar problem, but a very strange behavior.

          • The docker host has disabled the TLS setting.
          • All the execution finish before job finish with the same error message : Socket closed
          • I followed the same steps with the similar results. Enabled the logs and so on ( I will copy it later)
          • As desesperated situation, i changed the IP of docker host:
            #vi /etc/sysconfig/network-scripts/ifcfg-eth0
            # systemctl restart network
            

          If i rollback the change, and assign again the old IP address (no other change). Jenkins docker agents fails after a few minutes (5 min or less)
          Only this change (and obviously change jenkins setting to use this IP) solve the problem.
          This is not a valid solution for us, we want to find the Root cause.

          Attached the full log details :
          Jenkins log from docker plugins :

          Asked to provision 1 agent(s) for: ansible-test
          Aug 24, 2021 11:48:50 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedAgent
          Provisioning 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest' on 'docker-agents'; Total containers: 0 (of 100)
          Aug 24, 2021 11:48:50 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision
          Will provision 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest', for label: 'ansible-test', in cloud: 'docker-agents'
          Aug 24, 2021 11:48:50 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
          Pulling image 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest'. This may take awhile...
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector
          instrumented a special java.util.Set into: {}
          Aug 24, 2021 11:48:50 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient
          Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@188d282f to DockerClientParameters{dockerUri=tcp://docker-agent-host:4243, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000}
          Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage
          Finished pulling image 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest', took 542 ms
          Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
          Trying to run container for image "mydockerhub.internal/dev/ansible2.9-base-centos7:latest"
          Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
          Trying to run container for node ansible2.9-base-centos7-003xx52xa8gj7 from image: mydockerhub.internal/dev/ansible2.9-base-centos7:latest
          Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode
          Started container ID 1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4 for node ansible2.9-base-centos7-003xx52xa8gj7 from image: mydockerhub.internal/dev/ansible2.9-base-centos7:latest
          Aug 24, 2021 11:48:52 AM FINE io.netty.channel.DefaultChannelPipeline onUnhandledInboundMessage
          Discarded inbound message {} that reached at the tail of the pipeline. Please check your pipeline configuration.
          Aug 24, 2021 11:48:52 AM INFO io.jenkins.docker.client.DockerMultiplexedInputStream readInternal
          stderr from ansible2.9-base-centos7-003xx52xa8gj7 (1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4): Aug 24, 2021 11:48:52 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
          INFO: Using /home/jenkins/agent.log as an agent error log destination; output log will not be generated
          Aug 24, 2021 11:48:53 AM INFO io.jenkins.docker.client.DockerMultiplexedInputStream readInternal
          stderr from ansible2.9-base-centos7-003xx52xa8gj7 (1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4): channel started
          Aug 24, 2021 11:51:40 AM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0
          Started DockerContainerWatchdog Asynchronous Periodic Work
          Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute
          Docker Container Watchdog has been triggered
          Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog$Statistics writeStatisticsToLog
          Watchdog Statistics: Number of overall executions: 8274, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 43, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 910 ms, Average runtime of container retrieval: 234 ms
          Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog loadNodeMap
          We currently have 20 nodes assigned to this Jenkins instance, which we will check
          Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute
          Checking Docker Cloud docker-agents at tcp://docker-agent-host:4243
          Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute
          Docker Container Watchdog check has been completed
          Aug 24, 2021 11:51:40 AM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0
          Finished DockerContainerWatchdog Asynchronous Periodic Work. 7 ms
          Aug 24, 2021 11:51:52 AM FINE com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy
          terminating ansible2.9-base-centos7-003xx52xa8gj7 since test-job #31 seems to be finished
          Aug 24, 2021 11:51:52 AM INFO io.jenkins.docker.DockerTransientNode$1 println
          Disconnected computer for node 'ansible2.9-base-centos7-003xx52xa8gj7'.
          Aug 24, 2021 11:51:52 AM INFO io.jenkins.docker.DockerTransientNode$1 println
          Can't stop container '1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4' for node 'ansible2.9-base-centos7-003xx52xa8gj7' as it does not exist.
          Aug 24, 2021 11:51:52 AM INFO io.jenkins.docker.DockerTransientNode$1 println
          Removed Node for node 'ansible2.9-base-centos7-003xx52xa8gj7'.
          
          

          I also has enabled the docker debug trace and appear a explicit docker delete command :

          Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.805591390Z" level=debug msg="Assigning addresses for endpoint serene_villani's interface on network bridge"
          Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.812651917Z" level=debug msg="Programming external connectivity on endpoint serene_villani (eecc35eb5d7c74c16be3f557fc84ddb6ce86bfd7485061c558c0802327a4f49f)"
          Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.814811628Z" level=debug msg="EnableService 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 START"
          Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.814841839Z" level=debug msg="EnableService 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 DONE"
          Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.818744124Z" level=debug msg="bundle dir created" bundle=/var/run/docker/containerd/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 module=libcontainerd namespace=moby root=/var/lib/docker/overlay2/2773a0dca5fc99396f556e5e1b897224cf231a947f45c960bb6bf46c52ee1f6b/merged
          Aug 24 12:06:57 docker-agent-host containerd[1102]: time="2021-08-24T12:06:57.845538905Z" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 pid=9806
          Aug 24 12:06:58 docker-agent-host kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
          Aug 24 12:06:58 docker-agent-host kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
          Aug 24 12:06:58 docker-agent-host kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth71a50aa: link becomes ready
          Aug 24 12:06:58 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered blocking state
          Aug 24 12:06:58 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered forwarding state
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.371993182Z" level=debug msg="sandbox set key processing took 156.552676ms for container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.438854804Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/create
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.472247002Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.560652987Z" level=debug msg="Calling PUT /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/archive?path=%2Fhome%2Fjenkins&noOverwriteDirNonDir=false"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.560874237Z" level=debug msg="container mounted via layerStore: &{/var/lib/docker/overlay2/2773a0dca5fc99396f556e5e1b897224cf231a947f45c960bb6bf46c52ee1f6b/merged 0x55f316e67f40 0x55f316e67f40}" container=8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.561455163Z" level=debug msg="unpigz binary not found, falling back to go gzip library"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.701300673Z" level=debug msg="Calling GET /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/json"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.706687477Z" level=debug msg="Calling GET /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/json"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.711112832Z" level=debug msg="Calling POST /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/exec"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.711243982Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":true,\"AttachStdout\":true,\"Cmd\":[\"/usr/bin/java\",\"-jar\",\"/home/jenkins/remoting-4.5.jar\",\"-noReconnect\",\"-noKeepAlive\",\"-agentLog\",\"/home/jenkins/agent.log\"],\"Tty\":false,\"User\":\"jenkins\",\"containerId\":\"8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846\"}"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.712526832Z" level=debug msg="Calling POST /v1.32/exec/304ccc7fbf96afacc204b489462233dddf6839ff1ceac06d8eaca3446f5df773/start"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.712619072Z" level=debug msg="form data: {\"Detach\":false,\"Tty\":false}"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.712931510Z" level=debug msg="starting exec command 304ccc7fbf96afacc204b489462233dddf6839ff1ceac06d8eaca3446f5df773 in container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.714921605Z" level=debug msg="attach: stderr: begin"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.714922892Z" level=debug msg="attach: stdout: begin"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.714937302Z" level=debug msg="attach: stdin: begin"
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.718168804Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exec-added
          Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.811154921Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exec-started
          Aug 24 12:07:02 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:02.061778175Z" level=debug msg="Calling GET /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/json"
          Aug 24 12:07:12 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:12.467544842Z" level=debug msg="Calling HEAD /_ping"
          Aug 24 12:07:12 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:12.468691041Z" level=debug msg="Calling GET /v1.41/containers/json"
          Aug 24 12:07:17 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:17.753068188Z" level=debug msg="Calling HEAD /_ping"
          Aug 24 12:07:21 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:21.431766056Z" level=debug msg="Calling HEAD /_ping"
          Aug 24 12:07:21 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:21.433187102Z" level=debug msg="Calling GET /v1.41/containers/8f8/json"
          Aug 24 12:11:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:37.776240745Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D"
          Aug 24 12:11:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:37.783347656Z" level=debug msg="Calling POST /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/stop?t=10"
          Aug 24 12:11:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:37.783654630Z" level=debug msg="Sending kill signal 15 to container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846"
          Aug 24 12:11:40 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:40.906046192Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.852784520Z" level=info msg="Container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 failed to exit within 10 seconds of signal 15 - using the force"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.852972852Z" level=debug msg="Sending kill signal 9 to container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.937757918Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938300206Z" level=debug msg="attach: stdout: end"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938332963Z" level=debug msg="attach: stderr: end"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938460037Z" level=debug msg="attach: stdin: end"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938559126Z" level=debug msg="attach done"
          Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.941605088Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.000044557Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/delete
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.000105917Z" level=info msg="ignoring event" container=8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
          Aug 24 12:11:48 docker-agent-host containerd[1102]: time="2021-08-24T12:11:48.000436444Z" level=info msg="shim disconnected" id=8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846
          Aug 24 12:11:48 docker-agent-host containerd[1102]: time="2021-08-24T12:11:48.000664261Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.002242004Z" level=debug msg="Revoking external connectivity on endpoint serene_villani (eecc35eb5d7c74c16be3f557fc84ddb6ce86bfd7485061c558c0802327a4f49f)"
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.010704047Z" level=debug msg="DeleteConntrackEntries purged ipv4:0, ipv6:0"
          Aug 24 12:11:48 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered disabled state
          Aug 24 12:11:48 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered disabled state
          Aug 24 12:11:48 docker-agent-host kernel: device veth71a50aa left promiscuous mode
          Aug 24 12:11:48 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered disabled state
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.070537216Z" level=debug msg="Releasing addresses for endpoint serene_villani's interface on network bridge"
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.070603344Z" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.2)"
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.070725504Z" level=debug msg="Released address PoolID:LocalDefault/172.17.0.0/16, Address:172.17.0.2 Sequence:App: ipam/default/data, ID: LocalDefault/172.17.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65532, Sequence: (0xe0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:3"
          Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.105589404Z" level=debug msg="Calling DELETE /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846?v=true"
          Aug 24 12:11:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:58.007798583Z" level=debug msg="Calling POST /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/stop?t=10"
          Aug 24 12:16:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:16:37.775109905Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D"
          Aug 24 12:16:40 docker-agent-host dockerd[22607]: time="2021-08-24T12:16:40.906616889Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D"
          Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145111111Z" level=debug msg="Closing buffered stdin pipe"
          Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145378406Z" level=debug msg="Closing buffered stdin pipe"
          Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145648625Z" level=debug msg="Closing buffered stdin pipe"
          Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145677775Z" level=debug msg="Closing buffered stdin pipe"
          

          I think the problem could be related with the Watchdog and the filter that is using. It is asking docker host about the status of the containters that jenkins has created. Docker host returns all the containers on the filter (labeled by jenkins) and it was unable to identify the running container for some reason (name are not matching)
          The execution of this code is always the same:

          Jenkins.getInstance().clouds.get(com.nirima.jenkins.plugins.docker.DockerCloud).CONTAINERS_IN_PROGRESS 
          -----
          Result: {}
          
          

          And the logs always shows the message from com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy

          terminating node-name since job-name #37 seems to be finished
          

          Even if the job has not finished. For some reason the plugin is unable to detect that the job still running.

          fernando_rosado Fernando Rosado Altamirano added a comment - - edited We have the similar problem, but a very strange behavior. The docker host has disabled the TLS setting. All the execution finish before job finish with the same error message : Socket closed I followed the same steps with the similar results. Enabled the logs and so on ( I will copy it later) As desesperated situation, i changed the IP of docker host: #vi /etc/sysconfig/network-scripts/ifcfg-eth0 # systemctl restart network If i rollback the change, and assign again the old IP address (no other change). Jenkins docker agents fails after a few minutes (5 min or less) Only this change (and obviously change jenkins setting to use this IP) solve the problem. This is not a valid solution for us, we want to find the Root cause. Attached the full log details : Jenkins log from docker plugins : Asked to provision 1 agent(s) for: ansible-test Aug 24, 2021 11:48:50 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud canAddProvisionedAgent Provisioning 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest' on 'docker-agents'; Total containers: 0 (of 100) Aug 24, 2021 11:48:50 AM INFO com.nirima.jenkins.plugins.docker.DockerCloud provision Will provision 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest', for label: 'ansible-test', in cloud: 'docker-agents' Aug 24, 2021 11:48:50 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage Pulling image 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest'. This may take awhile... Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM FINEST io.netty.channel.nio.NioEventLoop openSelector instrumented a special java.util.Set into: {} Aug 24, 2021 11:48:50 AM INFO io.jenkins.docker.client.DockerAPI getOrMakeClient Cached connection io.jenkins.docker.client.DockerAPI$SharableDockerClient@188d282f to DockerClientParameters{dockerUri=tcp://docker-agent-host:4243, credentialsId=null, readTimeoutInMsOrNull=300000, connectTimeoutInMsOrNull=60000} Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate pullImage Finished pulling image 'mydockerhub.internal/dev/ansible2.9-base-centos7:latest', took 542 ms Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode Trying to run container for image "mydockerhub.internal/dev/ansible2.9-base-centos7:latest" Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode Trying to run container for node ansible2.9-base-centos7-003xx52xa8gj7 from image: mydockerhub.internal/dev/ansible2.9-base-centos7:latest Aug 24, 2021 11:48:51 AM INFO com.nirima.jenkins.plugins.docker.DockerTemplate doProvisionNode Started container ID 1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4 for node ansible2.9-base-centos7-003xx52xa8gj7 from image: mydockerhub.internal/dev/ansible2.9-base-centos7:latest Aug 24, 2021 11:48:52 AM FINE io.netty.channel.DefaultChannelPipeline onUnhandledInboundMessage Discarded inbound message {} that reached at the tail of the pipeline. Please check your pipeline configuration. Aug 24, 2021 11:48:52 AM INFO io.jenkins.docker.client.DockerMultiplexedInputStream readInternal stderr from ansible2.9-base-centos7-003xx52xa8gj7 (1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4): Aug 24, 2021 11:48:52 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging INFO: Using /home/jenkins/agent.log as an agent error log destination; output log will not be generated Aug 24, 2021 11:48:53 AM INFO io.jenkins.docker.client.DockerMultiplexedInputStream readInternal stderr from ansible2.9-base-centos7-003xx52xa8gj7 (1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4): channel started Aug 24, 2021 11:51:40 AM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0 Started DockerContainerWatchdog Asynchronous Periodic Work Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute Docker Container Watchdog has been triggered Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog$Statistics writeStatisticsToLog Watchdog Statistics: Number of overall executions: 8274, Executions with processing timeout: 0, Containers removed gracefully: 0, Containers removed with force: 0, Containers removal failed: 0, Nodes removed successfully: 43, Nodes removal failed: 0, Container removal average duration (gracefully): 0 ms, Container removal average duration (force): 0 ms, Average overall runtime of watchdog: 910 ms, Average runtime of container retrieval: 234 ms Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog loadNodeMap We currently have 20 nodes assigned to this Jenkins instance, which we will check Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute Checking Docker Cloud docker-agents at tcp://docker-agent-host:4243 Aug 24, 2021 11:51:40 AM INFO com.nirima.jenkins.plugins.docker.DockerContainerWatchdog execute Docker Container Watchdog check has been completed Aug 24, 2021 11:51:40 AM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0 Finished DockerContainerWatchdog Asynchronous Periodic Work. 7 ms Aug 24, 2021 11:51:52 AM FINE com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy terminating ansible2.9-base-centos7-003xx52xa8gj7 since test-job #31 seems to be finished Aug 24, 2021 11:51:52 AM INFO io.jenkins.docker.DockerTransientNode$1 println Disconnected computer for node 'ansible2.9-base-centos7-003xx52xa8gj7'. Aug 24, 2021 11:51:52 AM INFO io.jenkins.docker.DockerTransientNode$1 println Can't stop container '1704c833eb9299adafa8257e6c9896a82cb292be98e139d679034e8646c7bea4' for node 'ansible2.9-base-centos7-003xx52xa8gj7' as it does not exist. Aug 24, 2021 11:51:52 AM INFO io.jenkins.docker.DockerTransientNode$1 println Removed Node for node 'ansible2.9-base-centos7-003xx52xa8gj7'. I also has enabled the docker debug trace and appear a explicit docker delete command : Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.805591390Z" level=debug msg="Assigning addresses for endpoint serene_villani's interface on network bridge" Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.812651917Z" level=debug msg="Programming external connectivity on endpoint serene_villani (eecc35eb5d7c74c16be3f557fc84ddb6ce86bfd7485061c558c0802327a4f49f)" Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.814811628Z" level=debug msg="EnableService 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 START" Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.814841839Z" level=debug msg="EnableService 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 DONE" Aug 24 12:06:57 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:57.818744124Z" level=debug msg="bundle dir created" bundle=/var/run/docker/containerd/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 module=libcontainerd namespace=moby root=/var/lib/docker/overlay2/2773a0dca5fc99396f556e5e1b897224cf231a947f45c960bb6bf46c52ee1f6b/merged Aug 24 12:06:57 docker-agent-host containerd[1102]: time="2021-08-24T12:06:57.845538905Z" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 pid=9806 Aug 24 12:06:58 docker-agent-host kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready Aug 24 12:06:58 docker-agent-host kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready Aug 24 12:06:58 docker-agent-host kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth71a50aa: link becomes ready Aug 24 12:06:58 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered blocking state Aug 24 12:06:58 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered forwarding state Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.371993182Z" level=debug msg="sandbox set key processing took 156.552676ms for container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.438854804Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/create Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.472247002Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/start Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.560652987Z" level=debug msg="Calling PUT /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/archive?path=%2Fhome%2Fjenkins&noOverwriteDirNonDir=false" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.560874237Z" level=debug msg="container mounted via layerStore: &{/var/lib/docker/overlay2/2773a0dca5fc99396f556e5e1b897224cf231a947f45c960bb6bf46c52ee1f6b/merged 0x55f316e67f40 0x55f316e67f40}" container=8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.561455163Z" level=debug msg="unpigz binary not found, falling back to go gzip library" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.701300673Z" level=debug msg="Calling GET /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/json" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.706687477Z" level=debug msg="Calling GET /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/json" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.711112832Z" level=debug msg="Calling POST /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/exec" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.711243982Z" level=debug msg="form data: {\"AttachStderr\":true,\"AttachStdin\":true,\"AttachStdout\":true,\"Cmd\":[\"/usr/bin/java\",\"-jar\",\"/home/jenkins/remoting-4.5.jar\",\"-noReconnect\",\"-noKeepAlive\",\"-agentLog\",\"/home/jenkins/agent.log\"],\"Tty\":false,\"User\":\"jenkins\",\"containerId\":\"8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846\"}" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.712526832Z" level=debug msg="Calling POST /v1.32/exec/304ccc7fbf96afacc204b489462233dddf6839ff1ceac06d8eaca3446f5df773/start" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.712619072Z" level=debug msg="form data: {\"Detach\":false,\"Tty\":false}" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.712931510Z" level=debug msg="starting exec command 304ccc7fbf96afacc204b489462233dddf6839ff1ceac06d8eaca3446f5df773 in container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.714921605Z" level=debug msg="attach: stderr: begin" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.714922892Z" level=debug msg="attach: stdout: begin" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.714937302Z" level=debug msg="attach: stdin: begin" Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.718168804Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exec-added Aug 24 12:06:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:06:58.811154921Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exec-started Aug 24 12:07:02 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:02.061778175Z" level=debug msg="Calling GET /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/json" Aug 24 12:07:12 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:12.467544842Z" level=debug msg="Calling HEAD /_ping" Aug 24 12:07:12 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:12.468691041Z" level=debug msg="Calling GET /v1.41/containers/json" Aug 24 12:07:17 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:17.753068188Z" level=debug msg="Calling HEAD /_ping" Aug 24 12:07:21 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:21.431766056Z" level=debug msg="Calling HEAD /_ping" Aug 24 12:07:21 docker-agent-host dockerd[22607]: time="2021-08-24T12:07:21.433187102Z" level=debug msg="Calling GET /v1.41/containers/8f8/json" Aug 24 12:11:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:37.776240745Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D" Aug 24 12:11:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:37.783347656Z" level=debug msg="Calling POST /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/stop?t=10" Aug 24 12:11:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:37.783654630Z" level=debug msg="Sending kill signal 15 to container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846" Aug 24 12:11:40 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:40.906046192Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.852784520Z" level=info msg="Container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 failed to exit within 10 seconds of signal 15 - using the force" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.852972852Z" level=debug msg="Sending kill signal 9 to container 8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.937757918Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938300206Z" level=debug msg="attach: stdout: end" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938332963Z" level=debug msg="attach: stderr: end" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938460037Z" level=debug msg="attach: stdin: end" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.938559126Z" level=debug msg="attach done" Aug 24 12:11:47 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:47.941605088Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.000044557Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/delete Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.000105917Z" level=info msg="ignoring event" container=8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" Aug 24 12:11:48 docker-agent-host containerd[1102]: time="2021-08-24T12:11:48.000436444Z" level=info msg="shim disconnected" id=8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846 Aug 24 12:11:48 docker-agent-host containerd[1102]: time="2021-08-24T12:11:48.000664261Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed" Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.002242004Z" level=debug msg="Revoking external connectivity on endpoint serene_villani (eecc35eb5d7c74c16be3f557fc84ddb6ce86bfd7485061c558c0802327a4f49f)" Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.010704047Z" level=debug msg="DeleteConntrackEntries purged ipv4:0, ipv6:0" Aug 24 12:11:48 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered disabled state Aug 24 12:11:48 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered disabled state Aug 24 12:11:48 docker-agent-host kernel: device veth71a50aa left promiscuous mode Aug 24 12:11:48 docker-agent-host kernel: docker0: port 1(veth71a50aa) entered disabled state Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.070537216Z" level=debug msg="Releasing addresses for endpoint serene_villani's interface on network bridge" Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.070603344Z" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.2)" Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.070725504Z" level=debug msg="Released address PoolID:LocalDefault/172.17.0.0/16, Address:172.17.0.2 Sequence:App: ipam/default/data, ID: LocalDefault/172.17.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65532, Sequence: (0xe0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:3" Aug 24 12:11:48 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:48.105589404Z" level=debug msg="Calling DELETE /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846?v=true" Aug 24 12:11:58 docker-agent-host dockerd[22607]: time="2021-08-24T12:11:58.007798583Z" level=debug msg="Calling POST /containers/8f875599d2e1edbcbbca68d92f16c3242fed6212d00d80720de86f1a5fa3d846/stop?t=10" Aug 24 12:16:37 docker-agent-host dockerd[22607]: time="2021-08-24T12:16:37.775109905Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D" Aug 24 12:16:40 docker-agent-host dockerd[22607]: time="2021-08-24T12:16:40.906616889Z" level=debug msg="Calling GET /containers/json?all=true&filters=%7B%22label%22%3A%5B%22com.nirima.jenkins.plugins.docker.JenkinsId%3Da1fdb6cddc65e3808d7849dcbf333580%22%5D%7D" Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145111111Z" level=debug msg="Closing buffered stdin pipe" Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145378406Z" level=debug msg="Closing buffered stdin pipe" Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145648625Z" level=debug msg="Closing buffered stdin pipe" Aug 24 12:17:18 docker-agent-host dockerd[22607]: time="2021-08-24T12:17:18.145677775Z" level=debug msg="Closing buffered stdin pipe" I think the problem could be related with the Watchdog and the filter that is using. It is asking docker host about the status of the containters that jenkins has created. Docker host returns all the containers on the filter (labeled by jenkins) and it was unable to identify the running container for some reason (name are not matching) The execution of this code is always the same: Jenkins.getInstance().clouds.get(com.nirima.jenkins.plugins.docker.DockerCloud).CONTAINERS_IN_PROGRESS ----- Result: {} And the logs always shows the message from com.nirima.jenkins.plugins.docker.strategy.DockerOnceRetentionStrategy terminating node-name since job-name #37 seems to be finished Even if the job has not finished. For some reason the plugin is unable to detect that the job still running.
          rmshnair Manikandan added a comment -

          I agree to the point that Jenkins is sending the explicit command to stop the slave container. but not able to realize why it making a decision to stop container assuming job is complete.

          rmshnair Manikandan added a comment - I agree to the point that Jenkins is sending the explicit command to stop the slave container. but not able to realize why it making a decision to stop container assuming job is complete.

          Thanks to this blog post :
          https://gitmemory.com/issue/jenkinsci/docker-plugin/678/485778494

          I have found the problem. We had a jenkins test instance running with the same configuration using the same docker-host (it was not job running, we only use it for evaluate upgrades and new plugins).
          We have copied the configuration from one to another and it is keeping the same jenkinsID.
          If you've got multiple Jenkins servers using the same docker host, you MUST ensure that they have different instance IDs, otherwise the background cleanup DockerContainerWatchdog will kill the other Jenkins server's containers.

          fernando_rosado Fernando Rosado Altamirano added a comment - Thanks to this blog post : https://gitmemory.com/issue/jenkinsci/docker-plugin/678/485778494 I have found the problem. We had a jenkins test instance running with the same configuration using the same docker-host (it was not job running, we only use it for evaluate upgrades and new plugins). We have copied the configuration from one to another and it is keeping the same jenkinsID. If you've got multiple Jenkins servers using the same docker host, you MUST ensure that they have different instance IDs, otherwise the background cleanup DockerContainerWatchdog will kill the other Jenkins server's containers.
          rmshnair Manikandan added a comment -

          Interesting, in my case, i had only one Jenkins running in the docker. will automatic restart of jenkins server container to simulates this scenario...

          rmshnair Manikandan added a comment - Interesting, in my case, i had only one Jenkins running in the docker. will automatic restart of jenkins server container to simulates this scenario...
          pjdarton pjdarton added a comment -

          FYI I had a go at re-writing the retention-strategy that docker uses so that it'll detect (and cope with) having multiple things running on a "single use" node and only terminate once it's done.
          You can find this in docker-plugin PR-859 and, from there, you can download a .hpi file containing the as-yet-unreleased code from the jenkinsci build server.

          So, if you're certain that you don't have multiple Jenkins servers with the same internal ID pointing at the same docker host/swarm causing them to stomp on each other and you're still seeing this issue, try this code - it might fix it.
          If you do try it, please do let me know how things fared (preferably by commenting on PR-859), and in particular tell me whether it helped or made things worse.

          Note: This code also includes extra logging from com.nirima.jenkins.plugins.docker.strategyDockerOnceRetentionStrategy at level FINE (logs disconnections) and FINER (logs whether it's disconnecting or not) so if you can reproduce this issue, please include FINER logging from this code.

          pjdarton pjdarton added a comment - FYI I had a go at re-writing the retention-strategy that docker uses so that it'll detect (and cope with) having multiple things running on a "single use" node and only terminate once it's done. You can find this in docker-plugin PR-859 and, from there, you can download a .hpi file containing the as-yet-unreleased code from the jenkinsci build server . So, if you're certain that you don't have multiple Jenkins servers with the same internal ID pointing at the same docker host/swarm causing them to stomp on each other and you're still seeing this issue, try this code - it might fix it. If you do try it, please do let me know how things fared (preferably by commenting on PR-859), and in particular tell me whether it helped or made things worse. Note: This code also includes extra logging from com.nirima.jenkins.plugins.docker.strategyDockerOnceRetentionStrategy at level FINE (logs disconnections) and FINER (logs whether it's disconnecting or not) so if you can reproduce this issue, please include FINER logging from this code.

          People

            Unassigned Unassigned
            owenmehegan Owen Mehegan
            Votes:
            6 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated: