JENKINS-55066

Docker plugin erroneously terminates containers shortly after they start



      Description

      We are seeing an issue where Pipeline jobs using Docker agents (provisioned by the Docker plugin, as opposed to Docker containers on regular agents using Pipeline's Docker support) intermittently fail right at the start, during the initial git checkout, with a "FATAL: java.io.IOException: Unexpected termination of the channel" exception. With debug logging enabled for the Docker plugin, it appears that the plugin is erroneously killing the container because it thinks it is no longer needed.
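
      For reference, the affected jobs are of roughly this shape (a minimal sketch; the agent label is a placeholder for a label served by one of our Docker Cloud templates, not the actual configuration):

      // Minimal sketch of the affected job shape (label name is a placeholder).
      // The SCM checkout is the first thing to run on the freshly provisioned
      // container, which is where the remoting channel gets torn down.
      pipeline {
          agent { label 'docker' }    // label served by a Docker Cloud template
          stages {
              stage('Checkout') {
                  steps {
                      // intermittently fails with "Unexpected termination of the channel"
                      checkout scm
                  }
              }
          }
      }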

      Job log:

      [First few lines redacted, this is the Jenkinsfile checkout]
      
      Checking out Revision 0b45f687992585a470e5faf003309b215e3f74f1 (refs/remotes/origin/master)
       > git config core.sparsecheckout # timeout=10
       > git checkout -f 0b45f687992585a470e5faf003309b215e3f74f1
      FATAL: java.io.IOException: Unexpected termination of the channel
      java.io.EOFException
              at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2679)
              at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3154)
              at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
              at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
              at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
              at hudson.remoting.Command.readFrom(Command.java:140)
              at hudson.remoting.Command.readFrom(Command.java:126)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      Caused: java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
      Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to docker-2ae12755b75761
                      at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
                      at hudson.remoting.Request.call(Request.java:202)
                      at hudson.remoting.Channel.call(Channel.java:954)
                      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
                      at com.sun.proxy.$Proxy118.withRepository(Unknown Source)
                      at org.jenkinsci.plugins.gitclient.RemoteGitImpl.withRepository(RemoteGitImpl.java:235)
                      at hudson.plugins.git.GitSCM.printCommitMessageToLog(GitSCM.java:1271)
                      at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1244)
                      at hudson.scm.SCM.checkout(SCM.java:504)
                      at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
                      at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
                      at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
                      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
                      at hudson.model.Run.execute(Run.java:1815)
                      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
                      at hudson.model.ResourceController.execute(ResourceController.java:97)
                      at hudson.model.Executor.run(Executor.java:429)
      Caused: hudson.remoting.RequestAbortedException
              at hudson.remoting.Request.abort(Request.java:340)
              at hudson.remoting.Channel.terminate(Channel.java:1038)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:96)
      Finished: FAILURE

      Docker plugin debug log:

      2018-11-15 13:59:56.444+0000 [id=25]    FINE    c.n.j.p.d.s.DockerOnceRetentionStrategy#done: terminating docker-2ae12755b75761 since PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask{runId=CompileTest#22494,label=docker-2ae12755b75761,context=CpsStepContext[4:node]:Owner[CompileTest/22494:CompileTest #22494],cookie=561ba1da-fd51-4ee6-9bc3-5d4bb75a9fd0,auth=null} seems to be finished
      2018-11-15 13:59:56.446+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Disconnected computer for slave 'docker-2ae12755b75761'.
      2018-11-15 13:59:56.448+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Removed Node for slave 'docker-2ae12755b75761'. 

      Jenkins log:

      2018-11-15 13:59:56.445+0000 [id=2063144]       SEVERE  h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel docker-2ae12755b75761
      java.net.SocketException: Socket closed
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
              at java.net.SocketInputStream.read(SocketInputStream.java:171)
              at java.net.SocketInputStream.read(SocketInputStream.java:141)
              at java.net.SocketInputStream.read(SocketInputStream.java:127)
              at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:41)
              at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:25)
              at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
              at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
              at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
              at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
      2018-11-15 13:59:56.446+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Disconnected computer for slave 'docker-2ae12755b75761'.
      2018-11-15 13:59:56.448+0000 [id=2063156]       INFO    i.j.docker.DockerTransientNode$1#println: Removed Node for slave 'docker-2ae12755b75761'.
      

       

      The timestamps of the logs seem to indicate that the Docker plugin erroneously thinks that the job, or at least a step in the job, has completed and so the container should be terminated. This happens a couple of times a day on the same job, but most builds do not fail.
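
      For anyone trying to capture the same evidence: the Docker plugin debug output above came from FINE-level logging. One way to switch that on (a sketch via the Jenkins script console; the logger names are inferred from the package prefixes visible in the log lines above) is:

      import java.util.logging.Level
      import java.util.logging.Logger

      // Raise the Docker plugin loggers to FINE so that decisions such as
      // DockerOnceRetentionStrategy#done show up in the logs.
      ['com.nirima.jenkins.plugins.docker', 'io.jenkins.docker'].each { name ->
          Logger.getLogger(name).level = Level.FINE
      }

      Adding a log recorder for the same logger names under Manage Jenkins » System Log achieves the same thing through the UI and keeps the output in one place.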


          Activity

          pjdarton added a comment -

          If anyone has, they've kept it to themselves.

          As I said before, to fix this, we'll need a decent test case that'll reproduce the issue, using the latest docker-plugin and using standard docker images.

          If you can't reproduce the issue using the latest docker-plugin and standard docker images then I guess that's your solution - use the latest docker-plugin and use standard docker images.

          Amit Dar added a comment -

          This issue started popping up on our site as well.

          Jenkins 2.249.2

          Docker plugin 1.2.1

          From the looks of it, this is not being handled well. Can any of the previous commenters add any info regarding this issue?

          pjdarton added a comment -

          Personally, I'd be suspicious of anything that called itself PlaceholderExecutable:ExecutorStepExecution.PlaceholderTask

          FYI the docker plugin terminates the container when the task is complete - that's "as designed" (and also "as intended", i.e. I also believe the design is correct).  However, the docker plugin doesn't decide when things are "done", it is told when things are done, so if it's told a task is done when it isn't, this is the kind of symptom you'll see.  My expectation here is that this probably isn't a bug with the docker-plugin at all but instead a bug with whatever is telling the docker plugin it's time to kill the container.
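
          To make that division of responsibility concrete, the pattern looks roughly like this (a simplified Groovy sketch, not the plugin's actual code; everything except the standard ExecutorListener callbacks is made up for illustration):

          import hudson.model.Executor
          import hudson.model.ExecutorListener
          import hudson.model.Queue
          import hudson.slaves.CloudRetentionStrategy

          // Sketch of a "run once" retention strategy: it never decides on its own
          // that a build is finished - it only reacts when Jenkins' executor
          // machinery reports the task as complete.
          class RunOnceRetentionSketch extends CloudRetentionStrategy implements ExecutorListener {

              RunOnceRetentionSketch() { super(/* idleMinutes */ 10) }

              @Override
              void taskAccepted(Executor executor, Queue.Task task) { /* nothing to do yet */ }

              @Override
              void taskCompleted(Executor executor, Queue.Task task, long durationMS) { done(executor) }

              @Override
              void taskCompletedWithProblems(Executor executor, Queue.Task task, long durationMS, Throwable problems) { done(executor) }

              private void done(Executor executor) {
                  // This is the step logged above as "DockerOnceRetentionStrategy#done:
                  // terminating ... seems to be finished". If the caller reports a
                  // PlaceholderTask as complete while the Pipeline is still using the
                  // node, the container gets torn down under the running build.
                  def computer = executor.owner
                  // ...schedule termination of the computer's node/container here...
              }
          }

          So the interesting question is what reported that PlaceholderTask as complete while the build was still checking out.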

          A quick trip to Google revealed this javadoc, which implies that this is down to the "workflow-durable-task-step" code not doing what it says here ... or maybe the closure is not as "done" as that code thought it was.

           

          So, all I can do is re-iterate what I've said twice before: We need a repro case.

          Someone who's experiencing this issue needs to take the time to reduce it down to just the minimal set of conditions required to make it happen.  If someone does that then we have a solvable bug; until someone does that, all we have is a bunch of folks providing sympathy & empathy for fellow sufferers, but no actual help.

          TL;DR: If this is bugging you, demonstrate the bug; make it easier for people to help you.

          Owen Mehegan added a comment -

          Just wanted to say that I saw this issue while assisting a CloudBees customer, which is what led me to file this bug with the information I gathered from them. But they eventually went silent and we never made any further progress, so I don't have anything more I can offer.

          Matt Wilson added a comment - edited

          I'm not sure if I have this problem exactly, but it's pretty close.  We've been using this plugin since spring, but we've started to see a problem where jobs that run on containers aren't being started.  They just queue up forever until the Jenkins service gets restarted.  This issue seems to have come out of the blue; it went away for a while, but now it's back.

           

          I've noticed this pattern in my debugging:

          1. Run a job; the job completes fine in a container called DK_COSCOMMON7_D15-0000p4hq2f6yg.
          2. Logs indicate there was some kind of socket I/O error, but that the container has been removed:
            I/O error in channel DK_COSCOMMON7_D15-0000p4hq2f6yg
            java.net.SocketException: Socket closed
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
            at java.net.SocketInputStream.read(SocketInputStream.java:171)
            at java.net.SocketInputStream.read(SocketInputStream.java:141)
            at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
            at sun.security.ssl.InputRecord.read(InputRecord.java:503)
            at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
            at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
            at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
            at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:49)
            at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:31)
            at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:92)
            at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
            at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
            at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
            at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
            at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
            Removed Node for node 'DK_COSCOMMON7_D15-0000p4hq2f6yg'.
            Dec 12, 2020 7:39:35 AM INFO io.jenkins.docker.DockerTransientNode$1 println
            Stopped container '2f42050429e1cb26c2bb4067c5cc031cc08bdff118fe70bf73624f3758fa17d2' for node 'DK_COSCOMMON7_D15-0000p4hq2f6yg'.
            Dec 12, 2020 7:39:35 AM INFO io.jenkins.docker.DockerTransientNode$1 println
            Removed container '2f42050429e1cb26c2bb4067c5cc031cc08bdff118fe70bf73624f3758fa17d2' for node 'DK_COSCOMMON7_D15-0000p4hq2f6yg'.
          3. When I try to start a new job, the GUI shows that it tries to reuse "DK_COSCOMMON7_D15-0000p4hq2f6yg", but then flips back to the label assigned to that template, DK_COSCOMMON7.  The logs show this:

          Dec 12, 2020 7:40:12 AM SEVERE com.github.dockerjava.core.async.ResultCallbackTemplate onErrorError during callback com.github.dockerjava.api.exception.NotFoundException: {"message":"No such container: 2f42050429e1cb26c2bb4067c5cc031cc08bdff118fe70bf73624f3758fa17d2"} at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103) at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1432) at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1199) at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1243) at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:648) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:583) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:500) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748)
          Dec 12, 2020 7:40:12 AM INFO hudson.slaves.NodeProvisioner lambda$update$6
          Image of pmsplb-cos-tools.dev.datacard.com:8600/centos7_common:latest provisioning successfully completed. We have now 113 computer(s)
          Dec 12, 2020 7:40:12 AM INFO io.jenkins.docker.DockerTransientNode$1 println
          Disconnected computer for node 'DK_COSCOMMON7_D15-0000p4hq2f6yg'.
          Dec 12, 2020 7:40:13 AM INFO io.jenkins.docker.DockerTransientNode$1 println
          Removed Node for node 'DK_COSCOMMON7_D15-0000p4hq2f6yg'.

          4. At this point my job will sit in the queue forever; it will never start unless I cancel it and then restart it.  When I restart it, the cycle continues: job runs, second run doesn't run, cancel, next run goes.  (See the script console sketch below.)
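
          In case it helps anyone else debugging the stuck-queue variant, a script console snippet along these lines (a sketch using only core Jenkins APIs) shows why queued items are blocked and which nodes Jenkins still believes exist:

          import jenkins.model.Jenkins

          // Dump why each queued item is blocked, and which nodes Jenkins still
          // knows about, to see whether a stale provisioned node is being counted
          // against the label.
          def j = Jenkins.get()

          j.queue.items.each { item ->
              println "queued: ${item.task.fullDisplayName} - ${item.why}"
          }

          j.nodes.each { node ->
              def c = node.toComputer()
              println "node: ${node.nodeName} labels='${node.labelString}' online=${c ? !c.offline : 'no computer'}"
          }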

           

          This one really drives me crazy.  At this point I have no idea what is going on or what is causing that socket error.  Still trying to figure that part out.

           

          The above was using the "attach" container connection method.  I've also tried with "ssh", which we normally use, and the pattern is identical, except that the ssh boxes don't generate the socket connection error.

          My nodes folder on my Jenkins server does not list the node name post-build, i.e. it is being cleaned up.  A console call to list the connected slaves does not return that machine either.  I have no idea where it is retaining this information.

           

          Restarting the Jenkins server clears all this up for a random period of time: it could be days, weeks, or hours.  I've probably restarted the service 10 times in the last 3 days.

          This is with Jenkins version 2.249.3, although I'm about to upgrade to the new LTS version 2.263.1.


            People

             Assignee:
             Unassigned
             Reporter:
             Owen Mehegan
             Votes:
             4
             Watchers:
             7
