Bug
Resolution: Unresolved
Critical
None
Jenkins LTS 2.249.2
Docker Plugin 1.2.1
I'm not sure what has triggered this behavior. We've been using the docker plugin to spin up agents for about 6 months now and it has worked pretty much flawlessly up to this point. We've just recently started seeing strange behavior where Jenkins will not spin up new agents while jobs are waiting in the queue. The queued jobs just sit there forever. Eventually we get to a point where there are no agents running but multiple jobs are queued up.
We have 11 cloud instances. Each instance has multiple templates associated with it. These cloud instances are all pretty much identical. They serve the same templates and labels. The agents connect via ssh.
The only way I can get things working again is to restart the service. Once the service comes back up, Jenkins starts servicing the job requests again.
The only thing I can see in my logs is this entry.
09-Nov-2020 11:51:56.879 SEVERE [dockerjava-netty-3426-1] com.github.dockerjava.core.async.ResultCallbackTemplate.onError Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"No such container: a96167b9016d2624870640933da629f429ad736a473d690ee70fac3cd97bf211"}
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1432)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1199)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1243)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:648)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:583)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:500)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
If I search the logs for that container hash, I see the entries below, which were logged before the error above. So it looks like the container is created, does its job, and is removed. About 8 minutes after removal we get the error above about it being missing.
09-Nov-2020 11:36:22.228 INFO Computer.threadPoolForRemoting [#51286] com.nirima.jenkins.plugins.docker.DockerTemplate.doProvisionNode Started container ID a96167b9016d2624870640933da629f429ad736a473d690ee70fac3cd97bf211 for node DK_COSCOMMON7_D03-00063jpilpdzc from image: pmsplb-cos-tools.dev.datacard.com:8600/centos7_common:latest
09-Nov-2020 11:43:47.104 INFO Computer.threadPoolForRemoting [#50769] io.jenkins.docker.DockerTransientNode$1.println Stopped container 'a96167b9016d2624870640933da629f429ad736a473d690ee70fac3cd97bf211' for node 'DK_COSCOMMON7_D03-00063jpilpdzc'.
09-Nov-2020 11:43:48.538 INFO Computer.threadPoolForRemoting [#50769] io.jenkins.docker.DockerTransientNode$1.println Removed container 'a96167b9016d2624870640933da629f429ad736a473d690ee70fac3cd97bf211' for node 'DK_COSCOMMON7_D03-00063jpilpdzc'.
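In case it helps anyone check the same timeline in their own logs, here is a rough Python sketch of how the entries for one container ID can be pulled out and the gaps between them measured. It assumes the timestamp format shown in the lines above and an English locale for the month abbreviation. The function names container_timeline and gaps are made up for this example and are not part of Jenkins or the docker plugin.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log lines above,
# e.g. "09-Nov-2020 11:51:56.879 SEVERE ...".
TS_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+)")
# %b is locale-dependent; this assumes English month abbreviations.
TS_FORMAT = "%d-%b-%Y %H:%M:%S.%f"


def container_timeline(lines, container_id):
    """Return (timestamp, level, text) for every line mentioning container_id.

    Continuation lines (such as the exception message under a SEVERE entry)
    carry no timestamp of their own, so they inherit the timestamp and level
    of the most recent timestamped line.
    """
    events = []
    current = None  # (timestamp, level) of the most recent timestamped line
    for line in lines:
        match = TS_RE.match(line)
        if match:
            current = (datetime.strptime(match.group(1), TS_FORMAT), match.group(2))
        if container_id in line and current is not None:
            events.append((current[0], current[1], line.strip()))
    return events


def gaps(events):
    """Return the time difference between each consecutive pair of events."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
```

Feeding it the lines of the Jenkins log file and the container hash gives one event per matching entry. For the four entries quoted in this report, the gap between the "Removed container" line (11:43:48.538) and the NotFoundException (11:51:56.879) works out to 8 minutes 8.341 seconds.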
I'm not sure where else to look at this point.