-
Bug
-
Resolution: Fixed
-
Major
-
None
When the plugin is configured with idleMinutes == 0, not only do the agents get deleted sooner than they are launched, rendering the setup practically useless as reported in JENKINS-47953. It even seems to cause provisioning for the serviced labels to get stuck almost completely. Here is what I observed happening:
- Agents were created and launching, yet disappearing instantly.
- The plugin was logging "No such container"[1] 3 times in a row, roughly once for every botched node.
- Both stopped happening after changing idleMinutes from 0 to 10.
- I presume JENKINS-47953 kicked in, deleting the container while provisioning was in progress.
- Eventually, and presumably because of this, all provisioning stopped with multiple pending launches that never complete[2][3]. They are neither done nor cancelled, and yet they have no running thread in the stack trace. This causes plannedCapacity > demand, so nothing else is provisioned.
- I admit I do not quite understand how the futures got into such a state, but this too stopped occurring right after fixing idleMinutes and cancelling the dangling futures.
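The arithmetic behind the stall can be sketched as follows. This is a deliberately simplified model and not the actual NodeProvisioner logic: the class name and the excessWorkload helper are hypothetical, and the numbers are taken from the StrategyState dump in [3].

```java
// Simplified illustration only; NOT the real Jenkins NodeProvisioner code.
// ProvisioningDemand and excessWorkload are hypothetical names.
final class ProvisioningDemand {

    /**
     * Executors that still need to be provisioned: queued work minus capacity
     * that is already available or planned, never below zero.
     */
    static int excessWorkload(int queueLength, int availableExecutors, int plannedCapacity) {
        return Math.max(0, queueLength - availableExecutors - plannedCapacity);
    }

    public static void main(String[] args) {
        // Values from [3]: queueLength=14, availableExecutors=0, plannedCapacitySnapshot=36.
        // The 36 dangling launches still count as planned capacity.
        System.out.println(excessWorkload(14, 0, 36));

        // Same queue once the dangling futures are cancelled (planned capacity drops to 0).
        System.out.println(excessWorkload(14, 0, 0));
    }
}
```

Under this model the first call yields no excess workload, because the 36 planned launches outweigh the 14 queued items, while the second call shows demand reappearing once the dangling launches no longer count.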
Having said that, I suggest making the 0 timeout unsupported and using some sane default even when 0 is configured explicitly (this can happen either during migration or manually, by a user not quite aware of these surprising consequences). An alternative would be ensuring the slave is only disposed of after it has been launched/used.
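A minimal sketch of both options, under stated assumptions: IdleTimeoutPolicy, effectiveIdleMinutes, mayDispose and the default of 10 minutes are all hypothetical names and values chosen for illustration (10 is simply the setting that worked in this report), not the plugin's actual API or the change that was eventually made.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, for illustration only; not part of docker-plugin.
final class IdleTimeoutPolicy {

    // Assumption: 10 minutes, the value that made the symptoms disappear above.
    static final int DEFAULT_IDLE_MINUTES = 10;

    /** Option 1: treat 0 (or a negative value) as unsupported and fall back to the default. */
    static int effectiveIdleMinutes(int configuredIdleMinutes) {
        return configuredIdleMinutes < 1 ? DEFAULT_IDLE_MINUTES : configuredIdleMinutes;
    }

    /**
     * Option 2: keep the configured timeout, but never dispose of an agent
     * that has not finished launching yet.
     */
    static boolean mayDispose(boolean launched, long idleMillis, int idleMinutes) {
        return launched && idleMillis >= TimeUnit.MINUTES.toMillis(idleMinutes);
    }

    public static void main(String[] args) {
        System.out.println(effectiveIdleMinutes(0));     // falls back to the default
        System.out.println(mayDispose(false, 0L, 0));    // still launching: keep the agent
        System.out.println(mayDispose(true, 0L, 0));     // launched, zero timeout: disposable
    }
}
```

Option 1 is the simpler of the two since it removes the problematic configuration altogether. Option 2 preserves the "dispose immediately when idle" behaviour for users who want it, at the cost of having to track the launch state reliably.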
[1]
Apr 04, 2018 3:10:56 AM com.github.dockerjava.core.async.ResultCallbackTemplate onError
SEVERE: Error during callback
com.github.dockerjava.api.exception.NotFoundException: {"message":"No such container: ffb2753cdd6ec73ed30477adf26ced6824e4cac434b3f740395e7e255efd7a13"}
    at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:103)
    at com.github.dockerjava.netty.handler.HttpResponseHandler.channelRead0(HttpResponseHandler.java:33)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
    at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
[2]
label = Jenkins.instance.getLabel('jslave-idm-docker')
Jenkins.instance.clouds.each { assert it.canProvision(label) }
def np = label.nodeProvisioner
def pl = np.@pendingLaunches.get()
println "Pending launches: " + pl.size()
pl.each {
    println "\t${it.displayName} done=${it.future.isDone()} canceled=${it.future.isCancelled()} #=${System.identityHashCode(it.future)}"
    //it.future.cancel(true)
}
println np.provisioningState
return null
[3]
Pending launches: 36
    Image of XXX done=false canceled=false #=423560755
    Image of XXX done=false canceled=false #=51689623
    Image of XXX done=false canceled=false #=1405078103
    Image of XXX done=false canceled=false #=1081078172
    Image of XXX done=false canceled=false #=1322747457
    Image of XXX done=false canceled=false #=1747017484
    Image of XXX done=false canceled=false #=1164037421
    Image of XXX done=false canceled=false #=2059216652
    Image of XXX done=false canceled=false #=145380244
    Image of XXX done=false canceled=false #=1779545266
    Image of XXX done=false canceled=false #=567415928
    Image of XXX done=false canceled=false #=843366921
    Image of XXX done=false canceled=false #=1747456294
    Image of XXX done=false canceled=false #=259759285
    Image of XXX done=false canceled=false #=1044068598
    Image of XXX done=false canceled=false #=1371528757
    Image of XXX done=false canceled=false #=1126642027
    Image of XXX done=false canceled=false #=1390627009
    Image of XXX done=false canceled=false #=2105699038
    Image of XXX done=false canceled=false #=1857421890
    Image of XXX done=false canceled=false #=341891734
    Image of XXX done=false canceled=false #=1544367515
    Image of XXX done=false canceled=false #=842491998
    Image of XXX done=false canceled=false #=1825425480
    Image of XXX done=false canceled=false #=2129333037
    Image of XXX done=false canceled=false #=1270845598
    Image of XXX done=false canceled=false #=1120105519
    Image of XXX done=false canceled=false #=1087273911
    Image of XXX done=false canceled=false #=1717220064
    Image of XXX done=false canceled=false #=1373042133
    Image of XXX done=false canceled=false #=149787084
    Image of XXX done=false canceled=false #=994918565
    Image of XXX done=false canceled=false #=145959070
    Image of XXX done=false canceled=false #=1848450048
    Image of XXX done=false canceled=false #=170558456
    Image of XXX done=false canceled=false #=1538499184
StrategyState{label=jslave-idm-docker, snapshot=LoadStatisticsSnapshot{definedExecutors=0, onlineExecutors=0, connectingExecutors=0, busyExecutors=0, idleExecutors=0, availableExecutors=0, queueLength=14}, plannedCapacitySnapshot=36, additionalPlannedCapacity=0}
https://github.com/jenkinsci/docker-plugin/pull/623 ensured that idleMinutes can't be zero.
That was fixed in 1.1.4.