Bug
Resolution: Cannot Reproduce
Major
None
* Jenkins 2.190.3
* docker-plugin:1.1.9
Symptom
- At some point, the docker plugin stops provisioning new docker agents.
Evidence
- I found out that at some point, the docker plugin tries to provision new containers and hangs:
2020-02-05 23:55:49.244+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
2020-02-05 23:55:49.467+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#canAddProvisionedSlave: Not Provisioning '***/jenkins-agent:2.190.3.2'. Template instance limit of '8' reached on cloud '***'
2020-02-05 23:55:49.467+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
2020-02-05 23:55:49.583+0000 [id=67] INFO c.n.j.plugins.docker.DockerCloud#canAddProvisionedSlave: Not Provisioning '***/jenkins-agent:2.190.3.2'. Template instance limit of '8' reached on cloud '***'
2020-02-05 23:55:59.244+0000 [id=71] INFO c.n.j.plugins.docker.DockerCloud#provision: Asked to provision 2 slave(s) for: null
// NO MORE ENTRIES
- After this last call, the log shows no further provisioning attempts from the docker plugin. In the thread dump, the thread running the DockerCloud code is waiting on an external response:
"jenkins.util.Timer [#6]" id=71 (0x47) state=WAITING cpu=95%
- waiting on <0x7891a9a1> (a java.util.concurrent.CountDownLatch$Sync)
- locked <0x7891a9a1> (a java.util.concurrent.CountDownLatch$Sync)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.github.dockerjava.core.async.ResultCallbackTemplate.awaitCompletion(ResultCallbackTemplate.java:92)
at com.github.dockerjava.netty.InvocationBuilder$ResponseCallback.awaitResult(InvocationBuilder.java:60)
at com.github.dockerjava.netty.InvocationBuilder.get(InvocationBuilder.java:189)
at io.jenkins.docker.client.ListContainersCmdExec.execute(ListContainersCmdExec.java:60)
at io.jenkins.docker.client.ListContainersCmdExec.execute(ListContainersCmdExec.java:24)
at com.github.dockerjava.netty.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:35)
at com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker(DockerCloud.java:614)
at com.nirima.jenkins.plugins.docker.DockerCloud.canAddProvisionedSlave(DockerCloud.java:632)
at com.nirima.jenkins.plugins.docker.DockerCloud.provision(DockerCloud.java:352)
- The wait appears to happen in this code, and I am under the impression that there is no timeout: if the call hangs, the thread hangs with it and the plugin stops provisioning.
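- To illustrate the suspected mechanism, here is a minimal sketch using only the JDK. It is not the plugin's or docker-java's code: the class `BoundedAwaitSketch` and the helper `awaitOrTimeout` are hypothetical names. The stack trace above ends in `CountDownLatch.await()`, which parks until the latch is released; the variant `await(timeout, unit)` returns `false` once the timeout elapses, so the caller can give up instead of blocking forever.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedAwaitSketch {

    /**
     * Hypothetical helper: waits for the latch, but gives up after the
     * timeout instead of parking indefinitely like the no-argument await().
     */
    static void awaitOrTimeout(CountDownLatch done, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        // await(timeout, unit) returns false if the count never reached zero in time.
        if (!done.await(timeout, unit)) {
            throw new TimeoutException("no response within " + timeout + " " + unit);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stands in for a remote call whose response never arrives:
        // nothing ever calls countDown() on this latch.
        CountDownLatch neverCompleted = new CountDownLatch(1);
        try {
            awaitOrTimeout(neverCompleted, 200, TimeUnit.MILLISECONDS);
            System.out.println("response received");
        } catch (TimeoutException e) {
            // With a bounded wait the thread is released and can retry later.
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With the no-argument `await()` in place of the bounded call, `main` would never return, which matches the WAITING state seen in the thread dump. Whether a bounded wait can be wired into the plugin's `ListContainersCmdExec` path is an assumption I have not verified.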