Jenkins / JENKINS-45074

spot-fleet plugin deadlocks master when scaling up



    Description

      We periodically see the spot-fleet plugin deadlock the master on scale-up events (I think it also happens on scale-down events, but I don't have logs for that yet). The master stays up and appears functional, but the queue is locked and new builds can't be submitted via the UI. I see this in the Jenkins log:

      Jun 22, 2017 3:10:31 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
      SEVERE: I/O error in channel i-005ce3b8f5ae8c029
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2353)
      at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2822)
      at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:804)
      at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
      at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
      
      Jun 22, 2017 3:11:32 AM com.amazon.jenkins.ec2fleet.EC2FleetCloud updateStatus
      INFO: Found new instances from fleet (docker_ci ec2-fleet ubuntu-16.04): [i-03a1246c8e590d6eb]
      Jun 22, 2017 3:11:32 AM com.amazon.jenkins.ec2fleet.IdleRetentionStrategy <init>
      INFO: Idle Retention initiated
      Jun 22, 2017 3:12:10 AM jenkins.metrics.api.Metrics$HealthChecker execute
      WARNING: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer [#8] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@38b86b17 (owned by jenkins.util.Timer [#4]):
      at sun.misc.Unsafe.park(Native Method)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
      at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
      at hudson.model.Queue._withLock(Queue.java:1332)
      at hudson.model.Queue.withLock(Queue.java:1211)
      at jenkins.model.Nodes.addNode(Nodes.java:133)
      at jenkins.model.Jenkins.addNode(Jenkins.java:2115)
      at com.amazon.jenkins.ec2fleet.EC2FleetCloud.addNewSlave(EC2FleetCloud.java:355)
      at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:312)
      at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:42)
      at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      , jenkins.util.Timer [#4] locked on com.amazon.jenkins.ec2fleet.EC2FleetCloud@5ba7db19 (owned by jenkins.util.Timer [#8]):
      at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:38)
      at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
      at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
      at hudson.model.Queue._withLock(Queue.java:1334)
      at hudson.model.Queue.withLock(Queue.java:1211)
      at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
      at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      ]]
      Jun 22, 2017 3:15:09 AM hudson.model.AsyncPeriodicWork$1 run
      INFO: Started EC2 alive slaves monitor
      Jun 22, 2017 3:15:09 AM hudson.model.AsyncPeriodicWork$1 run
      INFO: Finished EC2 alive slaves monitor. 0 ms
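To make the two traces above easier to follow, here is a minimal, self-contained sketch of the lock-order inversion they appear to show. It uses no Jenkins or plugin classes: `queueLock` is a hypothetical stand-in for the `ReentrantLock` taken by `Queue.withLock`, and `cloudMonitor` stands in for the `EC2FleetCloud` instance. One thread takes the monitor and then the queue lock (like `CloudNanny` -> `updateStatus` -> `Jenkins.addNode`); the other takes the queue lock and then the monitor (like `ComputerRetentionWork` -> `IdleRetentionStrategy.check`). A latch forces the unlucky interleaving; without it the two timer tasks only collide when their timing overlaps, which may be why the hang is intermittent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the two timer tasks in the log, not plugin code.
public class LockInversionDemo {

    // Starts both tasks, forces the bad interleaving, and returns how many
    // threads the JVM reports as deadlocked.
    public static int countDeadlockedThreads() {
        ReentrantLock queueLock = new ReentrantLock(); // stands in for Queue's lock
        Object cloudMonitor = new Object();            // stands in for EC2FleetCloud
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Provisioning path: cloud monitor first, then the queue lock.
        Thread provisioner = new Thread(() -> {
            synchronized (cloudMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                queueLock.lock(); // parks forever: the retention thread owns it
                queueLock.unlock();
            }
        }, "provisioner");

        // Retention path: queue lock first, then the cloud monitor.
        Thread retention = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (cloudMonitor) { // BLOCKED forever: the provisioner owns it
                    // never reached
                }
            } finally {
                queueLock.unlock();
            }
        }, "retention");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        provisioner.setDaemon(true);
        retention.setDaemon(true);
        provisioner.start();
        retention.start();

        // findDeadlockedThreads() covers both object monitors and ownable
        // synchronizers such as ReentrantLock; it returns null if there is no cycle.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            sleepQuietly(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

Run as-is, it should print `deadlocked threads: 2`, and the two daemon threads stay stuck until the JVM exits, much like the timer threads in the log.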
      

       

      I'm not sure why this doesn't happen every time. It appears that some of the slaves failed to come up; I wonder if that is the culprit. I also wonder if we can do better than the big lock we hold around the master during scale-up/down. I haven't looked deeply at the code, but the other aws-ec2 plugin doesn't seem to hold such a large lock.
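On the question of the big lock, one possible direction (a sketch under stated assumptions, not the plugin's actual code or fix) is to hold the cloud's monitor only around the cloud's own bookkeeping and call into queue-locked code after releasing it. The provisioning path and the retention path then never wait for the two locks in opposite orders. `NarrowLockSketch`, `recordNewInstances`, `idleCheck`, `registeredCount` and the string instance IDs are all hypothetical; `updateStatus` here only borrows the plugin's method name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not plugin code: queueLock stands in for Queue's lock and
// "this" (the object monitor) stands in for the EC2FleetCloud instance.
public class NarrowLockSketch {
    private final ReentrantLock queueLock = new ReentrantLock();
    private final Set<String> knownInstances = new HashSet<>();     // guarded by this
    private final List<String> registeredNodes = new ArrayList<>(); // guarded by queueLock

    // Phase 1: only the cloud's own bookkeeping runs under the cloud monitor.
    private synchronized List<String> recordNewInstances(List<String> fleetInstances) {
        List<String> fresh = new ArrayList<>();
        for (String id : fleetInstances) {
            if (knownInstances.add(id)) { // add() returns true only for unseen IDs
                fresh.add(id);
            }
        }
        return fresh;
    }

    // Provisioning path: the monitor is released before the queue lock is requested.
    public void updateStatus(List<String> fleetInstances) {
        List<String> fresh = recordNewInstances(fleetInstances);
        queueLock.lock(); // phase 2, in the spirit of Jenkins.addNode -> Queue.withLock
        try {
            registeredNodes.addAll(fresh);
        } finally {
            queueLock.unlock();
        }
    }

    // Retention path: queue lock first, then the cloud monitor, as in the trace.
    public int idleCheck() {
        queueLock.lock();
        try {
            synchronized (this) {
                return knownInstances.size();
            }
        } finally {
            queueLock.unlock();
        }
    }

    public int registeredCount() {
        queueLock.lock();
        try {
            return registeredNodes.size();
        } finally {
            queueLock.unlock();
        }
    }

    // Runs both paths concurrently; true if both loops finished and every node registered.
    public static boolean runsWithoutDeadlock(int rounds) {
        NarrowLockSketch cloud = new NarrowLockSketch();
        CountDownLatch done = new CountDownLatch(2);
        Thread provisioner = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                cloud.updateStatus(Collections.singletonList("i-" + i));
            }
            done.countDown();
        });
        Thread retention = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                cloud.idleCheck();
            }
            done.countDown();
        });
        provisioner.setDaemon(true);
        retention.setDaemon(true);
        provisioner.start();
        retention.start();
        try {
            return done.await(30, TimeUnit.SECONDS) && cloud.registeredCount() == rounds;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runsWithoutDeadlock(10_000));
    }
}
```

The trade-off is a short window between the two phases in which the cloud already knows about an instance that is not yet registered as a node, so real code would have to tolerate that state.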


          Activity

            elatt Erik Lattimore created issue -

            It appears that I have hit another deadlock during scale-up. We were going from 20 nodes to 25 with lots of active jobs running (3 executors on each node). Jenkins failed to connect to a couple of the nodes and marked them as offline. The nodes themselves are fine, and the spot fleet has scaled up to the requested 25 nodes, but Jenkins appears stuck thinking there are only 22 nodes in the fleet (with the 2 new ones offline until I manually relaunched the SSH agents). They are online now, but the system appears to be deadlocked. I don't see any issues in jenkins.log, but if I am reading the thread dump from the UI correctly, it looks like a deadlock: thread #4 is waiting on #6 and vice versa.

            jenkins.util.Timer [#4]
            "jenkins.util.Timer [#4]" Id=35 Group=main WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@7bf831f7 owned by "jenkins.util.Timer [#6]" Id=42
            	at sun.misc.Unsafe.park(Native Method)
            	-  waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@7bf831f7
            	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
            	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
            	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
            	at hudson.model.Queue._withLock(Queue.java:1340)
            	at hudson.model.Queue.withLock(Queue.java:1219)
            	at jenkins.model.Nodes.addNode(Nodes.java:133)
            	at jenkins.model.Jenkins.addNode(Jenkins.java:2116)
            	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.addNewSlave(EC2FleetCloud.java:361)
            	-  locked hudson.model.Hudson@60edd628
            	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:318)
            	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67c385ae
            	at com.amazon.jenkins.ec2fleet.EC2FleetCloud.provision(EC2FleetCloud.java:206)
            	-  locked com.amazon.jenkins.ec2fleet.EC2FleetCloud@67c385ae
            	at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
            	at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
            	at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
            	at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
            	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            	at java.lang.Thread.run(Thread.java:745)
            
            	Number of locked synchronizers = 2
            	- java.util.concurrent.ThreadPoolExecutor$Worker@ce6847f
            	- java.util.concurrent.locks.ReentrantLock$NonfairSync@6113d365
            
            jenkins.util.Timer [#5]
            "jenkins.util.Timer [#5]" Id=40 Group=main WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@7bf831f7 owned by "jenkins.util.Timer [#6]" Id=42
            	at sun.misc.Unsafe.park(Native Method)
            	-  waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@7bf831f7
            	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
            	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
            	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
            	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
            	at hudson.model.Queue.maintain(Queue.java:1420)
            	at hudson.model.Queue$MaintainTask.doRun(Queue.java:2770)
            	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            	at java.lang.Thread.run(Thread.java:745)
            
            	Number of locked synchronizers = 1
            	- java.util.concurrent.ThreadPoolExecutor$Worker@34315fae
            
            jenkins.util.Timer [#6]
            "jenkins.util.Timer [#6]" Id=42 Group=main BLOCKED on com.amazon.jenkins.ec2fleet.EC2FleetCloud@67c385ae owned by "jenkins.util.Timer [#4]" Id=35
            	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:38)
            	-  blocked on com.amazon.jenkins.ec2fleet.EC2FleetCloud@67c385ae
            	at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
            	at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
            	at hudson.model.Queue._withLock(Queue.java:1342)
            	at hudson.model.Queue.withLock(Queue.java:1219)
            	at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
            	at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            	at java.lang.Thread.run(Thread.java:745)
            
            	Number of locked synchronizers = 2
            	- java.util.concurrent.locks.ReentrantLock$NonfairSync@7bf831f7
            	- java.util.concurrent.ThreadPoolExecutor$Worker@3b7e68a7
            
            elatt Erik Lattimore added the comment above.
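One way to check that reading of the dump: each WAITING/BLOCKED thread points at the owner of the lock it wants, and a deadlock is a cycle in that "waits-for" graph. The sketch below hard-codes the three edges transcribed by hand from the dump above (thread names abbreviated; nothing is parsed from a real dump) and follows them until a thread repeats. Since each blocked thread waits on exactly one lock, following the edges from any thread either ends at a running thread or loops.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: finds a cycle in a "waiting thread -> lock owner" map.
public class WaitForGraph {

    // Returns the first cycle found, e.g. [A, B, A], or an empty list if none.
    public static List<String> findCycle(Map<String, String> waitsOn) {
        for (String start : waitsOn.keySet()) {
            List<String> path = new ArrayList<>();
            String current = start;
            while (current != null && !path.contains(current)) {
                path.add(current);
                current = waitsOn.get(current); // null: this thread is not waiting
            }
            if (current != null) {
                // current is already on the path, so the cycle starts there
                List<String> cycle = new ArrayList<>(path.subList(path.indexOf(current), path.size()));
                cycle.add(current);
                return cycle;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Map<String, String> waitsOn = new LinkedHashMap<>();
        waitsOn.put("Timer #4", "Timer #6"); // Queue lock NonfairSync@7bf831f7, owned by #6
        waitsOn.put("Timer #5", "Timer #6"); // same Queue lock, from Queue.maintain
        waitsOn.put("Timer #6", "Timer #4"); // EC2FleetCloud@67c385ae monitor, owned by #4
        System.out.println(findCycle(waitsOn));
    }
}
```

For these edges it prints `[Timer #4, Timer #6, Timer #4]`. Timer #5 (`Queue.maintain`) is stuck behind the same queue lock but is not part of the cycle, which fits the queue appearing frozen.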
            iwsmatt Matt Thompson added a comment -

            I'm not sure if this is the same scenario, but I've also been seeing deadlocks from this plugin lately that take down the UI:

            WARNING: Some health checks are reporting as unhealthy: [thread-deadlock : [Handling POST /plugin/swarm/createSlave from 176.16.12.92 : RequestHandlerThread[#23] locked on com.amazon.jenkins.ec2fleet.EC2FleetCloud@41b69ab7 (owned by jenkins.util.Timer [#2]):
            at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:38)
            at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
            at hudson.slaves.SlaveComputer$4.run(SlaveComputer.java:730)
            at hudson.model.Queue._withLock(Queue.java:1342)
            at hudson.model.Queue.withLock(Queue.java:1219)
            at hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:727)
            at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
            at hudson.model.AbstractCIBase.access$000(AbstractCIBase.java:45)
            at hudson.model.AbstractCIBase$2.run(AbstractCIBase.java:192)
            at hudson.model.Queue._withLock(Queue.java:1342)
            at hudson.model.Queue.withLock(Queue.java:1219)
            at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:175)
            at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1571)
            at jenkins.model.Nodes$2.run(Nodes.java:137)
            at hudson.model.Queue._withLock(Queue.java:1342)
            at hudson.model.Queue.withLock(Queue.java:1219)
            at jenkins.model.Nodes.addNode(Nodes.java:133)
            at jenkins.model.Jenkins.addNode(Jenkins.java:2114)
            at hudson.plugins.swarm.PluginImpl.doCreateSlave(PluginImpl.java:219)
            at java.lang.invoke.LambdaForm$DMH/763014505.invokeVirtual_L5IL5I_V(LambdaForm$DMH)
            at java.lang.invoke.LambdaForm$BMH/1014711654.reinvoke(LambdaForm$BMH)
            at java.lang.invoke.LambdaForm$BMH/389617658.reinvoke(LambdaForm$BMH)
            at java.lang.invoke.LambdaForm$MH/1295063972.invoker(LambdaForm$MH)
            at java.lang.invoke.LambdaForm$MH/322204726.invokeExact_MT(LambdaForm$MH)
            at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
            at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
            at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
            at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
            at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
            at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
            at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:715)
            at org.kohsuke.stapler.Stapler.invoke(Stapler.java:845)
            at org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
            at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
            at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:715)
            at org.kohsuke.stapler.Stapler.invoke(Stapler.java:845)
            at org.kohsuke.stapler.Stapler.invoke(Stapler.java:649)
            at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
            at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
            at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:135)
            at org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:225)
            at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:132)
            at io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:51)
            at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:132)
            at io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
            at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:132)
            at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:59)
            at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:132)
            at jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
            at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:132)
            at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:138)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
            at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:49)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
            at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at jenkins.security.BasicHeaderProcessor.success(BasicHeaderProcessor.java:139)
            at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:81)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
            at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
            at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
            at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
            at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
            at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
            at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
            at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
            at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
            at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:553)
            at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
            at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
            at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
            at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
            at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
            at org.eclipse.jetty.server.Server.handle(Server.java:499)
            at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
            at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
            at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
            at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
            , jenkins.util.Timer [#2] locked on hudson.model.Hudson@2a453713 (owned by Handling POST /plugin/swarm/createSlave from 176.16.12.92 : RequestHandlerThread[#23]):
            at com.amazon.jenkins.ec2fleet.EC2FleetCloud.addNewSlave(EC2FleetCloud.java:358)
            at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:318)
            at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:42)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
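This trace looks like the same inversion with a different second lock: Timer #2 holds the `EC2FleetCloud` monitor and waits for the `Hudson` monitor, while the swarm request thread holds the `Hudson` monitor and, via `SlaveComputer.setNode`, calls `IdleRetentionStrategy.check`, which waits for the cloud monitor. One way to keep the retention side from ever closing such a cycle is for the check to use `tryLock()` and skip the round when the cloud is busy. The sketch below assumes the cloud switched from `synchronized` to a dedicated `ReentrantLock` that the provisioning pass also takes; `NonBlockingRetentionSketch`, `cloudLock`, `check` and `demo` are hypothetical, not Jenkins or plugin APIs.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: the cloud guards its state with cloudLock instead of
// synchronized methods, and the provisioning pass takes the same lock.
public class NonBlockingRetentionSketch {
    private final ReentrantLock cloudLock = new ReentrantLock();
    private int idleChecksRun = 0; // guarded by cloudLock

    // Returns true if the idle logic ran, false if the cloud was busy and this
    // round was skipped. Never parks the caller, which may already hold the
    // queue lock or the Jenkins monitor.
    public boolean check() {
        if (!cloudLock.tryLock()) {
            return false; // provisioning in progress: try again on the next retention tick
        }
        try {
            idleChecksRun++; // placeholder for the real idle/termination decision
            return true;
        } finally {
            cloudLock.unlock();
        }
    }

    // Calls check() once while another thread holds cloudLock, then once after release.
    public static boolean[] demo() {
        NonBlockingRetentionSketch cloud = new NonBlockingRetentionSketch();
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread provisioner = new Thread(() -> {
            cloud.cloudLock.lock(); // simulated provisioning pass
            try {
                holding.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cloud.cloudLock.unlock();
            }
        });
        provisioner.start();
        try {
            holding.await();
            boolean whileBusy = cloud.check(); // skipped, returns immediately
            release.countDown();
            provisioner.join();
            boolean whenFree = cloud.check(); // lock is free, logic runs
            return new boolean[] {whileBusy, whenFree};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[0];
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [false, true]
    }
}
```

The cost of this design is that, under contention, an idle instance may be kept for one extra retention interval, which is usually cheaper than a frozen master.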
            
            ryeates Robert Yeates added a comment -

            We are seeing the same issue with a 15-node spot fleet, using the EC2 Fleet plugin to provision slaves, and are also encountering this deadlock:

            Sep 07, 2017 4:28:35 PM jenkins.metrics.api.Metrics$HealthChecker execute
            WARNING: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer [#3] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@22807dcd (owned by jenkins.util.Timer [#6]):
            at sun.misc.Unsafe.park(Native Method)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
            at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
            at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
            at hudson.model.Queue._withLock(Queue.java:1340)
            at hudson.model.Queue.withLock(Queue.java:1219)
            at jenkins.model.Nodes.removeNode(Nodes.java:237)
            at jenkins.model.Jenkins.removeNode(Jenkins.java:2121)
            at com.amazon.jenkins.ec2fleet.EC2FleetCloud.addNewSlave(EC2FleetCloud.java:360)
            at com.amazon.jenkins.ec2fleet.EC2FleetCloud.updateStatus(EC2FleetCloud.java:318)
            at com.amazon.jenkins.ec2fleet.CloudNanny.doRun(CloudNanny.java:42)
            at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:51)
            at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:748)
            , AtmostOneTaskExecutor[Periodic Jenkins queue maintenance] [#129] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@22807dcd (owned by jenkins.util.Timer [#6]):
            at sun.misc.Unsafe.park(Native Method)
            at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
            at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
            at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
            at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
            at hudson.model.Queue.maintain(Queue.java:1420)
            at hudson.model.Queue$1.call(Queue.java:321)
            at hudson.model.Queue$1.call(Queue.java:318)
            at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:108)
            at jenkins.util.AtmostOneTaskExecutor$1.call(AtmostOneTaskExecutor.java:98)
            at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:110)
            at java.lang.Thread.run(Thread.java:748)
            , jenkins.util.Timer [#6] locked on com.amazon.jenkins.ec2fleet.EC2FleetCloud@3885e4e5 (owned by jenkins.util.Timer [#3]):
            at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:38)
            at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:15)
            at hudson.slaves.SlaveComputer$4.run(SlaveComputer.java:730)
            at hudson.model.Queue._withLock(Queue.java:1342)
            at hudson.model.Queue.withLock(Queue.java:1219)
            at hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:727)
            at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
            at hudson.model.AbstractCIBase.access$000(AbstractCIBase.java:45)
            at hudson.model.AbstractCIBase$2.run(AbstractCIBase.java:192)
            at hudson.model.Queue._withLock(Queue.java:1342)
            at hudson.model.Queue.withLock(Queue.java:1219)
            

            Jenkins version 2.60.2

            EC2 Fleet plugin version 1.1.4

            Also added details on https://issues.jenkins-ci.org/browse/JENKINS-37483
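This trace has the same shape with a different pair of locks. Timer [#3] holds the EC2FleetCloud monitor inside addNewSlave and blocks in Jenkins.removeNode, waiting for the Queue's ReentrantLock. Timer [#6] already holds that Queue lock, taken in updateComputerList and SlaveComputer.setNode, and blocks in IdleRetentionStrategy.check waiting for the EC2FleetCloud monitor. Because the periodic queue maintenance thread is also waiting on the Queue lock, the queue stops moving. A common way to break such a cycle is to have every path acquire the two locks in the same order, or to avoid holding the cloud monitor while calling into Jenkins. The sketch below assumes the ordering approach. It is not a description of what PR 14 or release 1.1.7 actually changed, and ConsistentLockOrderDemo, its fields, and its methods are hypothetical stand-ins.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a consistent lock order: every path takes the
// queue lock first and the cloud monitor second, so no cycle can form.
// The names are stand-ins, not Jenkins or EC2 Fleet plugin classes.
public class ConsistentLockOrderDemo {

    private final ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
    private final Object cloudMonitor = new Object();            // stand-in for the EC2FleetCloud monitor
    private int nodes = 0;                                       // guarded by cloudMonitor

    // Path resembling updateStatus/addNewSlave, reordered to take the queue lock first.
    void addNodeFromCloud() {
        queueLock.lock();
        try {
            synchronized (cloudMonitor) {
                nodes++;
            }
        } finally {
            queueLock.unlock();
        }
    }

    // Path resembling updateComputerList -> IdleRetentionStrategy.check: same order.
    void removeIdleNode() {
        queueLock.lock();
        try {
            synchronized (cloudMonitor) {
                nodes--;
            }
        } finally {
            queueLock.unlock();
        }
    }

    int nodeCount() {
        synchronized (cloudMonitor) {
            return nodes;
        }
    }

    // Runs both paths concurrently; returns true if both finished in time and the count balanced.
    public static boolean runConcurrently(int iterations) {
        ConsistentLockOrderDemo demo = new ConsistentLockOrderDemo();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                demo.addNodeFromCloud();
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                demo.removeIdleNode();
            }
        });
        pool.shutdown();
        try {
            return pool.awaitTermination(30, TimeUnit.SECONDS) && demo.nodeCount() == 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Completed without deadlock: " + runConcurrently(100_000));
    }
}
```

Swapping the two acquisitions in only one of the methods reintroduces the cycle seen in the trace, although, as on a real master, it may not deadlock on every run.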

            elatt Erik Lattimore made changes -
            Link: This issue relates to JENKINS-37483
            bkmeneguello Bruno Meneguello added a comment - Potentially fixed with  https://github.com/jenkinsci/ec2-fleet-plugin/pull/14
            terma Artem Stasiuk added a comment -

            Released in version 1.1.7.

            terma Artem Stasiuk added a comment -

            Feel free to reopen if you still have problems with it.

            terma Artem Stasiuk made changes -
            Resolution: Fixed
            Status: Open -> Resolved

            People

              schmutze Chad Schmutzer
              elatt Erik Lattimore
              Votes: 3
              Watchers: 6
