We seem to be running into an issue about once per day where multiple threads deadlock trying to access and update resources within the EC2 plugin.
We have several jobs that add substantial numbers of subjobs (~40) to the build queue, and they thus invoke the Pipeline step `ec2 cloud: 'AWS Cloud', template: 'Micro'` several times to preallocate enough EC2 nodes to run them all (though it looks like this behavior will no longer be necessary in ec2-plugin 1.40).
In addition, it seems that manually provisioning a node through the UI or manually deleting a node has a chance of deadlocking if it runs at the same time as the provisioning or unprovisioning process happens.
The following stacktrace shows the three threads running in 1.40-SNAPSHOT (master as of Friday afternoon).
Warning, the following threads are deadlocked : Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700, jenkins.util.Timer [#3], jenkins.util.Timer [#6]
"jenkins.util.Timer [#3]" daemon prio=5 WAITING
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
hudson.model.Queue._withLock(Queue.java:1437)
hudson.model.Queue.withLock(Queue.java:1300)
jenkins.model.Nodes.updateNode(Nodes.java:193)
jenkins.model.Jenkins.updateNode(Jenkins.java:2077)
hudson.model.Node.save(Node.java:140)
hudson.util.PersistedList.onModified(PersistedList.java:173)
hudson.util.PersistedList.replaceBy(PersistedList.java:85)
hudson.model.Slave.<init>(Slave.java:198)
hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:134)
hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:49)
hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:42)
hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:899)
hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:606)
hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:578)
hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:415)
hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:542)
hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:557)
hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748) "jenkins.util.Timer [#6]" daemon prio=5 BLOCKED
hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:671)
hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:47)
hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:452)
hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:420)
hudson.plugins.ec2.EC2OndemandSlave.terminate(EC2OndemandSlave.java:68)
hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:360)
hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:126)
hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:88)
hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:46)
hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
hudson.model.Queue._withLock(Queue.java:1380)
hudson.model.Queue.withLock(Queue.java:1257)
hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
"Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700" prio=5 WAITING
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
hudson.model.Queue.schedule2(Queue.java:587)
hudson.model.Queue.schedule2(Queue.java:713)
jenkins.model.ParameterizedJobMixIn.doBuild(ParameterizedJobMixIn.java:217)
jenkins.model.ParameterizedJobMixIn$ParameterizedJob.doBuild(ParameterizedJobMixIn.java:408)
java.lang.invoke.LambdaForm$DMH/227306521.invokeInterface_L4_V(LambdaForm$DMH)
java.lang.invoke.LambdaForm$BMH/1196970080.reinvoke(LambdaForm$BMH)
java.lang.invoke.LambdaForm$MH/457755914.invoker(LambdaForm$MH)
java.lang.invoke.LambdaForm$MH/145876426.invokeExact_MT(LambdaForm$MH)
java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
org.kohsuke.stapler.Stapler.service(Stapler.java:238)
javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
We upgraded to 1.40-SNAPSHOT after running into similar global deadlocks in 1.38 and 1.39, which I can attach stack dumps for, but since current master has a lot of reworking of the locking code, I'm not sure if they'll be useful.