Type: Bug
Resolution: Fixed
Priority: Critical
Environment: Jenkins v2.144, Ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue), 1.40.1
We seem to be running into an issue about once per day where multiple threads deadlock trying to access and update resources within the EC2 plugin.
We have several jobs that add substantial numbers of subjobs (~40) to the build queue, and they thus invoke the Pipeline step `ec2 cloud: 'AWS Cloud', template: 'Micro'` several times to preallocate enough EC2 nodes to run them all (though it looks like this behavior will no longer be necessary in ec2-plugin 1.40).
In addition, manually provisioning a node through the UI, or manually deleting a node, seems to have a chance of deadlocking if it coincides with the plugin's own provisioning or unprovisioning process.
The following stack trace shows the three deadlocked threads in 1.40-SNAPSHOT (master as of Friday afternoon).
Warning, the following threads are deadlocked : Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700, jenkins.util.Timer [#3], jenkins.util.Timer [#6]

"jenkins.util.Timer [#3]" daemon prio=5 WAITING
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    hudson.model.Queue._withLock(Queue.java:1437)
    hudson.model.Queue.withLock(Queue.java:1300)
    jenkins.model.Nodes.updateNode(Nodes.java:193)
    jenkins.model.Jenkins.updateNode(Jenkins.java:2077)
    hudson.model.Node.save(Node.java:140)
    hudson.util.PersistedList.onModified(PersistedList.java:173)
    hudson.util.PersistedList.replaceBy(PersistedList.java:85)
    hudson.model.Slave.<init>(Slave.java:198)
    hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:134)
    hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:49)
    hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:42)
    hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:899)
    hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:606)
    hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:578)
    hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:415)
    hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:542)
    hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:557)
    hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
    hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
    hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
    hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
    hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
    jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)

"jenkins.util.Timer [#6]" daemon prio=5 BLOCKED
    hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:671)
    hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:47)
    hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:452)
    hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:420)
    hudson.plugins.ec2.EC2OndemandSlave.terminate(EC2OndemandSlave.java:68)
    hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:360)
    hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:126)
    hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:88)
    hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:46)
    hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
    hudson.model.Queue._withLock(Queue.java:1380)
    hudson.model.Queue.withLock(Queue.java:1257)
    hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
    hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
    jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
    java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    java.lang.Thread.run(Thread.java:748)

"Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700" prio=5 WAITING
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    hudson.model.Queue.schedule2(Queue.java:587)
    hudson.model.Queue.schedule2(Queue.java:713)
    jenkins.model.ParameterizedJobMixIn.doBuild(ParameterizedJobMixIn.java:217)
    jenkins.model.ParameterizedJobMixIn$ParameterizedJob.doBuild(ParameterizedJobMixIn.java:408)
    java.lang.invoke.LambdaForm$DMH/227306521.invokeInterface_L4_V(LambdaForm$DMH)
    java.lang.invoke.LambdaForm$BMH/1196970080.reinvoke(LambdaForm$BMH)
    java.lang.invoke.LambdaForm$MH/457755914.invoker(LambdaForm$MH)
    java.lang.invoke.LambdaForm$MH/145876426.invokeExact_MT(LambdaForm$MH)
    java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
    org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
    org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
    org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
    org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
    org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
    org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
    org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
    org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
    org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
    org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
    org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
    org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
    org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
    org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
    org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
    org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
    org.kohsuke.stapler.Stapler.service(Stapler.java:238)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
    org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
    hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
    org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
    hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
    io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
    hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
    io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
    hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
    jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
    hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
    net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
    net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
    net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
    org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
    hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
    hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
    org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
    hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
    org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
    hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
    jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
    hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
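
Reading the dump above, the cycle looks like a lock-order inversion. Timer [#6] is inside `Queue.withLock` (so it holds the queue lock) and is BLOCKED entering `EC2Cloud.connect`, while Timer [#3] is somewhere under `EC2Cloud.provision` and is waiting for the queue lock in `Nodes.updateNode`. The POST handler is simply queued behind the same lock. The dump does not show which monitor Timer [#3] holds, so the following is only a minimal Java sketch of that pattern under the assumption that the provisioning path holds the monitor that `connect` needs. The class name, thread names, and the `cloudMonitor` object are hypothetical and are not ec2-plugin code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversionSketch {

    /**
     * Starts two daemon threads that take two locks in opposite order, then
     * returns how many of those two threads the JVM reports as deadlocked.
     */
    public static int countDeadlockedThreads() throws InterruptedException {
        // Stand-in for the ReentrantLock behind hudson.model.Queue.withLock.
        final ReentrantLock queueLock = new ReentrantLock();
        // Hypothetical stand-in for the monitor EC2Cloud.connect is BLOCKED on.
        final Object cloudMonitor = new Object();
        // Makes sure each thread holds its first lock before requesting the second.
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Assumed shape of Timer [#3]: cloud monitor first, then the queue lock.
        Thread provisioner = new Thread(() -> {
            synchronized (cloudMonitor) {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                queueLock.lock();   // parks forever: the other thread owns queueLock
                queueLock.unlock(); // never reached
            }
        }, "provisioner-sketch");

        // Shape of Timer [#6]: queue lock first, then the cloud monitor.
        Thread retention = new Thread(() -> {
            queueLock.lock();
            try {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                synchronized (cloudMonitor) { // blocks forever
                    // never reached
                }
            } finally {
                queueLock.unlock();
            }
        }, "retention-sketch");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        provisioner.setDaemon(true);
        retention.setDaemon(true);
        provisioner.start();
        retention.start();

        // Poll the JVM's deadlock detector for up to five seconds.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5_000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads(); // null when no cycle exists
            int ours = 0;
            if (ids != null) {
                for (long id : ids) {
                    if (id == provisioner.getId() || id == retention.getId()) {
                        ours++;
                    }
                }
            }
            if (ours == 2) {
                return ours;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In a pattern like this, the cycle goes away if both paths take the locks in the same order, or if the cloud call is made without holding the queue lock. Whether either of those matches what the plugin should do is not something the dump alone can confirm.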
We upgraded to 1.40-SNAPSHOT after running into similar global deadlocks in 1.38 and 1.39. I can attach stack dumps from those versions, but since current master reworks much of the locking code, I'm not sure they would still be useful.
- is duplicated by
  JENKINS-56986 Deadlock on EC2 resources and build queue (Closed)
- is related to
  JENKINS-54187 EC2 Plugin deadlock leaving Jenkins unresponsive (Closed)
  JENKINS-56986 Deadlock on EC2 resources and build queue (Closed)