Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53858

Deadlock on EC2 resources

    XMLWordPrintable

Details

    • Bug
    • Status: Fixed but Unreleased (View Workflow)
    • Critical
    • Resolution: Fixed
    • ec2-plugin
    • Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue), 1.40.1

    Description

      We seem to be running into an issue about once per day where multiple threads deadlock trying to access and update resources within the EC2 plugin.

      We have several jobs that add substantial numbers of subjobs (~40) to the build queue, and they thus invoke the Pipeline step `ec2 cloud: 'AWS Cloud', template: 'Micro'` several times to preallocate enough EC2 nodes to run them all (though it looks like this behavior will no longer be necessary in ec2-plugin 1.40).

      In addition, it seems that manually provisioning a node through the UI or manually deleting a node has a chance of deadlocking if it runs at the same time as the provisioning or unprovisioning process happens.

       

      The following stacktrace shows the three threads running in 1.40-SNAPSHOT (master as of Friday afternoon).

      Warning, the following threads are deadlocked : Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700, jenkins.util.Timer [#3], jenkins.util.Timer [#6]
      
       "jenkins.util.Timer [#3]" daemon prio=5 WAITING
         sun.misc.Unsafe.park(Native Method)
         java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
         java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
         java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
         hudson.model.Queue._withLock(Queue.java:1437)
         hudson.model.Queue.withLock(Queue.java:1300)
         jenkins.model.Nodes.updateNode(Nodes.java:193)
         jenkins.model.Jenkins.updateNode(Jenkins.java:2077)
         hudson.model.Node.save(Node.java:140)
         hudson.util.PersistedList.onModified(PersistedList.java:173)
         hudson.util.PersistedList.replaceBy(PersistedList.java:85)
         hudson.model.Slave.<init>(Slave.java:198)
         hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:134)
         hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:49)
         hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:42)
         hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:899)
         hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:606)
         hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:578)
         hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:415)
         hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:542)
         hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:557)
         hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
         hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
         hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
         hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
         hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
         jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
         java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         java.lang.Thread.run(Thread.java:748) "jenkins.util.Timer [#6]" daemon prio=5 BLOCKED
         hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:671)
         hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:47)
         hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:452)
         hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:420)
         hudson.plugins.ec2.EC2OndemandSlave.terminate(EC2OndemandSlave.java:68)
         hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:360)
         hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:126)
         hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:88)
         hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:46)
         hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
         hudson.model.Queue._withLock(Queue.java:1380)
         hudson.model.Queue.withLock(Queue.java:1257)
         hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
         hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
         jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
         java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         java.lang.Thread.run(Thread.java:748)
      
      "Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700" prio=5 WAITING
        sun.misc.Unsafe.park(Native Method)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        hudson.model.Queue.schedule2(Queue.java:587)
        hudson.model.Queue.schedule2(Queue.java:713)
        jenkins.model.ParameterizedJobMixIn.doBuild(ParameterizedJobMixIn.java:217)
        jenkins.model.ParameterizedJobMixIn$ParameterizedJob.doBuild(ParameterizedJobMixIn.java:408)
        java.lang.invoke.LambdaForm$DMH/227306521.invokeInterface_L4_V(LambdaForm$DMH)
        java.lang.invoke.LambdaForm$BMH/1196970080.reinvoke(LambdaForm$BMH)
        java.lang.invoke.LambdaForm$MH/457755914.invoker(LambdaForm$MH)
        java.lang.invoke.LambdaForm$MH/145876426.invokeExact_MT(LambdaForm$MH)
        java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
        org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
        org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
        org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
        org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
        org.kohsuke.stapler.Stapler.service(Stapler.java:238)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
        org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
        net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
        net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
        org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
        hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
        hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
      

      We upgraded to 1.40-SNAPSHOT after running into similar global deadlocks in 1.38 and 1.39, which I can attach stack dumps for, but since current master has a lot of reworking of the locking code, I'm not sure if they'll be useful.

      Attachments

        Issue Links

          Activity

            People

              thoulen FABRIZIO MANFREDI
              masond Mason Donahue
              Votes:
              18 Vote for this issue
              Watchers:
              42 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: