• Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • ec2-plugin
    • Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue), 1.40.1

      We seem to be running into an issue about once per day where multiple threads deadlock trying to access and update resources within the EC2 plugin.

      We have several jobs that each add a large number of subjobs (~40) to the build queue. To preallocate enough EC2 nodes to run them all, these jobs invoke the Pipeline step `ec2 cloud: 'AWS Cloud', template: 'Micro'` several times (though it looks like this preallocation will no longer be necessary in ec2-plugin 1.40).

      In addition, it seems that manually provisioning or deleting a node through the UI can also deadlock if it happens while automatic provisioning or unprovisioning is in progress.

      The following thread dump shows the three deadlocked threads on 1.40-SNAPSHOT (master as of Friday afternoon).

      Warning, the following threads are deadlocked : Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700, jenkins.util.Timer [#3], jenkins.util.Timer [#6]
      
       "jenkins.util.Timer [#3]" daemon prio=5 WAITING
         sun.misc.Unsafe.park(Native Method)
         java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
         java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
         java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
         hudson.model.Queue._withLock(Queue.java:1437)
         hudson.model.Queue.withLock(Queue.java:1300)
         jenkins.model.Nodes.updateNode(Nodes.java:193)
         jenkins.model.Jenkins.updateNode(Jenkins.java:2077)
         hudson.model.Node.save(Node.java:140)
         hudson.util.PersistedList.onModified(PersistedList.java:173)
         hudson.util.PersistedList.replaceBy(PersistedList.java:85)
         hudson.model.Slave.<init>(Slave.java:198)
         hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:134)
         hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:49)
         hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:42)
         hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:899)
         hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:606)
         hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:578)
         hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:415)
         hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:542)
         hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:557)
         hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
         hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
         hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
         hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
         hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
         jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
         java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         java.lang.Thread.run(Thread.java:748)
      
       "jenkins.util.Timer [#6]" daemon prio=5 BLOCKED
         hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:671)
         hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:47)
         hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:452)
         hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:420)
         hudson.plugins.ec2.EC2OndemandSlave.terminate(EC2OndemandSlave.java:68)
         hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:360)
         hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:126)
         hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:88)
         hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:46)
         hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
         hudson.model.Queue._withLock(Queue.java:1380)
         hudson.model.Queue.withLock(Queue.java:1257)
         hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
         hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
         jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
         java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         java.lang.Thread.run(Thread.java:748)
      
      "Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700" prio=5 WAITING
        sun.misc.Unsafe.park(Native Method)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        hudson.model.Queue.schedule2(Queue.java:587)
        hudson.model.Queue.schedule2(Queue.java:713)
        jenkins.model.ParameterizedJobMixIn.doBuild(ParameterizedJobMixIn.java:217)
        jenkins.model.ParameterizedJobMixIn$ParameterizedJob.doBuild(ParameterizedJobMixIn.java:408)
        java.lang.invoke.LambdaForm$DMH/227306521.invokeInterface_L4_V(LambdaForm$DMH)
        java.lang.invoke.LambdaForm$BMH/1196970080.reinvoke(LambdaForm$BMH)
        java.lang.invoke.LambdaForm$MH/457755914.invoker(LambdaForm$MH)
        java.lang.invoke.LambdaForm$MH/145876426.invokeExact_MT(LambdaForm$MH)
        java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
        org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
        org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
        org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
        org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
        org.kohsuke.stapler.Stapler.service(Stapler.java:238)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
        org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
        net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
        net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
        org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
        hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
        hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
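
      One possible reading of the dump above, offered as an inference rather than a confirmed diagnosis: "jenkins.util.Timer [#6]" entered `Queue.withLock` (so it holds the Queue lock) and is BLOCKED on a monitor in `EC2Cloud.connect`, while "jenkins.util.Timer [#3]" is inside `EC2Cloud.provision` (where it presumably holds that monitor) and is WAITING for the Queue lock in `Queue._withLock`. The POST handler is then stuck behind the same Queue lock. The following minimal Java sketch reproduces that kind of lock-order inversion with stand-in locks and detects it with `ThreadMXBean.findDeadlockedThreads()`. The class, thread, and lock names are hypothetical and are not taken from Jenkins or the plugin.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-alone demo; none of these names come from Jenkins or ec2-plugin.
final class LockOrderDemo {

    /** Starts two threads that take two locks in opposite order and returns the names of the deadlocked threads. */
    static Set<String> deadlockedThreadNames(long timeoutMillis) {
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
        Object cloudMonitor = new Object();            // stand-in for a monitor on the cloud object
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread provisioner = new Thread(() -> {
            synchronized (cloudMonitor) {              // first lock: cloud monitor
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                queueLock.lock();                      // second lock: held by "retention", so this parks forever
                queueLock.unlock();
            }
        }, "provisioner");

        Thread retention = new Thread(() -> {
            queueLock.lock();                          // first lock: queue lock
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (cloudMonitor) {          // second lock: held by "provisioner", so this blocks forever
                    // never reached
                }
            } finally {
                queueLock.unlock();
            }
        }, "retention");

        // Daemon threads so the JVM can still exit although they never finish.
        provisioner.setDaemon(true);
        retention.setDaemon(true);
        provisioner.start();
        retention.start();

        // Poll the JVM's own deadlock detector until it reports a cycle or the timeout expires.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Set<String> names = new TreeSet<>();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    if (info != null) {
                        names.add(info.getThreadName());
                    }
                }
                return names;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return names;
            }
        }
        return names;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Deadlocked: " + deadlockedThreadNames(5_000));
    }
}
```

      If this reading is right, the usual remedies would be to take the two locks in one consistent order everywhere, or to avoid calling into Queue-locked code while holding the plugin's own monitor. Whether either matches the actual fix in the plugin is not something this sketch can confirm.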
      

      We upgraded to 1.40-SNAPSHOT after running into similar global deadlocks in 1.38 and 1.39. I can attach stack dumps for those versions, but since current master substantially reworks the locking code, I'm not sure they'll be useful.

          [JENKINS-53858] Deadlock on EC2 resources

          Mason Donahue created issue
          FABRIZIO MANFREDI made changes
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Mason Donahue made changes
          Environment Original: Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue) New: Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue), 1.40.1
          Stefan Verhoeff made changes
          Link New: This issue is related to JENKINS-54187 [ JENKINS-54187 ]
          FABRIZIO MANFREDI made changes
          Resolution New: Fixed [ 1 ]
          Status Original: In Progress [ 3 ] New: Closed [ 6 ]
          Sam Gleske made changes
          Link New: This issue is related to JENKINS-56986 [ JENKINS-56986 ]
          Sam Gleske made changes
          Link New: This issue is duplicated by JENKINS-56986 [ JENKINS-56986 ]
          Nicolas De Loof made changes
          Resolution Original: Fixed [ 1 ]
          Status Original: Closed [ 6 ] New: Reopened [ 4 ]
          FABRIZIO MANFREDI made changes
          Resolution New: Fixed [ 1 ]
          Status Original: Reopened [ 4 ] New: Fixed but Unreleased [ 10203 ]

            thoulen FABRIZIO MANFREDI
            masond Mason Donahue
            Votes: 18
            Watchers: 42