Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-53858

Deadlock on EC2 resources

    XMLWordPrintable

Details

    • Bug
    • Status: Fixed but Unreleased (View Workflow)
    • Critical
    • Resolution: Fixed
    • ec2-plugin
    • Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue), 1.40.1

    Description

      We seem to be running into an issue about once per day where multiple threads deadlock trying to access and update resources within the EC2 plugin.

      We have several jobs that add substantial numbers of subjobs (~40) to the build queue, and they thus invoke the Pipeline step `ec2 cloud: 'AWS Cloud', template: 'Micro'` several times to preallocate enough EC2 nodes to run them all (though it looks like this behavior will no longer be necessary in ec2-plugin 1.40).

      In addition, it seems that manually provisioning a node through the UI or manually deleting a node has a chance of deadlocking if it runs at the same time as the provisioning or unprovisioning process happens.

       

      The following stacktrace shows the three threads running in 1.40-SNAPSHOT (master as of Friday afternoon).

      Warning, the following threads are deadlocked : Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700, jenkins.util.Timer [#3], jenkins.util.Timer [#6]
      
       "jenkins.util.Timer [#3]" daemon prio=5 WAITING
         sun.misc.Unsafe.park(Native Method)
         java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
         java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
         java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
         java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
         hudson.model.Queue._withLock(Queue.java:1437)
         hudson.model.Queue.withLock(Queue.java:1300)
         jenkins.model.Nodes.updateNode(Nodes.java:193)
         jenkins.model.Jenkins.updateNode(Jenkins.java:2077)
         hudson.model.Node.save(Node.java:140)
         hudson.util.PersistedList.onModified(PersistedList.java:173)
         hudson.util.PersistedList.replaceBy(PersistedList.java:85)
         hudson.model.Slave.<init>(Slave.java:198)
         hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:134)
         hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:49)
         hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:42)
         hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:899)
         hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:606)
         hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:578)
         hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:415)
         hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:542)
         hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:557)
         hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
         hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
         hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
         hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
         hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
         jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
         java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         java.lang.Thread.run(Thread.java:748) "jenkins.util.Timer [#6]" daemon prio=5 BLOCKED
         hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:671)
         hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:47)
         hudson.plugins.ec2.EC2AbstractSlave.fetchLiveInstanceData(EC2AbstractSlave.java:452)
         hudson.plugins.ec2.EC2AbstractSlave.isAlive(EC2AbstractSlave.java:420)
         hudson.plugins.ec2.EC2OndemandSlave.terminate(EC2OndemandSlave.java:68)
         hudson.plugins.ec2.EC2AbstractSlave.idleTimeout(EC2AbstractSlave.java:360)
         hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:126)
         hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:88)
         hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:46)
         hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
         hudson.model.Queue._withLock(Queue.java:1380)
         hudson.model.Queue.withLock(Queue.java:1257)
         hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
         hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
         jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
         java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
         java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         java.lang.Thread.run(Thread.java:748)
      
      "Handling POST /job/Selenium%20Tests/job/PAID-1256%252Fenable-paid-tests/build from 172.26.3.39 : qtp125130493-18700" prio=5 WAITING
        sun.misc.Unsafe.park(Native Method)
        java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        hudson.model.Queue.schedule2(Queue.java:587)
        hudson.model.Queue.schedule2(Queue.java:713)
        jenkins.model.ParameterizedJobMixIn.doBuild(ParameterizedJobMixIn.java:217)
        jenkins.model.ParameterizedJobMixIn$ParameterizedJob.doBuild(ParameterizedJobMixIn.java:408)
        java.lang.invoke.LambdaForm$DMH/227306521.invokeInterface_L4_V(LambdaForm$DMH)
        java.lang.invoke.LambdaForm$BMH/1196970080.reinvoke(LambdaForm$BMH)
        java.lang.invoke.LambdaForm$MH/457755914.invoker(LambdaForm$MH)
        java.lang.invoke.LambdaForm$MH/145876426.invokeExact_MT(LambdaForm$MH)
        java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
        org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
        org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
        org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
        org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
        org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
        org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
        org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
        org.kohsuke.stapler.Stapler.service(Stapler.java:238)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
        org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:239)
        net.bull.javamelody.MonitoringFilter.doFilter(MonitoringFilter.java:215)
        net.bull.javamelody.PluginMonitoringFilter.doFilter(PluginMonitoringFilter.java:88)
        org.jvnet.hudson.plugins.monitoring.HudsonMonitoringFilter.doFilter(HudsonMonitoringFilter.java:114)
        hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
        hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
        hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
        org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
        hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
        jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
        hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
      

      We upgraded to 1.40-SNAPSHOT after running into similar global deadlocks in 1.38 and 1.39, which I can attach stack dumps for, but since current master has a lot of reworking of the locking code, I'm not sure if they'll be useful.

      Attachments

        Issue Links

          Activity

            masond Mason Donahue created issue -
            thoulen FABRIZIO MANFREDI made changes -
            Field Original Value New Value
            Status Open [ 1 ] In Progress [ 3 ]
            masond Mason Donahue made changes -
            Environment Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue) Jenkins v2.144, ubuntu 14.04, ec2-plugin 1.38, 1.39, 1.40-SNAPSHOT (private-160d794a-masondonahue), 1.40.1
            sverhoef Stefan Verhoeff made changes -
            Link This issue is related to JENKINS-54187 [ JENKINS-54187 ]
            thoulen FABRIZIO MANFREDI made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            sag47 Sam Gleske made changes -
            Link This issue is related to JENKINS-56986 [ JENKINS-56986 ]
            sag47 Sam Gleske made changes -
            Link This issue is duplicated by JENKINS-56986 [ JENKINS-56986 ]
            ndeloof Nicolas De Loof made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            thoulen FABRIZIO MANFREDI made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Fixed but Unreleased [ 10203 ]

            People

              thoulen FABRIZIO MANFREDI
              masond Mason Donahue
              Votes:
              18 Vote for this issue
              Watchers:
              42 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: