
EC2 Plugin deadlock leaving Jenkins unresponsive

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • ec2-plugin
    • None

      We recently upgraded our Jenkins instance to the latest LTS (2.138.2) and started noticing some irregularities with the EC2 plugin, namely:

      • Many more build nodes were created than before for the same number of incoming items in the build queue.
      • The number of AWS API calls increased, in particular StopInstances events. This is possibly related to the previous point.

      This led us to try out the latest snapshot, 1.41-SNAPSHOT, built from revision d4bdd6b83a7102330fd97ffbbd067edc34e47f97. A few hours later, our Jenkins instance hit a deadlock, which is described in the log I'm attaching.

      Notice how the Gerrit plugin stops processing events after the EC2 plugin deadlock; this ultimately left our Jenkins unresponsive.

      One important bit of information might be that we're using the Stop/Disconnect on Idle Timeout plugin option.

      We're happy to provide more information if needed.
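
For anyone who wants to confirm the same condition on their own instance, the JVM can report deadlocked threads through the standard `java.lang.management` API; a `jstack` thread dump reports the same kind of cycle. The following is a minimal sketch and is not taken from Jenkins or the EC2 plugin: the class name `DeadlockProbe` and the report layout are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe {

    // Returns a textual report of deadlocked threads, or an empty string if none are found.
    static String describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers both object monitors (synchronized) and ownable synchronizers (e.g. ReentrantLock).
        long[] ids = threads.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder();
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.append(info.getThreadName())
                  .append(" locked on ").append(info.getLockName())
                  .append(" (owned by ").append(info.getLockOwnerName()).append(")\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                report.append("\t at ").append(frame).append('\n');
            }
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock detected" : report);
    }
}
```

Run in a healthy JVM, the probe finds nothing; in a JVM stuck the way described here, it would list each blocked thread with the lock it waits for and that lock's owner.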

        1. all-jenkins.log
          26 kB
        2. jenkins.log
          10 kB
        3. jenkins.txt
          6 kB

          [JENKINS-54187] EC2 Plugin deadlock leaving Jenkins unresponsive

          Danny added a comment -

          This has happened to me twice in the past few days, on both Jenkins ver. 2.138.1 and Jenkins ver. 2.138.2.


          FABRIZIO MANFREDI added a comment -

          Can you test this snapshot, which contains a fix for the deadlock?

          https://repo.jenkins-ci.org/snapshots/org/jenkins-ci/plugins/ec2/1.42-SNAPSHOT/ec2-1.42-20181106.195515-1.hpi

          Oliver Pereira added a comment -

          I got the following null pointer exception when I tried to launch a slave using the above snapshot version and the latest version of Jenkins LTS, 2.138.2:

          java.lang.NullPointerException
          	 at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:587)
          	 at hudson.plugins.ec2.EC2Cloud.doProvision(EC2Cloud.java:344)
          	 at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
          	 at org.kohsuke.stapler.Function$MethodFunction.invoke(Function.java:343)
          	 at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:184)
          	 at org.kohsuke.stapler.Function.bindAndInvokeAndServeResponse(Function.java:117)
          	 at org.kohsuke.stapler.MetaClass$1.doDispatch(MetaClass.java:129)
          	 at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
          	 at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
          Caused: javax.servlet.ServletException
          	 at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:784)
          	 at org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
          	 at org.kohsuke.stapler.MetaClass$5.doDispatch(MetaClass.java:248)
          	 at org.kohsuke.stapler.NameBasedDispatcher.dispatch(NameBasedDispatcher.java:58)
          	 at org.kohsuke.stapler.Stapler.tryInvoke(Stapler.java:734)
          	 at org.kohsuke.stapler.Stapler.invoke(Stapler.java:864)
          	 at org.kohsuke.stapler.Stapler.invoke(Stapler.java:668)
          	 at org.kohsuke.stapler.Stapler.service(Stapler.java:238)
          	 at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
          	 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1655)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:154)
          	 at org.jenkinsci.plugins.ssegateway.Endpoint$SSEListenChannelFilter.doFilter(Endpoint.java:243)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	 at io.jenkins.blueocean.ResourceCacheControl.doFilter(ResourceCacheControl.java:134)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	 at io.jenkins.blueocean.auth.jwt.impl.JwtAuthenticationFilter.doFilter(JwtAuthenticationFilter.java:61)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	 at hudson.plugins.audit_trail.AuditTrailFilter.doFilter(AuditTrailFilter.java:92)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	 at jenkins.metrics.impl.MetricsFilter.doFilter(MetricsFilter.java:125)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	 at hudson.plugins.greenballs.GreenBallFilter.doFilter(GreenBallFilter.java:59)
          	 at hudson.util.PluginServletFilter$1.doFilter(PluginServletFilter.java:151)
          	 at hudson.util.PluginServletFilter.doFilter(PluginServletFilter.java:157)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
          	 at hudson.security.csrf.CrumbFilter.doFilter(CrumbFilter.java:99)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:84)
          	 at hudson.security.UnwrapSecurityExceptionFilter.doFilter(UnwrapSecurityExceptionFilter.java:51)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at jenkins.security.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:117)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at org.acegisecurity.providers.anonymous.AnonymousProcessingFilter.doFilter(AnonymousProcessingFilter.java:125)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at org.acegisecurity.ui.rememberme.RememberMeProcessingFilter.doFilter(RememberMeProcessingFilter.java:142)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at org.acegisecurity.ui.AbstractProcessingFilter.doFilter(AbstractProcessingFilter.java:271)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at jenkins.security.BasicHeaderProcessor.doFilter(BasicHeaderProcessor.java:93)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at org.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:249)
          	 at hudson.security.HttpSessionContextIntegrationFilter2.doFilter(HttpSessionContextIntegrationFilter2.java:67)
          	 at hudson.security.ChainedServletFilter$1.doFilter(ChainedServletFilter.java:87)
          	 at hudson.security.ChainedServletFilter.doFilter(ChainedServletFilter.java:90)
          	 at hudson.security.HudsonFilter.doFilter(HudsonFilter.java:171)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
          	 at org.kohsuke.stapler.compression.CompressionFilter.doFilter(CompressionFilter.java:49)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
          	 at hudson.util.CharacterEncodingFilter.doFilter(CharacterEncodingFilter.java:82)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
          	 at org.kohsuke.stapler.DiagnosticThreadNameFilter.doFilter(DiagnosticThreadNameFilter.java:30)
          	 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
          	 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
          	 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
          	 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
          	 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          	 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
          	 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
          	 at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
          	 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
          	 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
          	 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
          	 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
          	 at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
          	 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
          	 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
          	 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
          	 at org.eclipse.jetty.server.Server.handle(Server.java:531)
          	 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
          	 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
          	 at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
          	 at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
          	 at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
          	 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
          	 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
          	 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
          	 at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
          	 at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
          	 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
          	 at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
          	 at java.lang.Thread.run(Thread.java:748)

          Oliver Pereira added a comment -

          PR 321 no longer throws this error, and I am going to continue testing it.

          Greg Smith added a comment -

          We were hitting this error too, and after reviewing all the changes, I thought that this problem and JENKINS-53401 might be related.

          The developer of that fix stated in that issue that they may be related: https://issues.jenkins-ci.org/browse/JENKINS-53401?focusedCommentId=353132&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-353132


          Jeff Squyres added a comment -

          +1 – we are hitting this issue, too. We had to disable all of the EC2 testing in the Open MPI project (github.com/open-mpi/ompi), which has unfortunately killed a good portion of our CI testing.


          Callum Pember added a comment - edited

          +1. This has been an ongoing issue for us for over a month. Attached a log from today.

          Once this happens, Jenkins becomes unresponsive to most actions.

          jenkins.txt

           

          Dec 12, 2018 11:08:04 PM INFO hudson.plugins.ec2.EC2Cloud provision
          SlaveTemplate{ami='ami-xx', labels='lapis-prod'}. Attempting to provision slave needed by excess workload of 1 units
          Dec 12, 2018 11:08:04 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-xx', labels='lapis-prod'}. Considering launching
          Dec 12, 2018 11:08:05 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-xx', labels='lapis-prod'}. Setting Instance Initiated Shutdown Behavior : ShutdownBehavior.Terminate
          Dec 12, 2018 11:08:05 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
          SlaveTemplate{ami='ami-xxx', labels='lapis-prod'}. Looking for existing instances with describe-instance: {Filters: [{Name: image-id,Values: [ami-xxx]}, {Name: instance-type,Values: [m3.2xlarge]}, {Name: key-name,Values: [jenkins-slave]}, {Name: availability-zone,Values: [ap-southeast-2b]}, {Name: subnet-id,Values: [subnet-xx]}, {Name: instance.group-id,Values: [sg-xx]}, {Name: tag:jenkins_server_url,Values: [https://jenkins.xx.xx.xx/]}, {Name: tag:Name,Values: [jenkins-lapis-worker]}, {Name: tag:jenkins_slave_type,Values: [demand_lapis-prod]}],InstanceIds: [],}
          Dec 12, 2018 11:08:33 PM WARNING jenkins.metrics.api.Metrics$HealthChecker execute
          Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer [#3] locked on hudson.plugins.ec2.AmazonEC2Cloud@76b6b03d (owned by jenkins.util.Timer [#4]):
          	 at hudson.plugins.ec2.EC2Cloud.connect(EC2Cloud.java:748)
          	 at hudson.plugins.ec2.CloudHelper.getInstance(CloudHelper.java:47)
          	 at hudson.plugins.ec2.CloudHelper.getInstanceWithRetry(CloudHelper.java:25)
          	 at hudson.plugins.ec2.EC2Computer.getState(EC2Computer.java:127)
          	 at hudson.plugins.ec2.EC2RetentionStrategy.internalCheck(EC2RetentionStrategy.java:112)
          	 at hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:90)
          	 at hudson.plugins.ec2.EC2RetentionStrategy.check(EC2RetentionStrategy.java:48)
          	 at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
          	 at hudson.model.Queue._withLock(Queue.java:1381)
          	 at hudson.model.Queue.withLock(Queue.java:1258)
          	 at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
          	 at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
          	 at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          	 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          	 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          	 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          	 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	 at java.lang.Thread.run(Thread.java:748)
          , jenkins.util.Timer [#4] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@486c87f (owned by jenkins.util.Timer [#3]):
          	 at sun.misc.Unsafe.park(Native Method)
          	 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          	 at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          	 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          	 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          	 at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
          	 at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
          	 at hudson.model.Queue._withLock(Queue.java:1438)
          	 at hudson.model.Queue.withLock(Queue.java:1301)
          	 at jenkins.model.Nodes.updateNode(Nodes.java:193)
          	 at jenkins.model.Jenkins.updateNode(Jenkins.java:2097)
          	 at hudson.model.Node.save(Node.java:140)
          	 at hudson.util.PersistedList.onModified(PersistedList.java:173)
          	 at hudson.util.PersistedList.replaceBy(PersistedList.java:85)
          	 at hudson.model.Slave.<init>(Slave.java:199)
          	 at hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:138)
          	 at hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:49)
          	 at hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:42)
          	 at hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:963)
          	 at hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:660)
          	 at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:632)
          	 at hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:463)
          	 at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:587)
          	 at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:602)
          	 at hudson.slaves.NodeProvisioner$StandardStrategyImpl.apply(NodeProvisioner.java:715)
          	 at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:320)
          	 at hudson.slaves.NodeProvisioner.access$000(NodeProvisioner.java:61)
          	 at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:809)
          	 at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:72)
          	 at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
          	 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          	 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          	 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          	 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          	 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          	 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          	 at java.lang.Thread.run(Thread.java:748)
          ]]
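
Reading the health-check output above: Timer [#3] is inside `Queue.withLock` (the retention check) and waits for the `AmazonEC2Cloud` monitor, while Timer [#4] owns that monitor (the provisioning path) and waits for the Queue lock in `Nodes.updateNode`. That is a lock-order inversion. The following is a minimal, self-contained Java sketch of the pattern and is not the plugin's code: the class `LockOrderSketch`, its method names, and the use of two `ReentrantLock`s with `tryLock` timeouts (so the demo reports failure instead of hanging) are all assumptions made for illustration. The same-order variant only illustrates one general remedy; this thread does not show how the fix in 1.42 was actually implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {

    // Takes `first`, then tries to take `second` with a timeout.
    // Returns false if `second` could not be acquired in time.
    static boolean acquireBoth(ReentrantLock first, ReentrantLock second,
                               CountDownLatch rendezvous) throws InterruptedException {
        first.lock();
        try {
            rendezvous.countDown();
            // Opposing-order case: wait until both threads hold their first lock,
            // so the inversion is hit on every run rather than occasionally.
            rendezvous.await();
            if (!second.tryLock(200, TimeUnit.MILLISECONDS)) {
                return false; // a plain lock() or synchronized block would wait forever here
            }
            second.unlock();
            return true;
        } finally {
            first.unlock();
        }
    }

    // Returns true only if both threads managed to take both locks.
    static boolean run(boolean sameOrder) throws Exception {
        ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
        ReentrantLock cloudLock = new ReentrantLock(); // stand-in for the cloud object's monitor
        CountDownLatch rendezvous = new CountDownLatch(sameOrder ? 1 : 2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Plays the role of Timer [#3]: Queue lock first, then the cloud lock.
            Future<Boolean> retention =
                    pool.submit(() -> acquireBoth(queueLock, cloudLock, rendezvous));
            // Plays the role of Timer [#4]: cloud lock first, then the Queue lock,
            // unless sameOrder asks for the consistent ordering.
            Future<Boolean> provisioner = pool.submit(() -> sameOrder
                    ? acquireBoth(queueLock, cloudLock, rendezvous)
                    : acquireBoth(cloudLock, queueLock, rendezvous));
            boolean retentionOk = retention.get();
            boolean provisionerOk = provisioner.get();
            return retentionOk && provisionerOk;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("opposing order, both completed: " + run(false)); // false
        System.out.println("same order, both completed: " + run(true));      // true
    }
}
```

With opposing orders at least one thread always times out, which is the point at which the real threads would block indefinitely. With a single agreed order the second thread simply waits its turn and both complete.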
          
          


          Stefan Verhoeff added a comment -

          We hit the same issue again after accidentally upgrading to ec2-1.41 and Jenkins LTS 2.150.1.

          Looks like this bug is a duplicate, linking: JENKINS-53858

          A fix for JENKINS-53858 has been announced for 1.42 on the wiki: https://wiki.jenkins.io/display/JENKINS/Amazon+EC2+Plugin#AmazonEC2Plugin-Version1.42(NotReleaseyet,2018)

          Eric Knecht added a comment -

          We have encountered this issue as well. An ETA on the 1.42 release would be highly appreciated.


          Mike Poulson added a comment -

          I have confirmed the RC version of 1.42 fixed things. I couldn’t wait.


            thoulen FABRIZIO MANFREDI
            ruenzuo Renzo Crisóstomo
            Votes: 6
            Watchers: 19
