JENKINS-68820

Jenkins sometimes hangs when launching an EC2 agent

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: ec2-plugin, gradle-plugin
    • Labels: None
    • Environment: Jenkins 2.332.3 and 2.346.1
      Amazon EC2 plugin Version 1.68
    • Released As: gradle:1.39.4

    Description

      Over the last week, Jenkins has periodically hung while running a job that uses an EC2 agent: the web GUI is still accessible, but it is impossible to cancel jobs or start new ones, and we have to restart Jenkins. Before that everything worked fine; the problem appeared only last week. If the agent is launched manually via Manage Nodes and Clouds, Jenkins periodically returns a 504 after a while, but the EC2 instance starts and Jenkins doesn't hang. It also seems to happen more often with the Spot configuration enabled.

      Latest log:

      2022-06-23 19:27:05.227+0000 [id=5461]  WARNING j.m.api.Metrics$HealthChecker#execute: Some health checks are reporting as unhealthy: [thread-deadlock : [jenkins.util.Timer [#2] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@5db1de9 (owned by Computer.threadPoolForRemoting [#647]):
               at java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
               at java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
               at java.base@11.0.15/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
               at hudson.model.Queue.maintain(Queue.java:1479)
               at hudson.model.Queue$MaintainTask.doRun(Queue.java:2899)
               at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
               at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
               at java.base@11.0.15/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
               at java.base@11.0.15/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
               at java.base@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
               at java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
               at java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
               at java.base@11.0.15/java.lang.Thread.run(Thread.java:829)
      , Computer.threadPoolForRemoting [#647] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@b1cf322 (owned by jenkins.util.Timer [#4]):
               at java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
               at java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
               at java.base@11.0.15/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
               at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:694)
               at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:796)
               at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$null$11(MinimumInstanceChecker.java:114)
               at hudson.plugins.ec2.util.MinimumInstanceChecker$$Lambda$1030/0x0000000102a9bc40.accept(Unknown Source)
               at java.base@11.0.15/java.util.ArrayList.forEach(ArrayList.java:1541)
               at java.base@11.0.15/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
               at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$12(MinimumInstanceChecker.java:77)
               at hudson.plugins.ec2.util.MinimumInstanceChecker$$Lambda$257/0x0000000100417440.accept(Unknown Source)
               at java.base@11.0.15/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
               at java.base@11.0.15/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
               at java.base@11.0.15/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
               at java.base@11.0.15/java.util.Iterator.forEachRemaining(Iterator.java:133)
               at java.base@11.0.15/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
               at java.base@11.0.15/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
               at java.base@11.0.15/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
               at java.base@11.0.15/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
               at java.base@11.0.15/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
               at java.base@11.0.15/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
               at java.base@11.0.15/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
               at hudson.plugins.ec2.util.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:76)
               at hudson.plugins.ec2.SlaveTemplate$OnSaveListener.onChange(SlaveTemplate.java:1844)
               at hudson.model.listeners.SaveableListener.fireOnChange(SaveableListener.java:82)
               at jenkins.model.Jenkins.save(Jenkins.java:3545)
               at hudson.util.PersistedList.onModified(PersistedList.java:193)
               at hudson.util.PersistedList._onModified(PersistedList.java:224)
               at hudson.util.PersistedList.add(PersistedList.java:85)
               at hudson.plugins.gradle.injection.MavenOptsSetter.setMavenOpts(MavenOptsSetter.java:40)
               at hudson.plugins.gradle.injection.MavenOptsSetter.remove(MavenOptsSetter.java:31)
               at hudson.plugins.gradle.injection.MavenBuildScanInjection.removeMavenExtension(MavenBuildScanInjection.java:86)
               at hudson.plugins.gradle.injection.MavenBuildScanInjection.inject(MavenBuildScanInjection.java:58)
               at hudson.plugins.gradle.injection.BuildScanInjectionListener.lambda$inject$0(BuildScanInjectionListener.java:57)
               at hudson.plugins.gradle.injection.BuildScanInjectionListener$$Lambda$426/0x0000000100daf440.accept(Unknown Source)
               at java.base@11.0.15/java.util.Arrays$ArrayList.forEach(Arrays.java:4390)
               at hudson.plugins.gradle.injection.BuildScanInjectionListener.inject(BuildScanInjectionListener.java:57)
               at hudson.plugins.gradle.injection.BuildScanInjectionListener.onConfigurationChange(BuildScanInjectionListener.java:49)
               at hudson.model.AbstractCIBase$$Lambda$424/0x0000000100dafc40.accept(Unknown Source)
               at jenkins.util.Listeners.lambda$notify$0(Listeners.java:59)
               at jenkins.util.Listeners$$Lambda$425/0x0000000100daf040.run(Unknown Source)
               at jenkins.util.Listeners.notify(Listeners.java:70)
               at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:277)
               at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1670)
               at jenkins.model.Nodes$5.run(Nodes.java:279)
               at hudson.model.Queue._withLock(Queue.java:1395)
               at hudson.model.Queue.withLock(Queue.java:1269)
               at jenkins.model.Nodes.removeNode(Nodes.java:270)
               at jenkins.model.Jenkins.removeNode(Jenkins.java:2215)
               at hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:91)
               at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$1$1.run(OnceRetentionStrategy.java:128)
               at hudson.model.Queue._withLock(Queue.java:1395)
               at hudson.model.Queue.withLock(Queue.java:1269)
               at org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$1.run(OnceRetentionStrategy.java:123)
               at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
               at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
               at java.base@11.0.15/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
               at java.base@11.0.15/java.util.concurrent.FutureTask.run(FutureTask.java:264)
               at java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
               at java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
               at java.base@11.0.15/java.lang.Thread.run(Thread.java:829)
      , jenkins.util.Timer [#4] locked on java.util.concurrent.locks.ReentrantLock$NonfairSync@5db1de9 (owned by Computer.threadPoolForRemoting [#647]):
               at java.base@11.0.15/jdk.internal.misc.Unsafe.park(Native Method)
               at java.base@11.0.15/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
               at java.base@11.0.15/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
               at java.base@11.0.15/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
               at hudson.model.Queue._withLock(Queue.java:1454)
               at hudson.model.Queue.withLock(Queue.java:1312)
               at jenkins.model.Nodes.updateNode(Nodes.java:201)
               at jenkins.model.Jenkins.updateNode(Jenkins.java:2229)
               at hudson.model.Node.save(Node.java:143)
               at hudson.util.PersistedList.onModified(PersistedList.java:193)
               at hudson.util.PersistedList.replaceBy(PersistedList.java:99)
               at hudson.model.Slave.setNodeProperties(Slave.java:315)
               at hudson.plugins.ec2.EC2AbstractSlave.<init>(EC2AbstractSlave.java:163)
               at hudson.plugins.ec2.EC2OndemandSlave.<init>(EC2OndemandSlave.java:73)
               at hudson.plugins.ec2.util.EC2AgentFactoryImpl.createOnDemandAgent(EC2AgentFactoryImpl.java:15)
               at hudson.plugins.ec2.SlaveTemplate.newOndemandSlave(SlaveTemplate.java:1566)
               at hudson.plugins.ec2.SlaveTemplate.toSlaves(SlaveTemplate.java:1182)
               at hudson.plugins.ec2.SlaveTemplate.provisionOndemand(SlaveTemplate.java:1154)
               at hudson.plugins.ec2.SlaveTemplate.provisionSpot(SlaveTemplate.java:1355)
               at hudson.plugins.ec2.SlaveTemplate.provision(SlaveTemplate.java:889)
               at hudson.plugins.ec2.EC2Cloud.getNewOrExistingAvailableSlave(EC2Cloud.java:714)
               at hudson.plugins.ec2.EC2Cloud.provision(EC2Cloud.java:740)
               at com.cloudbees.jenkins.plugins.amazonecs.ECSProvisioningStrategy.apply(ECSProvisioningStrategy.java:65)
               at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:326)
               at hudson.slaves.NodeProvisioner.access$1000(NodeProvisioner.java:71)
               at hudson.slaves.NodeProvisioner$NodeProvisionerInvoker.doRun(NodeProvisioner.java:824)
               at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:92)
               at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
               at java.base@11.0.15/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
               at java.base@11.0.15/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
               at java.base@11.0.15/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
               at java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
               at java.base@11.0.15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
               at java.base@11.0.15/java.lang.Thread.run(Thread.java:829)
      ]] 
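
The trace above reads as a lock-ordering inversion: Computer.threadPoolForRemoting [#647] holds the Queue lock (NonfairSync@5db1de9) and waits for the lock taken in EC2Cloud.getNewOrExistingAvailableSlave (NonfairSync@b1cf322), while jenkins.util.Timer [#4] holds that EC2Cloud lock and waits for the Queue lock. The following is a minimal, standalone sketch of the same pattern, not Jenkins or plugin code. The class name, lock names, and thread names are hypothetical stand-ins. It uses the standard ThreadMXBean.findDeadlockedThreads() call, which covers ReentrantLock on JVMs that support ownable-synchronizer monitoring, to detect the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderDemo {
    // Hypothetical stand-ins for the two locks seen in the trace.
    static final ReentrantLock queueLock = new ReentrantLock();
    static final ReentrantLock cloudLock = new ReentrantLock();

    // Takes `first`, waits until the other worker also holds its first lock,
    // then tries to take `second`. With opposite orders this deadlocks.
    static Thread worker(String name, ReentrantLock first, ReentrantLock second,
                         CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            try {
                first.lockInterruptibly();
                try {
                    bothHoldFirst.countDown();
                    bothHoldFirst.await();
                    second.lockInterruptibly();
                    second.unlock();
                } finally {
                    first.unlock();
                }
            } catch (InterruptedException e) {
                // Interrupted by the detector below to break the deadlock.
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    // Starts the two workers, polls the JVM for a deadlock for up to ~5 s,
    // then interrupts both workers so the demo can exit cleanly.
    static int countDeadlockedThreads() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(2);
        Thread a = worker("remoting-like", queueLock, cloudLock, latch);
        Thread b = worker("timer-like", cloudLock, queueLock, latch);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = null;
        for (int i = 0; i < 100 && ids == null; i++) {
            Thread.sleep(50);
            ids = mx.findDeadlockedThreads(); // null while no cycle exists
        }

        a.interrupt();
        b.interrupt();
        a.join();
        b.join();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In this sketch the deadlock can be broken only because both workers use lockInterruptibly(). The frames in the trace are parked in ReentrantLock.lock(), which does not respond to interruption, which is consistent with a restart being the only way out.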

        Activity

          allan_burdajewicz Allan BURDAJEWICZ added a comment (edited)

          I saw something similar recently, though in the context of Kubernetes rather than EC2. I think this is caused by a recent feature of the Gradle plugin: https://github.com/jenkinsci/gradle-plugin/commit/b4aa34b2c48d9d96e212c80d45508dc40c5a023f
          The feature has apparently been disabled by default (https://github.com/jenkinsci/gradle-plugin/pull/162), but this is not released yet. Try disabling the Gradle plugin and see if that fixes it.
          This new feature is disabled by default in the latest release of the Gradle plugin (https://github.com/jenkinsci/gradle-plugin/releases/tag/gradle-1.39.4). Try upgrading the plugin.
          cc wolfs

          matchden Denis Matchenko added a comment

          allan_burdajewicz thanks, it looks like it works well now.

          allan_burdajewicz Allan BURDAJEWICZ added a comment

          Caused by a feature introduced in gradle-1.39. The feature is disabled by default in gradle-1.39.4.

          People

            Assignee: Unassigned
            Reporter: matchden Denis Matchenko
            Votes: 0
            Watchers: 2
