Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-22549

JCloudsRetentionStrategy.check deadlocks with JCloudsCloud.provision

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      We observe a deadlock in JClouds plugin that happens in a following scenarion:

      1. There is a thread (thread A) that performs deployment of a new slave
      2. Another thread (thread B) is working at the same time and decides take down one of the existing nodes
      3. Thread A performs Hudson.getInstance().addNode() here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsCloud.java#L205. At th
      4. Thread B performs setTemporaryOffline on a computer of the slave it wants to take down here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsRetentionStrategy.java#L30 . At that moment it already holds the lock for the JCloudsRetentionStarategy instance because it is synchronized
      5. addNode is synchronized on Hudson instance and locks it
      6. addNode subsequently calls hudson.model.AbstractCIBase.updateComputerList, which calls retentionStrategy.check on a node being deleted in thread B and locks on it
      7. setTemporarilyOffline calls Jenkins.save that locks on the Jenkins (i.e. Hudson) instance, which is already locked by thread A in step 5.

      The stack trace of thead A is therefore as follows:

      Name: Computer.threadPoolForRemoting [#135]
      State: BLOCKED on jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114 owned by: jenkins.util.Timer [#8]
      Total blocked: 8  Total waited: 1
      
      Stack trace: 
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:22)
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
      hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:663)
      hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
      hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
         - locked java.lang.Object@5a4034b5
      jenkins.model.Jenkins.updateComputerList(Jenkins.java:1214)
      jenkins.model.Jenkins.setNodes(Jenkins.java:1711)
      jenkins.model.Jenkins.addNode(Jenkins.java:1693)
         - locked hudson.model.Hudson@32da12ad
      jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:205)
      jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:201)
      java.util.concurrent.FutureTask.run(FutureTask.java:262)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:744)
      

      The stack trace of thread B is as follows:

      Name: jenkins.util.Timer [#8]
      State: BLOCKED on hudson.model.Hudson@32da12ad owned by: Computer.threadPoolForRemoting [#135]
      Total blocked: 2  Total waited: 274
      
      Stack trace: 
      jenkins.model.Jenkins.save(Jenkins.java:2672)
      hudson.model.Node.setTemporaryOfflineCause(Node.java:221)
      hudson.model.Computer.setTemporarilyOffline(Computer.java:590)
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:30)
         - locked jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114
      jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
      hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
      hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:744)
      

      This actually causes deadly problems during normal operation of JClouds under somewhat high load.

        Attachments

          Issue Links

            Activity

            pupssman Ivan Kalinin created issue -
            olivergondza Oliver Gondža made changes -
            Field Original Value New Value
            Component/s jclouds-jenkins [ 17483 ]
            Description We observe a deadlock in JClouds plugin that happens in a following scenarion:

            1. There is a thread (thread A) that performs deployment of a new slave
            2. Another thread (thread B) is working at the same time and decides take down one of the existing nodes
            3. Thread A performs Hudson.getInstance().addNode() here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsCloud.java#L205. At th
            4. Thread B performs setTemporaryOffline on a computer of the slave it wants to take down here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsRetentionStrategy.java#L30 . At that moment it already holds the lock for the JCloudsRetentionStarategy instance because it is synchronized
            5. addNode is synchronized on Hudson instance and locks it
            6. addNode subsequently calls hudson.model.AbstractCIBase.updateComputerList, which calls retentionStrategy.check on a node being deleted in thread B and locks on it
            7. setTemporarilyOffline calls Jenkins.save that locks on the Jenkins (i.e. Hudson) instance, which is already locked by thread A in step 5.

            The stack trace of thead A is therefore as follows:
            ===================================================

            Name: Computer.threadPoolForRemoting [#135]
            State: BLOCKED on jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114 owned by: jenkins.util.Timer [#8]
            Total blocked: 8 Total waited: 1

            Stack trace:
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:22)
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
            hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:663)
            hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
            hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
               - locked java.lang.Object@5a4034b5
            jenkins.model.Jenkins.updateComputerList(Jenkins.java:1214)
            jenkins.model.Jenkins.setNodes(Jenkins.java:1711)
            jenkins.model.Jenkins.addNode(Jenkins.java:1693)
               - locked hudson.model.Hudson@32da12ad
            jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:205)
            jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:201)
            java.util.concurrent.FutureTask.run(FutureTask.java:262)
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            java.lang.Thread.run(Thread.java:744)


            The stack trace of thread B is as follows:
            ==========================================

            Name: jenkins.util.Timer [#8]
            State: BLOCKED on hudson.model.Hudson@32da12ad owned by: Computer.threadPoolForRemoting [#135]
            Total blocked: 2 Total waited: 274

            Stack trace:
            jenkins.model.Jenkins.save(Jenkins.java:2672)
            hudson.model.Node.setTemporaryOfflineCause(Node.java:221)
            hudson.model.Computer.setTemporarilyOffline(Computer.java:590)
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:30)
               - locked jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
            hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
            hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
            java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            java.lang.Thread.run(Thread.java:744)

            =====================================

            This actually causes deadly problems during normal operation of JClouds under somewhat high load.
            We observe a deadlock in JClouds plugin that happens in a following scenarion:

            1. There is a thread (thread A) that performs deployment of a new slave
            2. Another thread (thread B) is working at the same time and decides take down one of the existing nodes
            3. Thread A performs Hudson.getInstance().addNode() here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsCloud.java#L205. At th
            4. Thread B performs setTemporaryOffline on a computer of the slave it wants to take down here: https://github.com/jenkinsci/jclouds-plugin/blob/master/jclouds-plugin/src/main/java/jenkins/plugins/jclouds/compute/JCloudsRetentionStrategy.java#L30 . At that moment it already holds the lock for the JCloudsRetentionStarategy instance because it is synchronized
            5. addNode is synchronized on Hudson instance and locks it
            6. addNode subsequently calls hudson.model.AbstractCIBase.updateComputerList, which calls retentionStrategy.check on a node being deleted in thread B and locks on it
            7. setTemporarilyOffline calls Jenkins.save that locks on the Jenkins (i.e. Hudson) instance, which is already locked by thread A in step 5.

            The stack trace of thead A is therefore as follows:


            {noformat}
            Name: Computer.threadPoolForRemoting [#135]
            State: BLOCKED on jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114 owned by: jenkins.util.Timer [#8]
            Total blocked: 8 Total waited: 1

            Stack trace:
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:22)
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
            hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:663)
            hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
            hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
               - locked java.lang.Object@5a4034b5
            jenkins.model.Jenkins.updateComputerList(Jenkins.java:1214)
            jenkins.model.Jenkins.setNodes(Jenkins.java:1711)
            jenkins.model.Jenkins.addNode(Jenkins.java:1693)
               - locked hudson.model.Hudson@32da12ad
            jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:205)
            jenkins.plugins.jclouds.compute.JCloudsCloud$2.call(JCloudsCloud.java:201)
            java.util.concurrent.FutureTask.run(FutureTask.java:262)
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            java.lang.Thread.run(Thread.java:744)
            {noformat}

            The stack trace of thread B is as follows:


            {noformat}
            Name: jenkins.util.Timer [#8]
            State: BLOCKED on hudson.model.Hudson@32da12ad owned by: Computer.threadPoolForRemoting [#135]
            Total blocked: 2 Total waited: 274

            Stack trace:
            jenkins.model.Jenkins.save(Jenkins.java:2672)
            hudson.model.Node.setTemporaryOfflineCause(Node.java:221)
            hudson.model.Computer.setTemporarilyOffline(Computer.java:590)
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:30)
               - locked jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy@4ccee114
            jenkins.plugins.jclouds.compute.JCloudsRetentionStrategy.check(JCloudsRetentionStrategy.java:15)
            hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:66)
            hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
            java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
            java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            java.lang.Thread.run(Thread.java:744)
            {noformat}


            This actually causes deadly problems during normal operation of JClouds under somewhat high load.
            olivergondza Oliver Gondža made changes -
            Link This issue duplicates JENKINS-15820 [ JENKINS-15820 ]
            olivergondza Oliver Gondža made changes -
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Resolved [ 5 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 154651 ] JNJira + In-Review [ 194979 ]
            felfert Fritz Elfert made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

              People

              Assignee:
              abayer Andrew Bayer
              Reporter:
              pupssman Ivan Kalinin
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: