  1. Jenkins
  2. JENKINS-26380

Deadlock between Queue and Jenkins model

Details

    Description

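      Removing a node goes through the synchronized Jenkins.removeNode(), which eventually calls Computer.setNumExecutors(), which in turn tries to lock the Queue. That thread therefore holds the Jenkins lock while waiting for the Queue lock. If, at the same time, CloudRetentionStrategy.check() runs and decides that a node should be terminated, it locks the Queue first and then calls Jenkins.removeNode(), which tries to lock the Jenkins object. That second thread holds the Queue lock while waiting for the Jenkins lock. The two threads acquire the same two locks in opposite order, so each blocks forever waiting on the other.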

      A thread dump (deadlock.tdump) is attached which shows the deadlock.

      We're using DockerComputers that use the OnceRetentionStrategy, which means that nodes are removed every time a task completes, so the potential for this deadlock occurring is quite high (we experience it multiple times per day).
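
      The lock-order inversion described above is usually avoided by making every code path take the two locks in the same order. The following is a minimal sketch of that idea only, not the change that was made in Jenkins or in the durable-task plugin. The names LockOrderSketch, QUEUE, JENKINS, withQueueThenJenkins and runBothPaths are hypothetical stand-ins; the real code uses the Queue lock and the monitor of the Jenkins object.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    // Hypothetical stand-ins for the Queue lock and the Jenkins object's monitor.
    static final ReentrantLock QUEUE = new ReentrantLock();
    static final ReentrantLock JENKINS = new ReentrantLock();

    /** Runs body while holding both locks, always acquired as QUEUE first, then JENKINS. */
    static void withQueueThenJenkins(Runnable body) {
        QUEUE.lock();
        try {
            JENKINS.lock();
            try {
                body.run();
            } finally {
                JENKINS.unlock();
            }
        } finally {
            QUEUE.unlock();
        }
    }

    /** Starts two contending threads and returns how many finished before the timeout. */
    static int runBothPaths(int iterations) {
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(2);
        Runnable path = () -> {
            for (int i = 0; i < iterations; i++) {
                // Both paths use the same helper, so the acquisition order never differs.
                withQueueThenJenkins(() -> { });
            }
            completed.incrementAndGet();
            done.countDown();
        };
        Thread removeNodePath = new Thread(path, "remove-node-path");
        Thread retentionCheckPath = new Thread(path, "retention-check-path");
        removeNodePath.setDaemon(true);
        retentionCheckPath.setDaemon(true);
        removeNodePath.start();
        retentionCheckPath.start();
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("threads completed: " + runBothPaths(10_000));
    }
}
```

      Because neither thread can hold JENKINS without already holding QUEUE, the circular wait from the description cannot form in this sketch. If one path instead took JENKINS first, as the synchronized removeNode() path does, the same two threads could block each other indefinitely.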

      Attachments

        Activity

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/org/jenkinsci/plugins/durabletask/executors/OnceRetentionStrategy.java
          http://jenkins-ci.org/commit/durable-task-plugin/18733f566e3ddb1dafe32dfb16025586cb76306f
          Log:
          JENKINS-26380 Terminate nodes correctly

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/org/jenkinsci/plugins/durabletask/executors/OnceRetentionStrategy.java
          http://jenkins-ci.org/commit/durable-task-plugin/cce88cad22f78997d6a7b839fb3f2f75b4ce94c9
          Log:
          Merge pull request #2 from stephenc/jenkins-26380

          JENKINS-26380 Terminate nodes correctly

          Compare: https://github.com/jenkinsci/durable-task-plugin/compare/6325fef67a86...cce88cad22f7

          jglick Jesse Glick added a comment -

          I guess this should be considered fixed with that change.

          jglick Jesse Glick added a comment -

          Made another fix in 1.3.

          hashar Antoine Musso added a comment -

          Jesse Glick's fix is:
          Pull request: https://github.com/jenkinsci/durable-task-plugin/pull/3
          Commit: https://github.com/jenkinsci/durable-task-plugin/commit/12c593402410034fe6e9f066d5fb4c1503891d54

          People

            stephenconnolly Stephen Connolly
            bernie Bernie Schelberg