Could it be that the change in JENKINS-22131 is causing deadlocks? If I downgrade to 1.555 there is no problem; all versions from 1.556 onward show the same behaviour: more and more executors become deadlocked. To me it looks as if two concurrent jobs are doing something that makes them lock each other.

      Warning, the following threads are deadlocked : Executor #0 for 1sp-slave12 : executing CONTEST_Technologies_Tests #1568, Executor #13 for 1sp-slave1 : executing CONTEST_Technologies_Tests #1569
      
      The threads are blocked in:
      hudson.model.Run.save(Run.java:1856) (#1568)
      hudson.model.AbstractBuild.getRunMixIn(AbstractBuild.java:178) (#1569)
      
      Executor #13 for 1sp-slave1 : executing CONTEST_Technologies_Tests #1569
      hudson.model.AbstractBuild.getRunMixIn(AbstractBuild.java:178)
      jenkins.model.lazy.LazyBuildMixIn$RunMixIn.dropLinks(LazyBuildMixIn.java:336)
      hudson.model.AbstractBuild.dropLinks(AbstractBuild.java:193)
      hudson.model.RunMap.removeValue(RunMap.java:120)
      hudson.model.RunMap.remove(RunMap.java:86)
      jenkins.model.lazy.LazyBuildMixIn.removeRun(LazyBuildMixIn.java:216)
      hudson.model.AbstractProject.removeRun(AbstractProject.java:1002)
      hudson.model.AbstractProject.removeRun(AbstractProject.java:145)
      hudson.model.Run.removeRunFromParent(Run.java:1478)
      hudson.model.Run.delete(Run.java:1473)
      hudson.tasks.LogRotator.perform(LogRotator.java:124)
      hudson.model.Job.logRotate(Job.java:441)
      hudson.model.Run.execute(Run.java:1751)
      hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      hudson.model.ResourceController.execute(ResourceController.java:88)
      hudson.model.Executor.run(Executor.java:231)
      
      Executor #0 for 1sp-slave12 : executing CONTEST_Technologies_Tests #1568
      hudson.model.Run.save(Run.java:1856)
      hudson.plugins.changelog_history.ChangeLogHistoryRunListener.copyChangeLogs(ChangeLogHistoryRunListener.java:82)
      hudson.plugins.changelog_history.ChangeLogHistoryRunListener.onDeleted(ChangeLogHistoryRunListener.java:50)
      hudson.plugins.changelog_history.ChangeLogHistoryRunListener.onDeleted(ChangeLogHistoryRunListener.java:40)
      hudson.model.listeners.RunListener.fireDeleted(RunListener.java:244)
      hudson.model.Run.delete(Run.java:1447)
      hudson.tasks.LogRotator.perform(LogRotator.java:124)
      hudson.model.Job.logRotate(Job.java:441)
      hudson.model.Run.execute(Run.java:1751)
      hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
      hudson.model.ResourceController.execute(ResourceController.java:88)
      hudson.model.Executor.run(Executor.java:231)
      

          [JENKINS-22560] Deadlocked executors on slaves since 1.556

          Dirk Kuypers added a comment -

          We are a .NET shop. I need more details on how to debug Java applications. I even asked on the user mailing list how to get a thread dump but got no answer. I was really glad when I could extract it from the tooltip on the Monitoring plugin's HTML page.

          So if I can provide more info, just tell me what I should do.

          Jesse Glick added a comment -

          Looks like Run.execute always acquired a lock on an older build via LogRotator and Run.delete. And using the Change Log History plugin means that when deleting an older build Jenkins also acquires a lock on a newer build, which is a bug in that plugin. And the change in 1.556 means that Run.delete could also (via dropLinks) acquire locks on both older and newer builds, which in combination with the bug in CLH means that deadlocks are possible.
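
          The lock-ordering problem described above can be illustrated with a minimal sketch. This is not Jenkins code: the two ReentrantLock objects merely stand in for the monitors of an older and a newer build, and the class and method names are hypothetical. One thread takes the locks in the order older, newer; the other takes them in the order newer, older. A timed tryLock is used in place of a blocking lock so the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; these locks stand in for two build objects.
class LockOrderDemo {

    /** Returns how many of the two threads could not obtain their second lock. */
    static int countBlockedThreads() {
        ReentrantLock olderBuild = new ReentrantLock();
        ReentrantLock newerBuild = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothAttempted = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Thread A locks older -> newer, thread B locks newer -> older.
        Thread a = new Thread(() ->
                lockInOrder(olderBuild, newerBuild, bothHoldFirst, bothAttempted, blocked));
        Thread b = new Thread(() ->
                lockInOrder(newerBuild, olderBuild, bothHoldFirst, bothAttempted, blocked));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst,
                                    CountDownLatch bothAttempted,
                                    AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With a plain lock() this call would never return: a deadlock.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep the first lock until the other thread has also given up.
            bothAttempted.countDown();
            bothAttempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on their second lock: " + countBlockedThreads());
    }
}
```

          Both threads end up blocked, which matches the shape of the two executor traces in the report: each holds one build and waits for the other. Acquiring locks in one consistent order, or not holding a lock while calling into code that takes another, avoids the cycle.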

          Jesse Glick added a comment -

          @bruce the Monitoring plugin can give some information but that is not in a great format for our purposes. jstack (a tool from the JDK) is ideal for capturing thread dumps. You can also navigate to /threadDump in a browser; or install the Support Core plugin and use the Support link to generate a complete diagnostic bundle.
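
          For readers unfamiliar with Java tooling: jstack is run from a shell against the process ID of the JVM (jstack followed by the pid) and prints every thread's stack. The same deadlock information is also exposed inside the JVM through the standard java.lang.management API. The sketch below shows that API in isolation; it is not how Jenkins or the Monitoring plugin produces its report, and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; reports deadlocks in the JVM it runs in.
class DeadlockReport {

    /** Returns one description per deadlocked thread, or an empty list if there are none. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns null when no threads are deadlocked.
        long[] ids = threads.findDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" is waiting for ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName());
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append(System.lineSeparator()).append("    at ").append(frame);
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        if (report.isEmpty()) {
            System.out.println("No deadlocked threads found.");
        } else {
            report.forEach(System.out::println);
        }
    }
}
```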

          Jesse Glick added a comment -

          Also looks like this deadlock would only happen if the job were marked as being capable of concurrent builds. Not clear to me whether it requires the presence of CLH; looks like it might be possible (if less likely) without it.

          Note that the behavior of LogRotator for concurrent-capable jobs is not well defined: it might wind up deleting the wrong number of builds, since it does not hold a lock on a Run for the duration of its loop, but just does a one-off calculation of old build candidates. Probably harmless normally.

          Dirk Kuypers added a comment -

          I generate the jobs from a template via the Job DSL plugin. So if it helps to run them non-concurrently, I can give it a try after upgrading again. But I think concurrency is the key here, and if the deadlock goes away after disabling it on the jobs, that wouldn't help much.

          Removing the CLH plugin is another option, but I would lose quite a useful changelog history... It seems to be unmaintained as well. Hmmmm. I think I will give the removal a try.

          Jesse Glick added a comment -

          Or just go back to 1.555 (or whatever) temporarily; I hope the fix will be in 1.560.

          Dirk Kuypers added a comment -

          I am back on 1.555, but I was not sure whether it could be fixed in a short time. In that case I will change nothing and will be happy to run the ultimate test for you. I think this issue is not easily reproducible.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          core/src/main/java/hudson/model/AbstractBuild.java
          core/src/main/java/hudson/model/Run.java
          core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          http://jenkins-ci.org/commit/jenkins/a5d848bcec45100ca4f12303a8c619664f2eb945
          Log:
          [FIXED JENKINS-22560] Avoid deadlock by making AbstractBuild.runMixIn final.
          (Forgot that Run’s are unmarshalled in place after a special constructor is called, so there is no need for readResolve or other tricks.)
          Also calling RunListener.onDeleted outside of the Run lock to avoid problems with things like ChangeLogHistoryRunListener.

          Compare: https://github.com/jenkinsci/jenkins/compare/b35f834b6651...a5d848bcec45
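
          The second part of the commit message, notifying listeners outside of the Run lock, follows a general pattern that can be sketched as below. This is a simplified illustration and not the actual Jenkins change: Build, DeletionListener and the method names are hypothetical stand-ins, and the real ordering of steps in Run.delete may differ. The point is only that the object's monitor is released before any listener code runs, so a listener that locks some other build cannot form a lock cycle with this one.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for a build and its deletion listeners.
class ListenerOutsideLock {

    interface DeletionListener {
        void onDeleted(Build build);
    }

    static class Build {
        private final List<DeletionListener> listeners = new CopyOnWriteArrayList<>();
        private boolean deleted;

        void addListener(DeletionListener listener) {
            listeners.add(listener);
        }

        void delete() {
            // Only the state change happens while holding this build's monitor.
            synchronized (this) {
                if (deleted) {
                    return;
                }
                deleted = true;
            }
            // Listeners run after the monitor has been released.
            for (DeletionListener listener : listeners) {
                listener.onDeleted(this);
            }
        }

        synchronized boolean isDeleted() {
            return deleted;
        }
    }

    /** Returns true if the listener was invoked without the build's monitor being held. */
    static boolean listenerRunsWithoutLock() {
        Build build = new Build();
        boolean[] heldLock = {true};
        build.addListener(b -> heldLock[0] = Thread.holdsLock(b));
        build.delete();
        return build.isDeleted() && !heldLock[0];
    }

    public static void main(String[] args) {
        System.out.println("listener ran outside the lock: " + listenerRunsWithoutLock());
    }
}
```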

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3292
          [FIXED JENKINS-22560] Avoid deadlock by making AbstractBuild.runMixIn final. (Revision a5d848bcec45100ca4f12303a8c619664f2eb945)

          Result = SUCCESS
          Jesse Glick : a5d848bcec45100ca4f12303a8c619664f2eb945
          Files :

          • core/src/main/java/hudson/model/AbstractBuild.java
          • core/src/main/java/jenkins/model/lazy/LazyBuildMixIn.java
          • changelog.html
          • core/src/main/java/hudson/model/Run.java

          Dirk Kuypers added a comment -

          Just a short feedback: Jenkins ver. 1.560-SNAPSHOT (private-04/10/2014 15:41 GMT-jenkins) has been running for about 8 hours now without a deadlock, so I think the bug is gone. Thanks a lot for the fast fix!

            jglick Jesse Glick
            bruce Dirk Kuypers