[JENKINS-24763] Long running OldDataMonitor.doDiscard() results in thread starvation

      A very large set of OldData can take a long time to discard. While this is running, most Jenkins operations which save any data are blocked.

      https://gist.github.com/recampbell/9336d3a32270e75a9333

      doDiscard is synchronized, and hudson.diagnosis.OldDataMonitor#remove also needs the lock on OldDataMonitor. Most threads eventually block when saving something, and stay blocked until doDiscard completes.
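
      A minimal sketch of one way to avoid the contention described above: keep the tracked entries in a concurrent map and run the slow saves without holding the monitor's lock. The class, interface, and method names here (NarrowLockMonitor, Item, discardAll) are hypothetical stand-ins for illustration only, and this is not the actual OldDataMonitor code or the change in any pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a monitor that tracks objects holding old data. */
class NarrowLockMonitor {

    /** Hypothetical stand-in for a Saveable whose save() may be slow. */
    interface Item {
        void save();
    }

    // A concurrent map lets report/remove proceed without the monitor's own lock.
    private final Map<Item, String> data = new ConcurrentHashMap<>();

    void report(Item item, String version) {
        data.put(item, version);
    }

    /** Called when an item is saved; does not wait for a running discard. */
    void remove(Item item) {
        data.remove(item);
    }

    int size() {
        return data.size();
    }

    /**
     * Deliberately NOT synchronized: take a snapshot of the keys, then do the
     * slow saves with no lock held, so other threads can keep calling remove().
     * Returns the number of items in the snapshot.
     */
    int discardAll() {
        List<Item> snapshot = new ArrayList<>(data.keySet());
        for (Item item : snapshot) {
            item.save();        // potentially slow (listeners, disk I/O)
            data.remove(item);  // short, lock-free from the caller's view
        }
        return snapshot.size();
    }
}
```

      The trade-off in this sketch is that the discard works on a snapshot, so entries reported while it runs are left for the next discard rather than being handled in the same pass.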

          Daniel Beck added a comment -

          Appears to be a rather minor issue (with the workaround being to wait for ODM to finish).

          In what way is this a livelock?

          Ryan Campbell added a comment -

          In the meantime, no build can finish, and no slaves can come online. In the incident observed, the discard was taking more than an hour and Jenkins could not complete builds or do basically any other operation.

          It's unacceptable for Jenkins to be inoperable for minutes at a time, hence a higher severity.

          I suppose this is really just "starvation" rather than a livelock per se. Will update the description.

          Ryan Campbell added a comment -

          Pull request: https://github.com/jenkinsci/jenkins/pull/1399

          Jesse Glick added a comment -

          What makes you think this is really a problem in OldDataMonitor? I look at the thread dump and I see JobConfigHistorySaveableListener being likely at fault.

          Daniel Beck added a comment -

          FWIW JobConfigHistorySaveableListener takes forever in my experience if you enable the deduplication option ("Do not save duplicate history"). I've long disabled it for that reason.

          Ryan Campbell added a comment -

          I see Jesse's point about JobConfigHistorySaveableListener – it's certainly the root cause.

          Nevertheless, OldDataMonitor could be more robust by reducing the scope of its lock, to better handle other poorly performing SaveableListeners. So I'll change this to an Improvement if that suits you.

          Jesse Glick added a comment -

          Makes sense.

          Ryan Campbell added a comment -

          Latest pull request: https://github.com/jenkinsci/jenkins/pull/1402

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Ryan Campbell
          Path:
          core/src/main/java/hudson/diagnosis/OldDataMonitor.java
          test/src/test/java/hudson/diagnosis/OldDataMonitorTest.java
          http://jenkins-ci.org/commit/jenkins/423b51a43a3c2f469d12695c5fddcda52e97159e
          Log:
          JENKINS-24763 Prevent thread starvation in OldDataMonitor by reducing scope of synchronization

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          core/src/main/java/hudson/diagnosis/OldDataMonitor.java
          test/src/test/java/hudson/diagnosis/OldDataMonitorTest.java
          http://jenkins-ci.org/commit/jenkins/ef6c04244744670650fb36a4482e74daf2094f26
          Log:
          Merge branch 'JENKINS-24763' of github.com:recampbell/jenkins

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Jesse Glick
          Path:
          changelog.html
          http://jenkins-ci.org/commit/jenkins/a8262d503e6af11feebac6005360d4db7639aff4
          Log:
          [FIXED JENKINS-24763] Noting merge of #1402.

          Compare: https://github.com/jenkinsci/jenkins/compare/3addbabf8afa...a8262d503e6a

          dogfood added a comment -

          Integrated in jenkins_main_trunk #3738
          JENKINS-24763 Prevent thread starvation in OldDataMonitor by reducing scope of synchronization (Revision 423b51a43a3c2f469d12695c5fddcda52e97159e)
          [FIXED JENKINS-24763] Noting merge of #1402. (Revision a8262d503e6af11feebac6005360d4db7639aff4)

          Result = SUCCESS
          Ryan Campbell : 423b51a43a3c2f469d12695c5fddcda52e97159e
          Files :

          • test/src/test/java/hudson/diagnosis/OldDataMonitorTest.java
          • core/src/main/java/hudson/diagnosis/OldDataMonitor.java

          Jesse Glick : a8262d503e6af11feebac6005360d4db7639aff4
          Files :

          • changelog.html

          Ireneusz Makowski added a comment -

          Could you please add this to the LTS line as well?

          Daniel Beck added a comment -

          The 1.580.x line is done with the RC to 1.580.3 posted, and it will be in 1.596.x already.

            Assignee: Unassigned
            Reporter: Ryan Campbell (recampbell)