JENKINS-67620

Priority Sorter Plugin Crashes Jenkins when Queue is large

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: prioritysorter-plugin
    • Jenkins version: 2.319.1
      Priority Sorter Plugin: 4.1.0

      Hi,

      I recently tried upgrading the Priority Sorter plugin from 3.6.0 to 4.1.0. This eventually caused Jenkins to become unresponsive and fail to load after about 4 hours of runtime. The Jenkins process was still running when we investigated, but it needed a `systemctl restart` to become responsive again. After the restart, Jenkins slowed down again and started throwing 500 errors after another 4-8 hours. Downgrading the plugin to version 3.6.0 restored stability.

          Mark Waite added a comment -

          Sorry that it failed like that. Pull requests that have been merged between 3.6.0 and 4.1.0 include:

          • 0862637 Merge pull request #110 from MarkEWaite/embrace-divBasedFormLayout
          • b551e69 Merge pull request #109 from NotMyFault/chore/master/prep-for-icon-removal-from-core
          • c7e8dce Merge pull request #108 from jenkinsci/dependabot/maven/master/io.jenkins.tools.incrementals-git-changelist-maven-extension-1.3
          • 9493353 Merge pull request #105 from MarkEWaite/prevent-unused-imports
          • b143ff7 Merge pull request #104 from MarkEWaite/fix-spelling
          • 78f0440 Merge pull request #103 from jenkinsci/dependabot/maven/master/pmdVersion-6.41.0
          • b235f81 Merge pull request #100 from MarkEWaite/release-drafter-use-patch-version
          • c71d7d6 Merge pull request #102 from MarkEWaite/remove-script-security-exclusion
          • 9c35ae8 Merge pull request #101 from MarkEWaite/complete-all-tests-in-ci
          • 0def6bd Merge pull request #99 from MarkEWaite/fix-dependabot
          • 94de890 Merge pull request #97 from MarkEWaite/require-newer-jenkins
          • a07b2e7 Merge pull request #96 from MarkEWaite/automate-pr-labeling
          • 3560aef Merge pull request #95 from MarkEWaite/link-to-report-an-issue
          • 9fcc8d5 Merge pull request #92 from MarkEWaite/enable-pmd-checks
          • cc7cf5e Merge pull request #94 from MarkEWaite/automate-dependency-checks
          • 8974524 Merge pull request #93 from MarkEWaite/scm-use-https-protocol
          • f3e8efd Merge pull request #91 from MarkEWaite/add-spotbugs-checks
          • 7bffeba Merge pull request #90 from MarkEWaite/update-author-info
          • 6d0f518 Merge pull request #89 from MarkEWaite/add-developer
          • 5b29865 Merge pull request #85 from HelderMagalhaes/patch-1 (spelling fixes and formatting improvements)
          • 7abcd38 Merge pull request #79 from jenkinsci/dependabot/maven/master/io.jenkins.tools.bom-bom-2.235.x-918.vae501d2cdc99
          • 3137bab Merge pull request #86 from jenkinsci/dependabot/maven/master/org.jenkins-ci.plugins-plugin-4.31
          • 89b5275 Merge pull request #87 from jenkinsci/dependabot/maven/master/org.jenkins-ci.plugins-nested-view-1.22
          • 76bd185 Merge pull request #88 from MarkEWaite/fix-form-save
          • 9507de4 Merge pull request #59 from jenkinsci/getinstance
          • aaf7efe Merge pull request #58 from jenkinsci/changelog-ghr
          • 7ccb38e Merge pull request #56 from jenkinsci/dependabot/maven/master/io.jenkins.tools.bom-bom-2.235.x-25
          • dc3ef52 Merge pull request #55 from jenkinsci/dependabot/maven/master/org.codehaus.mojo-versions-maven-plugin-2.8.1
          • 29ce726 Merge pull request #54 from jenkinsci/chore-bots
          • 0ca804e Merge pull request #52 from timja/JENKINS-64694-fix-plugin (UI form changes)
          • ae64bdf Merge pull request #51 from timja/refresh
          • e7dfdb2 Merge pull request #49 from LinuxSuRen/i10n-zh
          • b439ade Merge pull request #45 from olivergondza/add-jenkinsfile

          I don't see any among those that would obviously cause a performance regression or a dramatic change of behavior. Dependencies are updated, UI forms are updated, and there are some changes in https://github.com/jenkinsci/priority-sorter-plugin/pull/51/files that are intended to remain binary compatible.

          Andrew Savino added a comment -

          I agree. Looking at the diff, nothing immediately looks suspicious. I'm not the most familiar with Jenkins. Are there logs that might help point to the cause of the slowdown/crash?

          Mark Waite added a comment -

          There might be something in the Jenkins log file that is written to disk, usually at /var/log/jenkins/*
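
As a starting point for reading those logs, here is a minimal sketch that lists lines mentioning `SEVERE` or `WARNING`. It assumes the log files are plain text and that log levels appear as whole words on each line; the function names `suspect_lines` and `scan_log_dir` are hypothetical helpers, not part of Jenkins.

```python
import re
from pathlib import Path

# Assumption: log levels appear as whole words somewhere on the line.
LEVEL_RE = re.compile(r"\b(SEVERE|WARNING)\b")


def suspect_lines(lines):
    """Yield (line_number, text) for lines that mention SEVERE or WARNING."""
    for number, text in enumerate(lines, start=1):
        if LEVEL_RE.search(text):
            yield number, text.rstrip("\n")


def scan_log_dir(log_dir="/var/log/jenkins"):
    """Print matching lines from every regular file directly under log_dir."""
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(errors="replace") as handle:
            for number, text in suspect_lines(handle):
                print(f"{path}:{number}: {text}")
```

Calling `scan_log_dir()` on the Jenkins host prints one `path:line: text` entry per match. Comparing the entries from shortly before the slowdown with those from a healthy period may narrow down which component is involved.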

          Dan Hewitt added a comment -

          I think we observed this yesterday. After a few hours, Jenkins became unable to schedule new work (either through cron triggers or POST requests). Threads were started but would hang, and for manually started jobs Jetty would never return a response to the user. We didn't find anything relevant in the logs, and after restarting Jenkins we saw the same issue within a few hours. We're looking into a reproducer, but I'm not confident we'll be able to simulate the load.

          We have a similar version of Jenkins and were not actively using Priority Sorter (it had been installed the day before).
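
When threads hang like this, a thread dump taken while Jenkins is stuck (for example with the JDK's `jstack` tool) can show where they are waiting. The following is a minimal sketch, assuming the dump uses the usual HotSpot layout with one `java.lang.Thread.State:` line per thread; `count_thread_states` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: BLOCKED (on object monitor)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)")


def count_thread_states(dump_text):
    """Count how many threads in a thread dump are in each state."""
    states = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            states[match.group(1)] += 1
    return states
```

A large and growing number of `BLOCKED` threads across successive dumps would suggest lock contention, and the stack traces of those threads would then be the place to look for the lock they share.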

          Mark Waite added a comment -

          tapdancingrodent thanks for reporting the issue. I reviewed the pull request https://github.com/jenkinsci/priority-sorter-plugin/pull/51/files that was mentioned in a previous comment and found nothing suspicious. If more details are available, please share them in case it helps others with this type of failure.

          Dan Hewitt added a comment -

          Unfortunately, it seems that queue length isn't the only factor here. When we first observed this issue, it was with 45-50 busy built-in executors and a modest queue (<50 jobs). We absolutely thrashed our dev instance (the exact same container that saw the failures in prod; test cases below) with simulated workloads but were unable to replicate the problem with version 4.1.

          100 busy built-in executors, 300+ jobs in the queue.
          100 busy EC2 executors, ~100 jobs in the queue.

          Based on this, you can probably rule out queue size, overall load, and rate of jobs entering the queue as root causes. Maybe another plugin is interacting badly with Priority Sorter and causing it to enter a hanging state? I've attached a list of the plugin versions we're using with Jenkins at `2.319.2`.

          plugins.txt
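
To look for a plugin interaction, one option is to compare the plugin list of an affected instance against one that is stable. This is a minimal sketch, assuming each list has one `name:version` entry per line (the layout of the attached plugins.txt is not shown here, so that format is an assumption); `parse_plugins` and `diff_plugins` are hypothetical helpers.

```python
def parse_plugins(text):
    """Parse 'name:version' lines into a dict, skipping blanks and # comments."""
    plugins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition(":")
        plugins[name.strip()] = version.strip()
    return plugins


def diff_plugins(stable_text, affected_text):
    """Return {name: (stable_version, affected_version)} where the two differ.

    A version of None means the plugin is missing from that list.
    """
    stable = parse_plugins(stable_text)
    affected = parse_plugins(affected_text)
    return {
        name: (stable.get(name), affected.get(name))
        for name in sorted(set(stable) | set(affected))
        if stable.get(name) != affected.get(name)
    }
```

Plugins that appear in the result for every affected instance but not for stable ones would be the first candidates to test together with Priority Sorter 4.1.0.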

            Assignee: Unassigned
            Reporter: Andrew Savino