JENKINS-58684

Linux max task resource exhaustion after excessive timer thread creation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: sse-gateway-plugin
    • Labels:
    • Environment:
      Jenkins 2.176.2 using Ubuntu 18.04 package file from pkg.jenkins.io/debian-stable
      AMD Threadripper 32-core/64-thread machine with 128GB of RAM
      Single master node with 20 executors
      Description

      Jenkins is exhausting resources on my machine, seemingly by creating 10k+ timer threads. The thread creation then hits the task limit imposed by systemd, which defaults to 15% of the kernel maximum (/proc/sys/kernel/pid_max = 131072), i.e. 19660 on my system.
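
The 19660 figure can be checked with a few lines of Java. This is only a sketch of the arithmetic described in the report (15% of pid_max, rounded down); the class and method names are made up for illustration and do not correspond to anything in systemd or Jenkins.

```java
// Sketch only: reproduces the arithmetic in the report, not systemd's actual code.
final class TaskLimit {
    /** Returns `percent` percent of pidMax, rounded down. */
    static long tasksMax(long pidMax, int percent) {
        return pidMax * percent / 100;
    }

    public static void main(String[] args) {
        // 131072 is the pid_max value quoted in the report.
        System.out.println(tasksMax(131072, 15)); // prints 19660
    }
}
```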

      This particular Java heap dump shows that 4675 of the 4877 threads have names matching the expression Timer-.* and have the context class loader org.jenkinsci.plugins.workflow.cps.CpsGroovyShell$CleanGroovyClassLoader.
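
A similar count can be taken from inside a running JVM without a heap dump. The sketch below is an assumption about how one might do it, not the tooling used for this report: it counts live threads whose names match a pattern using Thread.getAllStackTraces(). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: count live threads in the current JVM by name pattern.
final class TimerThreadCount {
    // Default names given to java.util.Timer threads look like "Timer-0", "Timer-1", ...
    static final Pattern DEFAULT_TIMER_NAME = Pattern.compile("Timer-.*");

    /** Counts live threads whose full name matches the given pattern. */
    static long countMatching(Pattern namePattern) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> namePattern.matcher(t.getName()).matches())
                .count();
    }

    public static void main(String[] args) {
        System.out.println("Timer-* threads: " + countMatching(DEFAULT_TIMER_NAME));
    }
}
```

It only sees threads of the JVM it runs in, so for Jenkins it would have to run inside the Jenkins process; an external thread dump gives the same information.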

      The bottom right corner of my Jenkins UI confirms I'm running the latest version (Jenkins ver. 2.176.2) with all plugins updated, on OpenJDK Runtime Environment (build 1.8.0_222-b05) and Ubuntu 18.04, as of 2019.07.23.

       

      Digging deeper: after the resource limit is hit, thread creation fails and Jenkins becomes unstable until restarted. The logs contain many instances of the following:

      SEVERE: Timer task hudson.model.Queue$MaintainTask@621715d8 failed
      java.lang.OutOfMemoryError: unable to create new native thread
      

       
      NetData on the system shows that when the thread count doesn't hit the limit, it spikes and often recovers, suggesting the threads are eventually cleaned up if the process isn't derailed first.

      I've increased the systemd limit by adding the following to the systemd service file, and the problem hasn't derailed the Jenkins process in a few days:

      TasksMax=32768
      

      I have been capturing heap dumps with some automation tools and have caught several exceeding 20k threads in the last few days, which shows that the problem is still occurring. It is likely affecting other users too, though they probably aren't noticing.

       

      Not sure if this is related to a plugin (listed below with version) or Jenkins core code?

       

      [
        {
          "plugin": "Apache HttpComponents Client 4.x API Plugin (apache-httpcomponents-client-4-api)",
          "version": "4.5.5-3.0"
        },
        {
          "plugin": "Authentication Tokens API Plugin (authentication-tokens)",
          "version": "1.3"
        },
        {
          "plugin": "Authorize Project (authorize-project)",
          "version": "1.3.0"
        },
        {
          "plugin": "Autofavorite for Blue Ocean (blueocean-autofavorite)",
          "version": "1.2.4"
        },
        {
          "plugin": "Bitbucket Branch Source Plugin (cloudbees-bitbucket-branch-source)",
          "version": "2.4.5"
        },
        {
          "plugin": "Bitbucket Pipeline for Blue Ocean (blueocean-bitbucket-pipeline)",
          "version": "1.17.0"
        },
        {
          "plugin": "Bitbucket Plugin (bitbucket)",
          "version": "1.1.10"
        },
        {
          "plugin": "Blue Ocean (blueocean)",
          "version": "1.17.0"
        },
        {
          "plugin": "Blue Ocean Core JS (blueocean-core-js)",
          "version": "1.17.0"
        },
        {
          "plugin": "Blue Ocean Executor Info (blueocean-executor-info)",
          "version": "1.17.0"
        },
        {
          "plugin": "Blue Ocean Pipeline Editor (blueocean-pipeline-editor)",
          "version": "1.17.0"
        },
        {
          "plugin": "Branch API Plugin (branch-api)",
          "version": "2.5.3"
        },
        {
          "plugin": "Build Timeout (build-timeout)",
          "version": "1.19"
        },
        {
          "plugin": "Command Agent Launcher Plugin (command-launcher)",
          "version": "1.3"
        },
        {
          "plugin": "Common API for Blue Ocean (blueocean-commons)",
          "version": "1.17.0"
        },
        {
          "plugin": "Conditional BuildStep (conditional-buildstep)",
          "version": "1.3.6"
        },
        {
          "plugin": "Config API for Blue Ocean (blueocean-config)",
          "version": "1.17.0"
        },
        {
          "plugin": "Copy Artifact Plugin (copyartifact)",
          "version": "1.42.1"
        },
        {
          "plugin": "Credentials Binding Plugin (credentials-binding)",
          "version": "1.19"
        },
        {
          "plugin": "Credentials Plugin (credentials)",
          "version": "2.2.0"
        },
        {
          "plugin": "Dashboard for Blue Ocean (blueocean-dashboard)",
          "version": "1.17.0"
        },
        {
          "plugin": "Design Language (jenkins-design-language)",
          "version": "1.17.0"
        },
        {
          "plugin": "Display URL API (display-url-api)",
          "version": "2.3.1"
        },
        {
          "plugin": "Display URL for Blue Ocean (blueocean-display-url)",
          "version": "2.3.0"
        },
        {
          "plugin": "Docker Commons Plugin (docker-commons)",
          "version": "1.15"
        },
        {
          "plugin": "Docker Pipeline (docker-workflow)",
          "version": "1.18"
        },
        {
          "plugin": "Durable Task Plugin (durable-task)",
          "version": "1.30"
        },
        {
          "plugin": "Email Extension Plugin (email-ext)",
          "version": "2.66"
        },
        {
          "plugin": "Events API for Blue Ocean (blueocean-events)",
          "version": "1.17.0"
        },
        {
          "plugin": "External Monitor Job Type Plugin (external-monitor-job)",
          "version": "1.7"
        },
        {
          "plugin": "Favorite (favorite)",
          "version": "2.3.2"
        },
        {
          "plugin": "Folders Plugin (cloudbees-folder)",
          "version": "6.9"
        },
        {
          "plugin": "GIT server Plugin (git-server)",
          "version": "1.7"
        },
        {
          "plugin": "Git Pipeline for Blue Ocean (blueocean-git-pipeline)",
          "version": "1.17.0"
        },
        {
          "plugin": "Git client plugin (git-client)",
          "version": "3.0.0-rc"
        },
        {
          "plugin": "Git plugin (git)",
          "version": "4.0.0-rc"
        },
        {
          "plugin": "GitHub API Plugin (github-api)",
          "version": "1.95"
        },
        {
          "plugin": "GitHub Branch Source Plugin (github-branch-source)",
          "version": "2.5.4"
        },
        {
          "plugin": "GitHub Pipeline for Blue Ocean (blueocean-github-pipeline)",
          "version": "1.17.0"
        },
        {
          "plugin": "GitHub plugin (github)",
          "version": "1.29.4"
        },
        {
          "plugin": "Google Login Plugin (google-login)",
          "version": "1.6"
        },
        {
          "plugin": "Gradle Plugin (gradle)",
          "version": "1.33"
        },
        {
          "plugin": "HTML Publisher plugin (htmlpublisher)",
          "version": "1.18"
        },
        {
          "plugin": "HTTP Request Plugin (http_request)",
          "version": "1.8.23"
        },
        {
          "plugin": "Handy Uri Templates 2.x API Plugin (handy-uri-templates-2-api)",
          "version": "2.1.7-1.0"
        },
        {
          "plugin": "Icon Shim Plugin (icon-shim)",
          "version": "2.0.3"
        },
        {
          "plugin": "JIRA Integration for Blue Ocean (blueocean-jira)",
          "version": "1.17.0"
        },
        {
          "plugin": "JIRA plugin (jira)",
          "version": "3.0.8"
        },
        {
          "plugin": "JSch dependency plugin (jsch)",
          "version": "0.1.55"
        },
        {
          "plugin": "JUnit Plugin (junit)",
          "version": "1.28"
        },
        {
          "plugin": "JWT for Blue Ocean (blueocean-jwt)",
          "version": "1.17.0"
        },
        {
          "plugin": "Jackson 2 API Plugin (jackson2-api)",
          "version": "2.9.9.1"
        },
        {
          "plugin": "JavaScript GUI Lib: ACE Editor bundle plugin (ace-editor)",
          "version": "1.1"
        },
        {
          "plugin": "JavaScript GUI Lib: Handlebars bundle plugin (handlebars)",
          "version": "1.1.1"
        },
        {
          "plugin": "JavaScript GUI Lib: Moment.js bundle plugin (momentjs)",
          "version": "1.1.1"
        },
        {
          "plugin": "JavaScript GUI Lib: jQuery bundles (jQuery and jQuery UI) plugin (jquery-detached)",
          "version": "1.2.1"
        },
        {
          "plugin": "Javadoc Plugin (javadoc)",
          "version": "1.5"
        },
        {
          "plugin": "LDAP Plugin (ldap)",
          "version": "1.20"
        },
        {
          "plugin": "Lockable Resources plugin (lockable-resources)",
          "version": "2.5"
        },
        {
          "plugin": "Mailer Plugin (mailer)",
          "version": "1.23"
        },
        {
          "plugin": "MapDB API Plugin (mapdb-api)",
          "version": "1.0.9.0"
        },
        {
          "plugin": "Matrix Authorization Strategy Plugin (matrix-auth)",
          "version": "2.4.2"
        },
        {
          "plugin": "Matrix Project Plugin (matrix-project)",
          "version": "1.14"
        },
        {
          "plugin": "Maven Integration plugin (maven-plugin)",
          "version": "3.3"
        },
        {
          "plugin": "Mercurial plugin (mercurial)",
          "version": "2.7"
        },
        {
          "plugin": "Metrics Plugin (metrics)",
          "version": "4.0.2.5"
        },
        {
          "plugin": "OWASP Markup Formatter Plugin (antisamy-markup-formatter)",
          "version": "1.5"
        },
        {
          "plugin": "Oracle Java SE Development Kit Installer Plugin (jdk-tool)",
          "version": "1.3"
        },
        {
          "plugin": "PAM Authentication plugin (pam-auth)",
          "version": "1.5.1"
        },
        {
          "plugin": "Personalization for Blue Ocean (blueocean-personalization)",
          "version": "1.17.0"
        },
        {
          "plugin": "Pipeline (workflow-aggregator)",
          "version": "2.6"
        },
        {
          "plugin": "Pipeline Graph Analysis Plugin (pipeline-graph-analysis)",
          "version": "1.10"
        },
        {
          "plugin": "Pipeline SCM API for Blue Ocean (blueocean-pipeline-scm-api)",
          "version": "1.17.0"
        },
        {
          "plugin": "Pipeline Utility Steps (pipeline-utility-steps)",
          "version": "2.3.0"
        },
        {
          "plugin": "Pipeline implementation for Blue Ocean (blueocean-pipeline-api-impl)",
          "version": "1.17.0"
        },
        {
          "plugin": "Pipeline: API (workflow-api)",
          "version": "2.35"
        },
        {
          "plugin": "Pipeline: Basic Steps (workflow-basic-steps)",
          "version": "2.18"
        },
        {
          "plugin": "Pipeline: Build Step (pipeline-build-step)",
          "version": "2.9"
        },
        {
          "plugin": "Pipeline: Declarative (pipeline-model-definition)",
          "version": "1.3.9"
        },
        {
          "plugin": "Pipeline: Declarative Agent API (pipeline-model-declarative-agent)",
          "version": "1.1.1"
        },
        {
          "plugin": "Pipeline: Declarative Extension Points API (pipeline-model-extensions)",
          "version": "1.3.9"
        },
        {
          "plugin": "Pipeline: GitHub Groovy Libraries (pipeline-github-lib)",
          "version": "1.0"
        },
        {
          "plugin": "Pipeline: Groovy (workflow-cps)",
          "version": "2.72"
        },
        {
          "plugin": "Pipeline: Input Step (pipeline-input-step)",
          "version": "2.10"
        },
        {
          "plugin": "Pipeline: Job (workflow-job)",
          "version": "2.33"
        },
        {
          "plugin": "Pipeline: Milestone Step (pipeline-milestone-step)",
          "version": "1.3.1"
        },
        {
          "plugin": "Pipeline: Model API (pipeline-model-api)",
          "version": "1.3.9"
        },
        {
          "plugin": "Pipeline: Multibranch (workflow-multibranch)",
          "version": "2.21"
        },
        {
          "plugin": "Pipeline: Nodes and Processes (workflow-durable-task-step)",
          "version": "2.32"
        },
        {
          "plugin": "Pipeline: REST API Plugin (pipeline-rest-api)",
          "version": "2.11"
        },
        {
          "plugin": "Pipeline: SCM Step (workflow-scm-step)",
          "version": "2.9"
        },
        {
          "plugin": "Pipeline: Shared Groovy Libraries (workflow-cps-global-lib)",
          "version": "2.14"
        },
        {
          "plugin": "Pipeline: Stage Step (pipeline-stage-step)",
          "version": "2.3"
        },
        {
          "plugin": "Pipeline: Stage Tags Metadata (pipeline-stage-tags-metadata)",
          "version": "1.3.9"
        },
        {
          "plugin": "Pipeline: Stage View Plugin (pipeline-stage-view)",
          "version": "2.11"
        },
        {
          "plugin": "Pipeline: Step API (workflow-step-api)",
          "version": "2.20"
        },
        {
          "plugin": "Pipeline: Supporting APIs (workflow-support)",
          "version": "3.3"
        },
        {
          "plugin": "Plain Credentials Plugin (plain-credentials)",
          "version": "1.5"
        },
        {
          "plugin": "Pub-Sub \"light\" Bus (pubsub-light)",
          "version": "1.12"
        },
        {
          "plugin": "REST API for Blue Ocean (blueocean-rest)",
          "version": "1.17.0"
        },
        {
          "plugin": "REST Implementation for Blue Ocean (blueocean-rest-impl)",
          "version": "1.17.0"
        },
        {
          "plugin": "Resource Disposer Plugin (resource-disposer)",
          "version": "0.13"
        },
        {
          "plugin": "Run Condition Plugin (run-condition)",
          "version": "1.2"
        },
        {
          "plugin": "SCM API Plugin (scm-api)",
          "version": "2.6.3"
        },
        {
          "plugin": "SSH Agent Plugin (ssh-agent)",
          "version": "1.17"
        },
        {
          "plugin": "SSH Credentials Plugin (ssh-credentials)",
          "version": "1.17.1"
        },
        {
          "plugin": "SSH Slaves plugin (ssh-slaves)",
          "version": "1.30.1"
        },
        {
          "plugin": "Script Security Plugin (script-security)",
          "version": "1.61"
        },
        {
          "plugin": "Server Sent Events (SSE) Gateway Plugin (sse-gateway)",
          "version": "1.18"
        },
        {
          "plugin": "Skip Notifications Trait plugin (skip-notifications-trait)",
          "version": "1.0.3"
        },
        {
          "plugin": "Structs Plugin (structs)",
          "version": "1.19"
        },
        {
          "plugin": "Subversion Plug-in (subversion)",
          "version": "2.12.2"
        },
        {
          "plugin": "Timestamper (timestamper)",
          "version": "1.10"
        },
        {
          "plugin": "Token Macro Plugin (token-macro)",
          "version": "2.8"
        },
        {
          "plugin": "Trilead API Plugin (trilead-api)",
          "version": "1.0.3"
        },
        {
          "plugin": "Variant Plugin (variant)",
          "version": "1.2"
        },
        {
          "plugin": "WMI Windows Agents Plugin (windows-slaves)",
          "version": "1.4"
        },
        {
          "plugin": "Web for Blue Ocean (blueocean-web)",
          "version": "1.17.0"
        },
        {
          "plugin": "Workspace Cleanup Plugin (ws-cleanup)",
          "version": "0.37"
        },
        {
          "plugin": "bouncycastle API Plugin (bouncycastle-api)",
          "version": "2.17"
        },
        {
          "plugin": "i18n for Blue Ocean (blueocean-i18n)",
          "version": "1.17.0"
        }
      ]
      

            Activity

            2bluesc Kyle Manna created issue -
            2bluesc Kyle Manna added a comment -

            Originally reported in JENKINS-57725 as it seems related to the changelog entry for v2.176.2 LTS release.

            dnusbaum Devin Nusbaum added a comment -

            Like I mentioned on the other issue, the threads that are leaking are instances of java.util.TimerThread with the automatically generated name. I think this is an issue in some plugin, since the only instance of java.util.Timer in core is named "Jenkins cron thread"

            I would search the jenkinsci organization on GitHub for all uses of java.util.Timer, cross-check that with the set of plugins you have installed, and then examine the ways that the plugins you have installed are using java.util.Timer based on the search. My guess is that something is repeatedly calling new Timer rather than storing a single timer somewhere and reusing it (maybe also better to switch the plugin to using an ExecutorService if that is the problem).

            The fact that these threads' context class loaders are a CpsGroovyShell$CleanGroovyClassLoader means that I would first check plugins that implement Pipeline steps or are used by Pipelines on your instance in some way. It might also be possible that you have a Pipeline shared library that is creating these timers, so if you can't find anything in a plugin, that's where I would look next.

            FWIW based on a quick check of the search I didn't see anything obvious (though there was some suspicious stuff in sse-gateway plugin), so you are probably going to need to spend some time trying to isolate the issue: figure out what changed recently on your instance in terms of updates/Pipelines/Shared Libraries, see if the same problem happens on older versions of Jenkins or plugins, etc.
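
To illustrate the suspected pattern, here is a minimal sketch contrasting "new Timer per task" with a single shared scheduler. It is not code from any Jenkins plugin; the class, method, and thread names are hypothetical, and the 60-second delays are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: why repeatedly calling `new Timer()` multiplies threads.
final class RetryScheduling {
    static final String SHARED_NAME = "shared-retry-scheduler"; // hypothetical name

    // One daemon-thread scheduler, created once and reused for every task.
    private static final ScheduledExecutorService SHARED =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, SHARED_NAME);
                t.setDaemon(true);
                return t;
            });

    /** Leaky pattern: every call starts a new "Timer-N" thread that lives until
     *  cancel() is called or the Timer becomes unreachable and is collected. */
    static Timer scheduleWithNewTimer(Runnable task, long delayMillis) {
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            @Override public void run() { task.run(); }
        }, delayMillis);
        return timer;
    }

    /** Alternative: all tasks share the single scheduler thread. */
    static void scheduleShared(Runnable task, long delayMillis) {
        SHARED.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Counts live threads whose name starts with the given prefix. */
    static long liveThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        List<Timer> timers = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            timers.add(scheduleWithNewTimer(() -> {}, 60_000));
            scheduleShared(() -> {}, 60_000);
        }
        System.out.println("Timer-* threads: " + liveThreadsWithPrefix("Timer-"));
        System.out.println("shared threads:  " + liveThreadsWithPrefix(SHARED_NAME));
        timers.forEach(Timer::cancel); // without this the timer threads would linger
    }
}
```

With the first method the thread count grows with the number of scheduled tasks; with the second it stays at one regardless of load.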

            mellowplace Rob Graham added a comment -

            We have also had an issue that looks extremely similar to this. We are still investigating, but we did what Devin Nusbaum recommended and have possibly tracked it to the fix for https://issues.jenkins-ci.org/browse/JENKINS-51057 in the sse-gateway-plugin, which is a dependency of BlueOcean (a plugin I notice you also have, Kyle Manna).

            I was able to accelerate thread creation by kicking off a very simple build multiple times and following progress with the BlueOcean UI.

            mellowplace Rob Graham added a comment -

            Downgrading to BlueOcean 1.17 (which in turn uses sse-gateway 1.17) appears to have resolved our issue

            dnusbaum Devin Nusbaum added a comment -

            FWIW I opened https://github.com/jenkinsci/sse-gateway-plugin/pull/34 to give the timer threads in sse-gateway meaningful names to make it easier to understand whether that plugin really is the problem (I don't know what that plugin actually does).
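
For readers unfamiliar with the idea: java.util.Timer accepts a thread name in its constructor, so a leaking timer shows up under a recognizable name in thread dumps instead of the default Timer-N. The following is a sketch of that idea only, not the change made in the pull request; the class and the thread name are hypothetical.

```java
import java.util.Timer;

// Sketch: give a Timer's worker thread a descriptive name.
final class NamedTimer {
    /** Creates a daemon Timer whose single worker thread carries the given name. */
    static Timer create(String threadName) {
        return new Timer(threadName, true);
    }

    /** Returns true if a live thread with exactly this name exists in the JVM. */
    static boolean threadExists(String name) {
        return Thread.getAllStackTraces().keySet().stream()
                .anyMatch(t -> t.getName().equals(name));
    }

    public static void main(String[] args) {
        Timer timer = create("example-plugin.retry-timer"); // hypothetical name
        System.out.println(threadExists("example-plugin.retry-timer"));
        timer.cancel();
    }
}
```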

            mellowplace Rob Graham added a comment - - edited

            I think I have spoken too soon; our problem may still be there. Frustratingly, we have evidence that BlueOcean 1.17 (which we were running for a couple of weeks without problem) used sse-gateway 1.17, but installing BlueOcean 1.17 now seems to pull in sse-gateway 1.18.


            Update: We run Jenkins in Docker, and the version issues look like they might be a bug in the install-plugins.sh script. I've now explicitly pinned blueocean and all associated plugins back to 1.17 and will report in a couple of days whether our thread issue is resolved.

            kpritam Pritam Kadam added a comment -

            I faced this issue today, and it looked like the `sse-gateway` thread leak. Earlier, Jenkins was creating threads exponentially on every build, but once I updated all the plugins the issue was resolved. It now uses about 100 threads overall on average.

            When I monitored the threads, they were all named `Timer-$id`.

            olamy Olivier Lamy added a comment -

            With the just-released 1.19, the thread will have a special name, so we can identify whether this is the culprit.

            Please let us know. Thanks

            2bluesc Kyle Manna added a comment -

            with just released 1.19 the thread will have a special name so we can identify if this is the culprit.

            Thanks! Just installed sse-gateway 1.19, will monitor tomorrow and next week and report back.

            dnusbaum Devin Nusbaum made changes -
            Field Original Value New Value
            Component/s sse-gateway-plugin [ 21477 ]
            Component/s groovy-plugin [ 15549 ]
            dnusbaum Devin Nusbaum made changes -
            Assignee vjuranek [ vjuranek ]
            mellowplace Rob Graham added a comment -

            I can confirm the downgrade to BlueOcean 1.17 has fixed the threading issue. We do have a small memory leak now, which I assume is also caused by sse-gateway (given that the change in 1.18 that introduced this threading issue was a fix for a memory leak), but it's far more manageable than the threading issue.

            oto Colin Chapman made changes -
            Attachment image-2019-08-05-17-17-46-130.png [ 48183 ]
            oto Colin Chapman made changes -
            Attachment thread-dump-ft.png [ 48184 ]
            oto Colin Chapman added a comment - - edited

            We also noticed an increase in thread count following a Jenkins update last month. After applying 1.19 and taking a thread dump, most of the threads in WAITING state are named EventDispatcher.retryProcessor.

            2bluesc Kyle Manna made changes -
            2bluesc Kyle Manna added a comment -

            A heap dump from today where there were 20k+ threads shows that EventDispatcher.retryProcessor is what's affecting me as well.

             

            olamy Olivier Lamy added a comment -

            I have started work. 

            Draft pr here: https://github.com/jenkinsci/sse-gateway-plugin/pull/35

            Not sure if you guys can test with a SNAPSHOT version?

            If yes it's available here: https://repo.jenkins-ci.org/snapshots/org/jenkins-ci/plugins/sse-gateway/1.20-SNAPSHOT/sse-gateway-1.20-20190806.070938-1.hpi

             

            olamy Olivier Lamy made changes -
            Assignee Olivier Lamy [ olamy ]
            olamy Olivier Lamy added a comment - - edited

            How many users do you have using the BlueOcean UI when the problem happens?

            Could anyone enable FINEST logging for the logger org.jenkinsci.plugins.ssegateway.sse.EventDispatcher and post the log file here?

            This would definitely help.
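
In Jenkins this is normally done by adding a log recorder for that logger under the System Log page. For reference, the equivalent in plain java.util.logging looks roughly like the sketch below. Only the logger name comes from this thread; the class name and the use of a ConsoleHandler are assumptions for illustration.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: raise one java.util.logging logger to FINEST.
final class FinestLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly, so a level
    // set on an otherwise unreferenced logger can be lost after garbage collection.
    static final Logger LOGGER =
            Logger.getLogger("org.jenkinsci.plugins.ssegateway.sse.EventDispatcher");

    static void enable() {
        LOGGER.setLevel(Level.FINEST);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINEST); // handlers default to INFO and would drop FINEST records
        LOGGER.addHandler(handler);
    }

    public static void main(String[] args) {
        enable();
        LOGGER.finest("FINEST logging is now enabled for " + LOGGER.getName());
    }
}
```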

             

            oto Colin Chapman made changes -
            Attachment logs-jenkins-06.08.txt [ 48219 ]
            oto Colin Chapman added a comment -

            Hello Olivier,

            We don't have that many users, I'd say 15 concurrent users at most. Below are the logs from a high-thread-count episode; the job you can see in the logs is scheduled to run every 5 minutes:

            logs-jenkins-06.08.txt

             

             

            olamy Olivier Lamy added a comment -

            Colin Chapman, thanks a lot. I will read those logs; maybe they can help.

            By chance, have you tried the snapshot?

            oto Colin Chapman added a comment -

            I can't reproduce the issue on our staging jenkins and obviously I can't push a snapshot on production

            olamy Olivier Lamy added a comment -

            Colin Chapman, I understand, but thanks a lot for the testing.

            Looking at your log files I can see a lot of

            Error dispatching retry event to SSE channel 
            
            

            but the reason is not printed. Having it could help, but this needs some changes in the code.

            I will keep you updated here if I need some help for testing

            Thanks again!

             

            olamy Olivier Lamy added a comment -

            I have pushed some changes with better logging of errors.

            SNAPSHOT available here 

            https://repo.jenkins-ci.org/snapshots/org/jenkins-ci/plugins/sse-gateway/1.20-SNAPSHOT/sse-gateway-1.20-20190813.033926-2.hpi

             

            dnusbaum Devin Nusbaum made changes -
            Link This issue relates to JENKINS-51057 [ JENKINS-51057 ]
            2bluesc Kyle Manna made changes -
            2bluesc Kyle Manna added a comment -

            Update: I installed "sse-gateway-1.20-20190813.033926-2.hpi" at roughly 1565760000 (shortly after the last peak above 20k) and have been recording data since.  I have yet to see another peak, with a largely similar development cycle and load on the build server.  I thought the sse-gateway update was only meant to improve logging, but it seems to have reduced the occurrence?

            X-axis is the Unix timestamp

            Y-axis is the task count for the jenkins cgroup

            I've avoided updating any plugins while testing the sse-gateway changes.
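
            For readers who want to line the X-axis up with calendar dates, the Unix timestamp quoted above can be converted as in the small sketch below. The class and method names are made up for illustration; only the timestamp value comes from the comment.

```java
import java.time.Instant;

public class EpochToUtc {
    /** Formats a Unix timestamp (seconds since the epoch) as an ISO-8601 UTC instant. */
    static String toUtc(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    public static void main(String[] args) {
        // 1565760000 is the approximate install time quoted in the comment above.
        System.out.println(toUtc(1565760000L)); // prints 2019-08-14T05:20:00Z
    }
}
```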

            dnusbaum Devin Nusbaum made changes -
            Remote Link This issue links to "jenkinsci/sse-gateway-plugin#35 (Web Link)" [ 23416 ]
            dnusbaum Devin Nusbaum added a comment -

            Kyle Manna That build is based on https://github.com/jenkinsci/sse-gateway-plugin/pull/35, which, in addition to the logging fixes, switches to using a thread pool with 4 threads where the original code was continually creating new Timer threads, so it should fix the thread leak (but maybe trade it off for a memory leak if the queue of retry events gets too big). I think the logging Olivier added was to try to figure out why so many events are failing to send and being added to the retry queue.

            That said, I think there is a bug in the version you have installed that means the tasks in the thread pool never actually get retried, so events that fail to be sent are not actually resent. Does Blue Ocean seem to work correctly in that version? Maybe we should just completely remove all of the retry handling, Olivier Lamy?
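
            The difference between the two scheduling styles described above can be sketched roughly as follows. This is not the plugin's code: the class, method, and thread names are invented for illustration, and the retry work is an empty placeholder. It only shows that one java.util.Timer per pending retry costs one thread per retry, while a shared scheduled pool stays at its configured size.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySchedulingSketch {
    // Gives every run its own thread-name prefix so runs cannot see each other's threads.
    private static final AtomicInteger RUN = new AtomicInteger();

    /** Counts live threads in this JVM whose name starts with the given prefix. */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    /** Leaky pattern: one java.util.Timer (and so one thread) per pending retry. */
    static long threadsWithTimerPerRetry(int retries) {
        String prefix = "sketch-timer-" + RUN.incrementAndGet() + "-";
        Timer[] timers = new Timer[retries];
        for (int i = 0; i < retries; i++) {
            timers[i] = new Timer(prefix + i, true); // the constructor starts a thread
            timers[i].schedule(new TimerTask() {
                @Override public void run() { /* hypothetical retry work */ }
            }, TimeUnit.HOURS.toMillis(1));
        }
        long live = countThreads(prefix);
        for (Timer t : timers) {
            t.cancel(); // lets the timer threads exit once the sketch is done
        }
        return live;
    }

    /** Bounded pattern: all retries share one scheduler with a fixed number of threads. */
    static long threadsWithSharedPool(int retries, int poolSize) {
        String prefix = "sketch-pool-" + RUN.incrementAndGet() + "-";
        AtomicInteger n = new AtomicInteger();
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize, r -> {
            Thread t = new Thread(r, prefix + n.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
        for (int i = 0; i < retries; i++) {
            // Pending retries wait in the pool's queue instead of each owning a thread.
            pool.schedule(() -> { /* hypothetical retry work */ }, 1, TimeUnit.HOURS);
        }
        long live = countThreads(prefix);
        pool.shutdownNow();
        return live;
    }

    public static void main(String[] args) {
        System.out.println("timer per retry: " + threadsWithTimerPerRetry(50) + " threads");
        System.out.println("shared pool:     " + threadsWithSharedPool(50, 4) + " threads");
    }
}
```

            The pool variant keeps the thread count flat, but every pending retry still occupies a slot in the scheduler's queue, which is the possible memory trade-off mentioned above.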

            reinholdfuereder Reinhold Füreder added a comment - - edited

            I am pretty sure I have stumbled over this issue as well (using sse-gateway v1.19) today:

            • there are sometimes more than 1000 threads named "EventDispatcher.retryProcessor", then again "just" ~250
            • and one pipeline build (on Jenkins master – yes, I know this is an antipattern) failed with "unable to create new native thread" (in the Jenkins Docker plugin, which left a Docker container behind)
            • (actually, another weird problem that I have not experienced before occurred yesterday, namely a hanging pipeline; but I just killed it instead of trying to get a Java stack trace, so I don't really know if it is related to this issue)
            • interestingly, I had already updated to this version on 2019-08-02
              • but I updated Jenkins (core) yesterday (2019-08-20) late afternoon to 2.190 (from 2.189); and a few minutes later also two Jenkins plugins: "pubsub-light" to 1.13 (from 1.12); and the presumably very innocent "github-branch-source"
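
            Thread counts like the ones above can be gathered by grouping live thread names. The following is a minimal sketch with invented class and method names; it only sees threads of the JVM it runs in, so to inspect Jenkins it would have to run inside the Jenkins process or be fed names extracted from a thread dump. Digit runs are replaced with '#' so that numbered threads such as "Timer-1" and "Timer-2" land in one bucket.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ThreadNameHistogram {
    /** Groups thread names, replacing digit runs with '#' so numbered threads share a bucket. */
    static Map<String, Long> countByName(Collection<String> names) {
        return names.stream()
                .map(n -> n.replaceAll("\\d+", "#"))
                .collect(Collectors.groupingBy(n -> n, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Collect the names of all live threads in the current JVM.
        Collection<String> live = Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .collect(Collectors.toList());
        // Print one line per bucket: count, then normalized name.
        countByName(live).forEach((name, count) -> System.out.println(count + "\t" + name));
    }
}
```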

            Devin Nusbaum/Olivier Lamy => What is the current (workaround) recommendation? Trying to install the latest snapshot from https://repo.jenkins-ci.org/snapshots/org/jenkins-ci/plugins/sse-gateway/1.20-SNAPSHOT/ ?

            I have now (cleaned up and) restarted Jenkins => initially there is not a single thread named "EventDispatcher.retryProcessor"

            djviking Sverre Moe added a comment -

            Cleaned up, how?
            We have also experienced this problem over the past few months.
            Recently we upgraded our Jenkins server's Linux distribution from SLES12.3 to SLES15.0. This was done by creating a new VM and copying over the JENKINS_HOME. It has been running for 4 days now and we have not gotten the OutOfMemoryError. Time will tell if it is truly stable again.

            reinholdfuereder Reinhold Füreder added a comment -

            Sverre Moe Regarding "cleaned up": Sorry for my lack of clarity => due to the OOM there were Docker containers left running that I had to stop and remove; and it also appeared that about 2xx "EventDispatcher.retryProcessor" threads were left inside the Jenkins Java process...

            Other notes (~15h after Jenkins master restart):

            • This morning no "EventDispatcher.retryProcessor" threads are there (or remaining) yet.
            • Interestingly there are 25 "org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep" threads, even though just one pipeline build is running (without any parallelism)
            olamy Olivier Lamy added a comment -

            Yes, please try the latest snapshot from https://repo.jenkins-ci.org/snapshots/org/jenkins-ci/plugins/sse-gateway/1.20-SNAPSHOT/

            Reinhold Füreder Sverre Moe please tell us what the result is.

            We are close to merging the PR; we just need to make another change and test it with a full Blue Ocean build.

            Thanks for your patience (and your report).

            efroemling Eric Froemling added a comment -

            Just chiming in here:

            I've been running into this issue for a few weeks on my old Mac I've got set up as a CI server; at some point after using the Blue Ocean UI I'd start seeing the thread count on Jenkins rise rapidly from its usual ~70ish into the thousands and then I'd start getting 'could not create native thread' errors and other strange side-effects.  I had been avoiding Blue Ocean for the last week or so due to this.

            I went ahead and tried the sse-gateway 1.20-SNAPSHOT posted yesterday and went back to using Blue Ocean and am happy to report I've not observed the thread leak since.  I'll keep my eye on it but so far things are looking good.  Thanks everyone who has been looking into this!

            olamy Olivier Lamy added a comment -

            Version 1.20 has been released.

            olamy Olivier Lamy made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] => Resolved [ 5 ]
            reinholdfuereder Reinhold Füreder added a comment -

            Olivier Lamy Sorry for my delayed feedback: AFAIK (no user complaints in my company – at least none that I know of...) the snapshot build of sse-gateway worked fine; and on the Jenkins server there were neither leftover "EventDispatcher.retryProcessor" threads nor thousands of them at the same time. => Looks good

            2bluesc Kyle Manna added a comment -

            Updated on my server, thanks for all the hard work!

            sorenfriis Søren Friis added a comment -

            There seems to be a problem with showing Blue Ocean in MS Edge after this update:
            https://issues.jenkins-ci.org/browse/JENKINS-59291

            prsingh Pradeep Singh made changes -
            Attachment Jenkins_thread.jpg [ 50639 ]
            prsingh Pradeep Singh added a comment -

            The thread count shot up (to ~22k) after upgrading Jenkins from 2.17 to 2.190.3.

            We analysed a thread dump, which didn't help us find the solution, although it showed a couple of threads in a blocked state.

            Upgrading Jenkins to LTS 2.204.4 solved this issue.

            kshultz Karl Shultz made changes -
            Assignee Olivier Lamy [ olamy ] => Karl Shultz [ kshultz ]
            kshultz Karl Shultz made changes -
            Assignee Karl Shultz [ kshultz ] => Olivier Lamy [ olamy ]
            kshultz Karl Shultz added a comment -

            I apologize for the noise just now, I must have hit a hotkey that assigned this to me. I've switched it back.


              People

              Assignee:
              olamy Olivier Lamy
              Reporter:
              2bluesc Kyle Manna
              Votes:
              9
              Watchers:
              18
