CPU on master node keeps increasing and never comes down

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: all-changes-plugin

      We are using Jenkins ver. 1.480.3, upgraded on Mar 1st.
      During the past month, the CPU usage has been continuously increasing:
      The 1st week: <5%
      The 2nd week: ~=17%
      The 3rd week: ~=25%
      And today it increased to ~=44%!

        Attachments:
        1. CPU.png (18 kB)
        2. CPU incease pic.jpg (43 kB)
        3. JavaMelody__CNRDGPS_3_26_13 (1).pdf (805 kB)
        4. JavaMelody__CNRDGPS_4_8_13.pdf (813 kB)
        5. percent http errors.png (23 kB)

          [JENKINS-17349] CPU on master node keeps increasing and never comes down

          evernat added a comment - edited

          I have attached 2 interesting graphs from the PDF: % CPU for 1 month and % HTTP errors for 1 month.

          It is strange to me that the % CPU increases suddenly, roughly once a week. (Note: this graph shows the % CPU of the Jenkins process, not of the OS.)
          And in the second graph, it seems that you have had many HTTP errors lately.

          From the attached PDF:
          It can be seen in the HTTP statistics that you have some downloads of very large artifacts (100 to 400 MB) taking a long time.
          And most importantly, the "Detailed system information" in the PDF says:
          Perm gen memory: 80 MB / 82 MB, which is quite full.

          I suggest first adding a JVM parameter, probably in JENKINS_HOME/jenkins.xml: -XX:MaxPermSize=256m
          And restarting Jenkins.
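
          (For reference, a minimal sketch of where that flag would go, assuming a default Windows service installation where JENKINS_HOME\jenkins.xml has an <arguments> element; only -XX:MaxPermSize=256m is the addition here, the other values are illustrative and should stay as they already are in your file:)

              <!-- illustrative example: keep your existing arguments and add -XX:MaxPermSize=256m -->
              <arguments>-Xrs -Xmx256m -XX:MaxPermSize=256m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -jar "%BASE%\jenkins.war" --httpPort=8080</arguments>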


          sharon xia added a comment -

          Good catch!
          Today I found another thing: there is a slave node (used to do our daily builds) to which many of our branch build jobs are tied.

          I found that the sudden increases happened when a branch build started, and the CPU didn't decrease even after the branch build finished. However, during this week we may have had >20 branch builds, and only 1 or 2 of them caused the sudden CPU increase.

          If I disconnect the slave build node, the CPU reported by the plugin decreases, although in Task Manager java.exe still stays high. And after I reconnect the slave build node, the CPU goes back to high.


          sharon xia added a comment -

          "Perm gen memory: 80 MB / 82 MB, which is quite full."
          This should not be the direct reason the CPU increased, correct? I checked: it changes from 80 MB to 81 MB, 82 MB and then drops back to 80 MB again...


          evernat added a comment -

          The limited perm gen may cause some cascading problems, for example in GC and in class loading, and perhaps introduce some CPU overhead.

          If your large artifacts are copied between node and master or sent from the master to browsers, this may also be a cause of high CPU usage. In fact, there are some other Jenkins issues with threads stuck executing "java.util.zip.Deflater.deflateBytes", like your threads.

          The recent fix of JENKINS-7813, in the upcoming Jenkins v1.509, may perhaps help with communication between node and master.
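
          (Purely as an illustration of what such a thread is busy doing, not Jenkins' actual code path: streaming a large artifact through on-the-fly compression keeps the serving thread inside the native deflateBytes call for the whole transfer. A minimal Java sketch:)

              import java.io.IOException;
              import java.io.InputStream;
              import java.io.OutputStream;
              import java.util.zip.Deflater;
              import java.util.zip.DeflaterOutputStream;

              public class CompressedCopy {
                  // Illustrative only: copies an artifact stream while compressing it on the fly.
                  // While this loop runs, the thread spends most of its time in the native
                  // java.util.zip.Deflater.deflateBytes call seen in the thread dumps.
                  public static void copyCompressed(InputStream artifact, OutputStream response) throws IOException {
                      DeflaterOutputStream out = new DeflaterOutputStream(response, new Deflater(Deflater.BEST_SPEED));
                      byte[] buffer = new byte[8192];
                      int read;
                      while ((read = artifact.read(buffer)) != -1) {
                          out.write(buffer, 0, read); // compression (and the CPU cost) happens here
                      }
                      out.finish(); // emit any remaining compressed data; caller closes the streams
                  }
              }

          For a 100 to 400 MB artifact on a slow client connection, a thread can stay in that state for minutes, which would show up as a request-handling thread pinned at high CPU.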


          sharon xia added a comment -

          Thanks evernat!

          I've modified the configuration file and restarted the server (restarting only the Jenkins service didn't take effect), and the CPU went back to <5%. Let's wait a while longer to check whether this addresses the CPU increase issue.

          JVM arguments: -Xrs
          -Xmx2048m
          -XX:MaxPermSize=512m
          -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle

          And the large artifact downloads are expected user behavior. Currently the download speed is fast, so I don't think there is an issue there.


          sharon xia added a comment -

          JavaMelody PDF after the memory increase.


          sharon xia added a comment -

          After changing the JVM arguments to -Xmx2048m -XX:MaxPermSize=512m, there is still a CPU increase after several days of running.


          sharon xia added a comment -

          In the middle of the picture, the CPU decreased dramatically after a system restart to let the new parameters take effect. However, after about one week, the CPU increased to ~= 6%.


          evernat added a comment -

          OK for MaxPermSize.

          At the end of the first report (CPU = 31%), there were 4 "RequestHandler[...]" threads with very high CPU usage. They were executing "java.util.zip.Deflater.deflateBytes(Native Method)".
          At the end of the second report (CPU = 6%), there was 1 "RequestHandlerThread2944" thread with high CPU usage. It was executing "java.util.zip.Deflater.deflateBytes(Native Method)".

          If you look at the threads at the bottom of today's report, is "RequestHandlerThread2944" still there and executing "java.util.zip.Deflater.deflateBytes(Native Method)"?
          Are there other threads doing the same thing today?

          If yes, what are the stack traces of these threads? You can click on "Dump threads as text" below the threads table.

          Note that you can kill those threads with the red button at the right of the table; that may decrease the CPU used, but it is not a long-term solution.
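
          (If it helps, the same kind of "hot thread" view that JavaMelody shows can also be pulled from the standard java.lang.management API. A rough sketch in plain Java, which would have to run inside the Jenkins JVM to be useful, for example adapted for the script console; the class name and the 60-second threshold are just placeholders:)

              import java.lang.management.ManagementFactory;
              import java.lang.management.ThreadInfo;
              import java.lang.management.ThreadMXBean;

              public class HotThreads {
                  public static void main(String[] args) {
                      ThreadMXBean threads = ManagementFactory.getThreadMXBean();
                      // dumpAllThreads(false, false) returns every live thread with its full stack trace
                      for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                          long cpuNanos = threads.getThreadCpuTime(info.getThreadId());
                          if (cpuNanos > 60L * 1000000000L) { // threads that have used more than 60 s of CPU so far
                              System.out.println(info.getThreadName() + ": " + (cpuNanos / 1000000000L) + " s of CPU");
                              for (StackTraceElement frame : info.getStackTrace()) {
                                  // a stuck thread would show java.util.zip.Deflater.deflateBytes near the top
                                  System.out.println("    at " + frame);
                              }
                          }
                      }
                  }
              }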


          Michael Tharp added a comment -

          I'm seeing this on 1.511 but have not yet tried adjusting launch options. Each "step" is another thread consuming 100% CPU. The one thread I have stuck right now is also in java.util.zip.Deflater.deflateBytes.


          Roger Myung added a comment -

          We have this issue too.
          Two flags that we've tried unsuccessfully are
          --handlerCountMax=100 --handlerCountMaxIdle=20

          I'm curious if there is a performance difference between Winstone and Tomcat. I was planning on moving to Tomcat to try to separate the web server process from the core Jenkins process.
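
          (For reference, assuming Jenkins is launched with the bundled Winstone container, those flags are passed on the launch command line after the war, or in the <arguments> element of jenkins.xml on Windows, e.g.:)

              java -jar jenkins.war --handlerCountMax=100 --handlerCountMaxIdle=20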


          Daniel Pinyol added a comment -

          We have this issue too with 1.514.


          Martin-Louis Bright added a comment -

          I have this issue with 1.515 as well. Forgive my ignorance, but how is limiting the memory with "-Xmx2048m -XX:MaxPermSize=512m" going to limit CPU usage?


          Yinghua Wang added a comment -

          I have the same issue on the latest 1.517. The master CPU usage keeps increasing up to 400% until I restart the Jenkins service, after which it drops to 95%.

          There was no such problem on 1.502 before I upgraded to 1.517.


          evernat added a comment -

          This issue may be related to JENKINS-14362, where Jesse Glick has posted a patched Jenkins 1.513 in the comments for anyone to test.

          See https://issues.jenkins-ci.org/browse/JENKINS-14362?focusedCommentId=179526&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-179526


          evernat added a comment -

          Resolving this issue as fixed, since the related issue JENKINS-14362 was resolved as fixed (for Jenkins v1.520).


            Assignee: Stefan Wolf (wolfs)
            Reporter: sharon xia (sharon_xia)
            Votes: 6
            Watchers: 14