Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-63840

Extreme CPU utilization leading to bad performance and failing jobs

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      Jenkins consumes vast amounts of CPU resources. E.g. on one machine during 103 days uptime htop reports 7387 CPU hours, i.e. in the mean 3 out of 8 cores are 100% utilzed during uptime. Actually it seems to be worse:

        1  [||||||||||||||||||||||||||||||||||||||||||||           71.7%]   5  [|||||||||||||||||||||||||||||||||||||||||||||||||||    84.0%]
        2  [|||||||||||||||||||||||||||||||||||||||||||||||||      79.6%]   6  [|||||||||||||||||||||||||||||||||||||||||||            69.4%]
        3  [|||||||||||||||||||||||||||||||||||||||||||||||        76.6%]   7  [|||||||||||||||||||||||||||||||||||||||||||||||||||||| 88.7%]
        4  [||||||||||||||||||||||||||||||||||||||||||||||||||||   85.2%]   8  [|||||||||||||||||||||||||||||||||||||||||||||||||||    82.4%]
        Mem[|||||||||||||||||||||||||||||||||||||||||||||||||4.03G/7.79G]   Tasks: 48; 6 running
        Swp[|                                                 253M/16.0G]   Load average: 8.30 8.02 8.28
                                                                            Uptime: 103 days(!), 20:54:02
      
      603 jenkins    20   0 7779M 2524M  5564 S 421. 31.6    7387h ├─ /java/jdk8/bin/java -XX:ErrorFile=/.mntpts/sdb1/jenkins/hs_err_pidjenkins...
      

      A possible cause of this might be {{AsyncResourceDisposer.worker}}s that run into errors due to insufficient file permissions for the user (jenkins) that runs the Jenkins daemon. It seems that these workers are unable to recover and skip the offending files and directories.

      In our specific case the files have been created by Docker containers, but it may have other causes as well, so it's probably a quite common source of problems.

      As a sidenote, I did not find the error in Jenkin's log files, which made it difficult to track down possible reasons for the bad Jenkins performance and Jobs that failed for no good reasons.

      Name: AsyncResourceDisposer.worker [#9]
      State: RUNNABLE
      Total blocked: 33  Total waited: 11.060
      
      Stack trace: 
      java.lang.Throwable.fillInStackTrace(Native Method)
      java.lang.Throwable.fillInStackTrace(Throwable.java:783)
         - locked java.nio.file.AccessDeniedException@45ff02ae
      java.lang.Throwable.<init>(Throwable.java:265)
      java.lang.Exception.<init>(Exception.java:66)
      java.io.IOException.<init>(IOException.java:58)
      java.nio.file.FileSystemException.<init>(FileSystemException.java:73)
      java.nio.file.AccessDeniedException.<init>(AccessDeniedException.java:66)
      sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
      sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
      sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
      sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244)
      sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108)
      java.nio.file.Files.deleteIfExists(Files.java:1165)
      jenkins.util.io.PathRemover.removeOrMakeRemovableThenRemove(PathRemover.java:237)
      jenkins.util.io.PathRemover.tryRemoveFile(PathRemover.java:205)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:216)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.tryRemoveDirectoryContents(PathRemover.java:226)
      jenkins.util.io.PathRemover.tryRemoveRecursive(PathRemover.java:215)
      jenkins.util.io.PathRemover.forceRemoveRecursive(PathRemover.java:96)
      hudson.Util.deleteRecursive(Util.java:296)
      hudson.FilePath$DeleteRecursive.invoke(FilePath.java:1310)
      hudson.FilePath$DeleteRecursive.invoke(FilePath.java:1306)
      hudson.FilePath.act(FilePath.java:1076)
      hudson.FilePath.act(FilePath.java:1059)
      hudson.FilePath.deleteRecursive(FilePath.java:1304)
      hudson.plugins.ws_cleanup.Wipeout.performDelete(Wipeout.java:82)
      hudson.plugins.ws_cleanup.Wipeout$DisposableImpl.dispose(Wipeout.java:114)
      org.jenkinsci.plugins.resourcedisposer.AsyncResourceDisposer$WorkItem.run(AsyncResourceDisposer.java:283)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      java.util.concurrent.FutureTask.run(FutureTask.java:266)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      java.lang.Thread.run(Thread.java:748)
      

        Attachments

          Activity

          Hide
          oleg_nenashev Oleg Nenashev added a comment -

          Hi thanks for the report!

          > A possible cause of this might be {{AsyncResourceDisposer.worker}}s that run into errors due to insufficient file permissions for the user (jenkins) that runs the Jenkins daemon. It seems that these workers are unable to recover and skip the offending files and directories.

          It is plausible, but it needs more investigation and analysis. One particular stack trace might be a red herring. If you have a test instance where you could reproduce the behavior, connecting a profiler would be a good step to collect the details (e.g. Java Flight Recorded). Tools like jvmtop or Monitoring plugin could also help to collect some per-thread diagnostics information from the instance.

           

          > As a sidenote, I did not find the error in Jenkin's log files, which made it difficult to track down possible reasons

          it is possible to go through the code in the stacktrace and discover what exactly suppresses it 

          Show
          oleg_nenashev Oleg Nenashev added a comment - Hi thanks for the report! > A possible cause of this might be {{AsyncResourceDisposer.worker}}s that run into errors due to insufficient file permissions for the user (jenkins) that runs the Jenkins daemon. It seems that these workers are unable to recover and skip the offending files and directories. It is plausible, but it needs more investigation and analysis. One particular stack trace might be a red herring. If you have a test instance where you could reproduce the behavior, connecting a profiler would be a good step to collect the details (e.g. Java Flight Recorded). Tools like jvmtop or Monitoring plugin could also help to collect some per-thread diagnostics information from the instance.   > As a sidenote, I did not find the error in Jenkin's log files, which made it difficult to track down possible reasons it is possible to go through the code in the stacktrace and discover what exactly suppresses it 

            People

            Assignee:
            Unassigned Unassigned
            Reporter:
            alexanderv Alexander Veit
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated: