Jenkins / JENKINS-63975

Jenkins controller failing with java.lang.OutOfMemoryError: GC overhead limit exceeded


    Details


      Description

      Hi,

      For the last few weeks we have been facing an issue with our Jenkins master: it crashes every now and then with the exception

      java.lang.OutOfMemoryError: GC overhead limit exceeded

      Below is what we found in the logs; the attached file "log_trace.txt" contains the full exception stack trace.

      2020-10-14 15:02:48.604+0000 [id=13455] WARNING h.i.i.InstallUncaughtExceptionHandler#handleException org.apache.commons.jelly.JellyTagException: jar:file:/var/jenkins_home/war/WEB-INF/lib/jenkins-core-2.235.3.jar!/hudson/model/View/index.jelly:42:43: <st:include> org.apache.commons.jelly.JellyTagException: jar:file:/var/jenkins_home/war/WEB-INF/lib/jenkins-core-2.235.3.jar!/lib/hudson/projectViewRow.jelly:35:52: <st:include> java.lang.OutOfMemoryError: GC overhead limit exceeded     at org.apache.commons.jelly.impl.TagScript.handleException(TagScript.java:726)     at org.apache.commons.jelly.impl.TagScript.run(TagScript.java:281)     at org.apache.commons.jelly.impl.ScriptBlock.run(ScriptBlock.java:95)     at org.kohsuke.stapler.jelly.CallTagLibScript$1.run(CallTagLibScript.java:99)
      
      

      When we created a heap dump and histogram, the entry below looked suspicious due to its large size (the full histogram details are attached).

      3: 1007784 32249088 java.lang.StackTraceElement

      Also attached is the GC root from the dump file for the same stack trace. We have observed the same trace in the dumpExportTable of all our slaves.
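
For reference, a class histogram like the one above (columns: rank, instance count, total bytes, class name) and a heap dump for GC-root analysis can be captured with the JDK's jmap tool; `<jenkins-pid>` below is a placeholder for the controller's process id:

```shell
# Class histogram of live objects, same format as the attached heap_histo.txt
jmap -histo:live <jenkins-pid> > heap_histo.txt

# Full heap dump (HPROF format) for GC-root analysis
# in a tool such as Eclipse MAT
jmap -dump:live,format=b,file=heap.hprof <jenkins-pid>
```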

      Memory parameters configured for Jenkins: "-Xmx4096m -XX:MaxPermSize=1024m"
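
As a side note, -XX:MaxPermSize only has an effect on Java 7 and earlier; Java 8+ ignores it, since the permanent generation was replaced by Metaspace. A sketch of equivalent startup options for a Java 8 controller, with GC logging and an automatic heap dump on OutOfMemoryError added to aid diagnosis (the paths are illustrative):

```shell
JAVA_OPTS="-Xmx4096m \
  -XX:MaxMetaspaceSize=1024m \
  -Xloggc:/var/log/jenkins/gc.log \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/jenkins/"
```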

      Is a memory leak causing these failures? Our CPU usage also hits 100% during the same period; are the two related?

      Please note this Jenkins instance is running on a dedicated VM.

       

        Attachments

        1. gc.log
          5.04 MB
        2. heap_dump_gc_root.PNG
          heap_dump_gc_root.PNG
          171 kB
        3. heap_histo.txt
          1.31 MB
        4. log_trace.txt
          48 kB
        5. node_dump_export_table.txt
          1.05 MB

          Activity

          rgaduput Reddysekhar Gaduputi added a comment -

          Maybe you are right that it's not really a bug but rather higher usage of Jenkins itself.

          But please note that after the above-mentioned changes it works fine with the same resources (even under much higher load these days compared to earlier).

          I agree that remoting stack traces in nodes are something that needs to be looked into.

           

          raihaan Raihaan Shouhell added a comment - edited

          This doesn't seem like a bug AFAICT; it seems that usage of Jenkins might have grown, which caused its memory usage to grow. The CPU usage is likely from constant GC cycles due to the low memory available.

          The large usage of StackTraceElement seems to be coming from remoting.

          CC Jeff Thompson: perhaps remoting should not capture stack traces by default, as from my understanding they are purely for debugging purposes; we could save quite a bit of memory here. WDYT?

          Sample implementation disabling traces: https://github.com/jenkinsci/remoting/pull/441

          rgaduput Reddysekhar Gaduputi added a comment - edited

          We have managed to fix this issue by tuning our Jenkins/pipelines. In case anyone faces the same, the following might help:

          1) Run pipelines with the durability level Performance-Optimized; this decreases the write operations Jenkins performs to save pipeline state. On the other hand, with this option pipelines won't be able to resume after a master restart (which is fine in our case).

          2) Loosely couple the Jenkins nodes by setting the option to bring them online only when in demand and keep them offline otherwise (because we have observed many exceptions in the node logs, as attached).

          3) Restrict the log size of the pipelines using the logfilesizechecker plugin; log size directly impacts memory. (Some of our pipelines were at times generating gigabytes of logs, which was abnormal, so we configured logfilesizechecker to abort a pipeline once its log exceeds a specific size.)
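
For item 1, the durability level can be set per pipeline in a Declarative Jenkinsfile via the durabilityHint option (the stage below is illustrative):

```groovy
pipeline {
    agent any
    options {
        // Reduce state-persistence writes; pipelines will not
        // survive a controller restart at this durability level.
        durabilityHint('PERFORMANCE_OPTIMIZED')
    }
    stages {
        stage('Build') {
            steps {
                echo 'Building...'
            }
        }
    }
}
```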

          rgaduput Reddysekhar Gaduputi added a comment -

          Hi Oleg Nenashev,

          I have attached the GC log.

          oleg_nenashev Oleg Nenashev added a comment -

          I would need a GC log to say for sure. Right now there is no evidence of a memory leak on the Jenkins controller; it might be legitimate system load.


            People

            Assignee:
            Unassigned
            Reporter:
            rgaduput Reddysekhar Gaduputi
            Votes:
            0
            Watchers:
            3
