  JENKINS-21422 (project: Jenkins)

Jenkins crashing due to out of memory when rebuilding jobs

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Components: build-pipeline-plugin, core
    • Environment: linux

      After an upgrade to version 1.546, Jenkins became very unstable, crashing 2-4 times per day. As far as I can tell, the cause is missing junitResult.xml files when Jenkins tries to load the test trend.

      Attached is a snippet from the log where you can see the FileNotFoundException for the junitResult.xml file, followed by a StackOverflowError and then an OutOfMemoryError.

      Also attached is a screenshot from a heap analysis showing that the request handler thread building the test trend is holding a considerable amount of the heap. Please note that this particular heap analysis is from a different crash than the log snippet. I did not analyze the heap for the crash shown in the log, but I am sure it would show the test trend for the same job that threw the StackOverflowError.


          John Carlile added a comment (edited)

          Attached is a different screenshot from the heap dump analysis that provides a little more context. The heap dump is 7 GB, which makes it difficult to share.


          John Carlile added a comment (edited)

          I have been trying to get to the bottom of this, but without much luck yet. The one thing consistent across all occurrences of this crash is a series of FileNotFoundExceptions logged for one of the junitResult.xml files in a build (we only keep about a week's worth of builds). Immediately following the FileNotFoundException is the StackOverflowError, which ultimately leads to an OutOfMemoryError. To make matters more confusing, the junitResult.xml named in the FileNotFoundException does exist on disk, and once Jenkins is restarted the job seems fine.

          I spent a little time spelunking in the Jenkins code and also wrote a small JUnit test to try to reproduce the problem. No luck there yet. Any ideas on how to narrow down the root cause would be helpful.
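          One way to narrow this down is to check, at the moment of a crash, which build directories really lack a junitResult.xml. The following is a minimal, hypothetical sketch (the class name MissingJunitResults is made up for illustration). It assumes the usual layout in which each build lives in its own subdirectory of $JENKINS_HOME/jobs/<job>/builds/; depending on the Jenkins version those directories may be named by build number or by timestamp with numeric symlinks, so the same build can appear twice. It only reads the filesystem. If every build turns out to have the file, as observed after the restart, that at least rules out files that are genuinely missing from disk.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: lists build directories that have no junitResult.xml.
public class MissingJunitResults {

    // Returns the subdirectories of buildsDir that contain no junitResult.xml, sorted by path.
    static List<Path> findMissing(Path buildsDir) throws IOException {
        List<Path> missing = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(buildsDir)) {
            for (Path build : entries) {
                // Skip plain files (and broken symlinks); only build directories are checked.
                if (Files.isDirectory(build) && !Files.exists(build.resolve("junitResult.xml"))) {
                    missing.add(build);
                }
            }
        }
        missing.sort(null); // natural Path ordering
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java MissingJunitResults $JENKINS_HOME/jobs/<job>/builds
        Path buildsDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findMissing(buildsDir)) {
            System.out.println("no junitResult.xml: " + p);
        }
    }
}
```

          Builds that never ran the JUnit publisher (for example, aborted builds) will also be listed, so the output is a starting point rather than a verdict.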


          John Carlile added a comment (edited)

          I just updated the description of this issue. After some more investigation, it seems the StackOverflowError may be a by-product of Jenkins running out of heap.

          I was able to reproduce this. It looks like it is related to rebuilding a job from the build-pipeline view. Steps to reproduce:

          1) Run a job.
          2) From the build-pipeline-plugin view, hit the Rebuild button.

          Result: OOM

          Note that this did not happen on 1.536, so there must be some incompatibility with the newer version.

          We have a 1.3.x version of the Build Pipeline plugin.


          Ladislav Toldy added a comment (edited)

          Same bug on 1.532.2
          Maven Project Plugin 2.0.3
          Build Pipeline Plugin 1.3.3

          Happens while viewing Maven jobs (rendering the test trend graph) via the build-pipeline view.

          Result: OOM; a restart is needed.


          Daniel Beck added a comment (edited)

          Quoting the description: "After an upgrade to version 1.546 Jenkins became very unstable"

          Which is the last (non-LTS) version this did not occur with? 1.545? Does this still occur in recent Jenkins versions?


          Daniel Beck added a comment -

          May have been caused by fcdf74991226fb6869052caa89ed8d678944b6fc if introduced in 1.545.


          Ladislav Toldy added a comment -

          Yes, it also occurs in 1.580.3.

          Clint Parham added a comment -

          I'm also running into "OutOfMemoryError: Java heap space" errors with the Build Pipeline view. Jenkins ran our jobs fine using ~130 MB of heap, but since adding the Build Pipeline plugin we see heap usage spike to over 1.5 GB when opening a single job page that belongs to a pipeline. As soon as we disabled the Build Pipeline plugin, we could open the same job page and saw no increase in heap usage.

          Running Jenkins 1.602 and Build Pipeline 1.4.7.

          Partial stacktrace:
          Jun 2, 2015 1:39:25 PM org.eclipse.jetty.util.log.JavaUtilLog warn
          WARNING: Error while serving http://192.168.2.85:8081/job/Pipeline_MAT_Build/test/trendMap
          java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:298)
          at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:161)
          ...
          at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
          at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:662)
          Caused by: java.lang.OutOfMemoryError: Java heap space


          Daniel Beck added a comment -

          My guess would be the excessive serialization in Build Pipeline's build cause. The affected job's build.xml files would be interesting.


          Henrik Kirk added a comment -

          I have the same problem. Attached is the build.xml file; I hope it helps in some way.

          <project>
            <actions/>
            <description/>
            <keepDependencies>false</keepDependencies>
            <properties>
              <com.chikli.hudson.plugin.naginator.NaginatorOptOutProperty plugin="naginator@1.16">
                <optOut>false</optOut>
              </com.chikli.hudson.plugin.naginator.NaginatorOptOutProperty>
              <com.gmail.ikeike443.PlayAutoTestJobProperty plugin="play-autotest-plugin@0.0.12"/>
              <jenkins.plugins.slack.SlackNotifier_-SlackJobProperty plugin="slack@1.8">
                <teamDomain>busywait</teamDomain>
                <token>p9kF0Mugbid5lGNqossRRoIm</token>
                <room/>
                <startNotification>false</startNotification>
                <notifySuccess>false</notifySuccess>
                <notifyAborted>false</notifyAborted>
                <notifyNotBuilt>false</notifyNotBuilt>
                <notifyUnstable>false</notifyUnstable>
                <notifyFailure>true</notifyFailure>
                <notifyBackToNormal>true</notifyBackToNormal>
                <notifyRepeatedFailure>false</notifyRepeatedFailure>
                <includeTestSummary>true</includeTestSummary>
                <showCommitList>false</showCommitList>
                <includeCustomMessage>false</includeCustomMessage>
                <customMessage/>
              </jenkins.plugins.slack.SlackNotifier_-SlackJobProperty>
            </properties>
            <scm class="hudson.plugins.git.GitSCM" plugin="git@2.4.0">
              <configVersion>2</configVersion>
              <userRemoteConfigs>
                <hudson.plugins.git.UserRemoteConfig>
                  <url>git@10.0.0.1:henrik/project.git</url>
                  <credentialsId>1013fa52-89a7-42d3-8007-ff92c65fb56a</credentialsId>
                </hudson.plugins.git.UserRemoteConfig>
              </userRemoteConfigs>
              <branches>
                <hudson.plugins.git.BranchSpec>
                  <name>*/master</name>
                </hudson.plugins.git.BranchSpec>
              </branches>
              <doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
              <submoduleCfg class="list"/>
              <extensions/>
            </scm>
            <canRoam>true</canRoam>
            <disabled>false</disabled>
            <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
            <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
            <triggers>
              <hudson.triggers.SCMTrigger>
                <spec>H/5 * * * *</spec>
                <ignorePostCommitHooks>false</ignorePostCommitHooks>
              </hudson.triggers.SCMTrigger>
            </triggers>
            <concurrentBuild>false</concurrentBuild>
            <builders>
              <hudson.tasks.Shell>
                <command>
                  cd web; /opt/activator-1.3.6-minimal/activator test;
                </command>
              </hudson.tasks.Shell>
            </builders>
            <publishers>
              <hudson.tasks.junit.JUnitResultArchiver plugin="junit@1.9">
                <testResults>web/target/test-reports/*.xml</testResults>
                <keepLongStdio>false</keepLongStdio>
                <healthScaleFactor>1.0</healthScaleFactor>
              </hudson.tasks.junit.JUnitResultArchiver>
              <com.chikli.hudson.plugin.naginator.NaginatorPublisher plugin="naginator@1.16">
                <regexpForRerun/>
                <rerunIfUnstable>true</rerunIfUnstable>
                <rerunMatrixPart>false</rerunMatrixPart>
                <checkRegexp>false</checkRegexp>
                <regexpForMatrixParent>false</regexpForMatrixParent>
                <delay class="com.chikli.hudson.plugin.naginator.FixedDelay">
                  <delay>30</delay>
                </delay>
                <maxSchedule>1</maxSchedule>
              </com.chikli.hudson.plugin.naginator.NaginatorPublisher>
            </publishers>
            <buildWrappers>
              <hudson.plugins.build__timeout.BuildTimeoutWrapper plugin="build-timeout@1.15">
                <strategy class="hudson.plugins.build_timeout.impl.AbsoluteTimeOutStrategy">
                  <timeoutMinutes>10</timeoutMinutes>
                </strategy>
                <operationList>
                  <hudson.plugins.build__timeout.operations.FailOperation/>
                </operationList>
              </hudson.plugins.build__timeout.BuildTimeoutWrapper>
            </buildWrappers>
          </project>

          I'm 99% sure it only happens when rebuilding after a failed build.


          Henrik Kirk added a comment -

          This also deletes the "Test Trends"


          Matthew Weiss added a comment -

          Hi, is there any update on this ticket? I am running into what I believe is the same issue using Jenkins 1.642.2 and the Multijob plugin 1.20. After repeated rebuilds, the Jenkins instance eventually crashes and stays down until I restart it; after the restart it works again, but it eventually breaks once more if I run the Multijob several times.


          Daniel Beck added a comment -

          henrikkirk That's a config.xml, not a build.xml.


          Marc Popp added a comment -

          As a temporary workaround, we increased the JVM's maximum heap to 2 GB and have not run into the problem very often since.
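          Before relying on this workaround, it is worth confirming that the new limit actually reached the JVM. The sketch below assumes the limit is set with the standard -Xmx2g option (how that option is passed to the Jenkins service depends on the installation and is not stated in the comment); the class name HeapCheck is made up for illustration. It prints what java.lang.Runtime reports for the heap.

```java
// Hypothetical helper: prints the heap limits the running JVM actually sees.
public class HeapCheck {

    private static final long MB = 1024L * 1024L;

    // Converts a byte count to whole mebibytes (truncating).
    static long toMb(long bytes) {
        return bytes / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the maximum heap the JVM will attempt to use (governed by -Xmx);
        // Long.MAX_VALUE is returned if there is no inherent limit.
        System.out.println("max heap:   " + toMb(rt.maxMemory()) + " MB");
        // totalMemory(): heap currently reserved; freeMemory(): the unused part of it.
        System.out.println("total heap: " + toMb(rt.totalMemory()) + " MB");
        System.out.println("free heap:  " + toMb(rt.freeMemory()) + " MB");
    }
}
```

          Run as `java -Xmx2g HeapCheck`, the "max heap" line should land close to, and often somewhat below, 2048 MB, since some collectors exclude part of the young generation from the figure. A larger heap only delays an unbounded leak; adding the HotSpot option -XX:+HeapDumpOnOutOfMemoryError would at least capture a dump like the one analyzed above when the OOM does occur.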


          Dan Alvizu added a comment -

          I do not have an update on this. Are you sure this is the correct ticket, Matthew? This one is specific to the Build Pipeline plugin.


          Matthew Weiss added a comment -

          dalvizu Sorry, to be honest I'm not 100% sure, but it seems eerily similar. I'm going to watch my master's memory while the job runs tonight to see whether it's a similar memory problem.


          Dan Alvizu added a comment -

          The issue in this ticket happens if you keep a Build Pipeline view open for a long time while its status is refreshed: previous snippets are serialized to the session and cannot be freed, eventually causing an OOM. There isn't an easy fix. Either finding a way to free them (which I believe is deep in Stapler) or switching to a different UI technology would be neither quick nor easy.
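          The growth pattern described here can be illustrated in isolation. The following is a hypothetical model, not Stapler's or the Build Pipeline plugin's actual code: each status refresh produces a rendered snippet that is kept in a per-session collection. With nothing ever evicted, retained memory grows linearly with how long the view stays open. For contrast, it shows a bounded alternative that keeps only the most recent snippets. All class names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of per-session retention; not the real Stapler/plugin code.
public class SessionRetentionSketch {

    // Every refresh appends a snippet that stays reachable from the session.
    static final class UnboundedSession {
        final List<String> snippets = new ArrayList<>();

        void onRefresh(String snippet) {
            snippets.add(snippet); // never evicted: grows as long as the view stays open
        }

        int retained() { return snippets.size(); }
    }

    // Bounded alternative: keeps only the most recent `capacity` snippets.
    static final class BoundedSession {
        private final int capacity;
        final Deque<String> snippets = new ArrayDeque<>();

        BoundedSession(int capacity) { this.capacity = capacity; }

        void onRefresh(String snippet) {
            snippets.addLast(snippet);
            while (snippets.size() > capacity) {
                snippets.removeFirst(); // oldest snippet becomes eligible for GC
            }
        }

        int retained() { return snippets.size(); }
    }

    public static void main(String[] args) {
        UnboundedSession open = new UnboundedSession();
        BoundedSession bounded = new BoundedSession(10);
        // Simulate a pipeline view left open and refreshed thousands of times.
        for (int i = 0; i < 5_000; i++) {
            String snippet = "<div>build status refresh #" + i + "</div>";
            open.onRefresh(snippet);
            bounded.onRefresh(snippet);
        }
        System.out.println("unbounded retains " + open.retained() + " snippets");
        System.out.println("bounded retains   " + bounded.retained() + " snippets");
    }
}
```

          Running main prints 5000 retained snippets for the unbounded session and 10 for the bounded one. The bounded variant only illustrates the shape of a fix; as noted above, the real retention happens inside the session handling, where bounding it is not straightforward.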


            People: Dan Alvizu (dalvizu), John Carlile (jocarli)
            Votes: 2
            Watchers: 9