Bug
Resolution: Unresolved
Critical
None
linux
After an upgrade to version 1.546, Jenkins became very unstable, crashing 2-4 times per day. From what I can tell, this is caused by missing junitResult.xml files when Jenkins tries to load the test trend.
Attached is a snippet from the log where you can see the FileNotFoundException for the junitResult.xml file, followed by a StackOverflowError and then an OutOfMemoryError.
Also attached is a screenshot from a heap analysis showing that the request handler thread building the test trend is holding a considerable amount of the heap. Please note that this heap analysis is from a different crash than the log snippet. I did not analyze the heap for the crash shown in the log, but I am sure it would show the test trend for the same job that threw the StackOverflowError.
[JENKINS-21422] Jenkins crashing due to out of memory when rebuilding jobs
I have been trying to get to the bottom of this, but have not had much luck yet. The one thing that is consistent across all occurrences of this crash is a series of FileNotFoundExceptions logged for one of the junitResult.xml files in a build (we only keep about a week's worth of builds). Immediately following the FileNotFoundException is the StackOverflowError, which ultimately leads to OOM. To make matters more confusing, the junitResult.xml named in the FileNotFoundException does exist on disk, and once Jenkins is restarted the job seems fine.
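One way to cross-check the log against the disk is to list which build directories of a job actually lack a junitResult.xml. The following is a minimal sketch, assuming the usual layout where each build is a directory under the job's builds folder; the function name is made up for illustration and is not part of Jenkins.

```python
from pathlib import Path


def builds_missing_junit_result(builds_dir):
    """Return names of build directories that have no junitResult.xml file."""
    missing = []
    for build in sorted(Path(builds_dir).iterdir()):
        # Skip plain files and symlinks (for example "last successful" links).
        if build.is_symlink() or not build.is_dir():
            continue
        if not (build / "junitResult.xml").is_file():
            missing.append(build.name)
    return missing
```

A build that failed before the tests ran will legitimately have no junitResult.xml, so the result is only a starting point for comparing against the build named in the FileNotFoundException.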
I spent a little time spelunking in the Jenkins code. I also wrote myself a little JUnit test to see if I could reproduce it. No luck there yet. Any ideas on how to narrow down the root cause would be helpful.
I just updated the description of this. After some more investigation, it seems like the StackOverflowError may be a byproduct of Jenkins running out of heap.
I was able to reproduce this. It looks like it is related to rebuilding a job from the build-pipeline view. Here are the steps to reproduce:
1) run a job
2) from build-pipeline-plugin view hit rebuild button
Result: OOM
Note this did not happen on 1.536. Must be some incompatibility in the newer version.
We have a 1.3.x variety of the build pipeline plugin
Same bug on 1.532.2
Maven Project Plugin 2.0.3
Build Pipeline Plugin 1.3.3
Happens while viewing Maven jobs (rendering the test trend graph) via the build-pipeline view.
Result: OOM, and a restart is needed
"After an upgrade to version 1.546 Jenkins became very unstable"
Which is the last (non-LTS) version this did not occur with? 1.545? Does this still occur in recent Jenkins versions?
May have been caused by fcdf74991226fb6869052caa89ed8d678944b6fc if introduced in 1.545.
I'm also running into "OutOfMemoryError: Java heap space" errors using the Build Pipeline View. Jenkins was fine running our jobs using ~130MB of heap. But since adding the Build Pipeline plugin we see heap memory spike to over 1.5GB when opening a single job page belonging to a pipeline. As soon as we disabled the Pipeline plugin, we could open the same job page and saw no increase in heap usage.
Running Jenkins 1.602 and Build Pipeline 1.4.7
Partial stacktrace:
Jun 2, 2015 1:39:25 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING: Error while serving http://192.168.2.85:8081/job/Pipeline_MAT_Build/test/trendMap
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.kohsuke.stapler.Function$InstanceFunction.invoke(Function.java:298)
at org.kohsuke.stapler.Function.bindAndInvoke(Function.java:161)
...
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: Java heap space
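To see which pages were being served when the heap ran out, the log can be scanned for "Error while serving" warnings whose trace later mentions an OutOfMemoryError. This is a rough sketch based only on the log format shown above; the function name is hypothetical, and the pairing is a heuristic that attributes an OutOfMemoryError to the most recent serving warning.

```python
import re

# Matches the warning line format seen in the trace above.
SERVING = re.compile(r"Error while serving (\S+)")


def urls_ending_in_oom(log_lines):
    """Return request URLs whose logged stack trace contains an OutOfMemoryError."""
    hits = []
    current_url = None
    for line in log_lines:
        match = SERVING.search(line)
        if match:
            current_url = match.group(1)
        elif current_url and "java.lang.OutOfMemoryError" in line:
            hits.append(current_url)
            current_url = None  # count each failing request once
    return hits
```

Run against the full Jenkins log, this should show whether the failures cluster on test trend URLs such as the trendMap request above.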
My guess would be the excessive serialization in Build Pipeline's build cause. The affected job's build.xml files would be interesting.
Have the same problem. Attached is the build.xml file. Hope it helps in some way.
<project>
  <actions/>
  <description/>
  <keepDependencies>false</keepDependencies>
  <properties>
    <com.chikli.hudson.plugin.naginator.NaginatorOptOutProperty plugin="naginator@1.16">
      <optOut>false</optOut>
    </com.chikli.hudson.plugin.naginator.NaginatorOptOutProperty>
    <com.gmail.ikeike443.PlayAutoTestJobProperty plugin="play-autotest-plugin@0.0.12"/>
    <jenkins.plugins.slack.SlackNotifier_-SlackJobProperty plugin="slack@1.8">
      <teamDomain>busywait</teamDomain>
      <token>p9kF0Mugbid5lGNqossRRoIm</token>
      <room/>
      <startNotification>false</startNotification>
      <notifySuccess>false</notifySuccess>
      <notifyAborted>false</notifyAborted>
      <notifyNotBuilt>false</notifyNotBuilt>
      <notifyUnstable>false</notifyUnstable>
      <notifyFailure>true</notifyFailure>
      <notifyBackToNormal>true</notifyBackToNormal>
      <notifyRepeatedFailure>false</notifyRepeatedFailure>
      <includeTestSummary>true</includeTestSummary>
      <showCommitList>false</showCommitList>
      <includeCustomMessage>false</includeCustomMessage>
      <customMessage/>
    </jenkins.plugins.slack.SlackNotifier_-SlackJobProperty>
  </properties>
  <scm class="hudson.plugins.git.GitSCM" plugin="git@2.4.0">
    <configVersion>2</configVersion>
    <userRemoteConfigs>
      <hudson.plugins.git.UserRemoteConfig>
        <url>git@10.0.0.1:henrik/project.git</url>
        <credentialsId>1013fa52-89a7-42d3-8007-ff92c65fb56a</credentialsId>
      </hudson.plugins.git.UserRemoteConfig>
    </userRemoteConfigs>
    <branches>
      <hudson.plugins.git.BranchSpec>
        <name>*/master</name>
      </hudson.plugins.git.BranchSpec>
    </branches>
    <doGenerateSubmoduleConfigurations>false</doGenerateSubmoduleConfigurations>
    <submoduleCfg class="list"/>
    <extensions/>
  </scm>
  <canRoam>true</canRoam>
  <disabled>false</disabled>
  <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
  <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
  <triggers>
    <hudson.triggers.SCMTrigger>
      <spec>H/5 * * * *</spec>
      <ignorePostCommitHooks>false</ignorePostCommitHooks>
    </hudson.triggers.SCMTrigger>
  </triggers>
  <concurrentBuild>false</concurrentBuild>
  <builders>
    <hudson.tasks.Shell>
      <command>
        cd web; /opt/activator-1.3.6-minimal/activator test;
      </command>
    </hudson.tasks.Shell>
  </builders>
  <publishers>
    <hudson.tasks.junit.JUnitResultArchiver plugin="junit@1.9">
      <testResults>web/target/test-reports/*.xml</testResults>
      <keepLongStdio>false</keepLongStdio>
      <healthScaleFactor>1.0</healthScaleFactor>
    </hudson.tasks.junit.JUnitResultArchiver>
    <com.chikli.hudson.plugin.naginator.NaginatorPublisher plugin="naginator@1.16">
      <regexpForRerun/>
      <rerunIfUnstable>true</rerunIfUnstable>
      <rerunMatrixPart>false</rerunMatrixPart>
      <checkRegexp>false</checkRegexp>
      <regexpForMatrixParent>false</regexpForMatrixParent>
      <delay class="com.chikli.hudson.plugin.naginator.FixedDelay">
        <delay>30</delay>
      </delay>
      <maxSchedule>1</maxSchedule>
    </com.chikli.hudson.plugin.naginator.NaginatorPublisher>
  </publishers>
  <buildWrappers>
    <hudson.plugins.build__timeout.BuildTimeoutWrapper plugin="build-timeout@1.15">
      <strategy class="hudson.plugins.build_timeout.impl.AbsoluteTimeOutStrategy">
        <timeoutMinutes>10</timeoutMinutes>
      </strategy>
      <operationList>
        <hudson.plugins.build__timeout.operations.FailOperation/>
      </operationList>
    </hudson.plugins.build__timeout.BuildTimeoutWrapper>
  </buildWrappers>
</project>
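Since several comments compare plugin versions, it may help to pull them out of an attached file automatically. The sketch below assumes only what the XML above shows, namely that elements carry a plugin="name@version" attribute; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def plugin_versions(config_xml):
    """Map plugin name to version from plugin="name@version" attributes."""
    versions = {}
    for element in ET.fromstring(config_xml).iter():
        plugin = element.get("plugin")
        if plugin and "@" in plugin:
            name, version = plugin.split("@", 1)
            versions[name] = version
    return versions
```

Applied to the file above, this would report the naginator, play-autotest-plugin, slack, git, junit, and build-timeout entries. It only sees plugins that the job configuration references, so it says nothing about the installed Build Pipeline plugin version.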
I'm 99% sure it only happens when rebuilding after a failed build.
Hi, is there any update on this ticket? I am running into the same issue using Jenkins version 1.642.2 and the Multijob plugin version 1.20, and I believe I am seeing the same behavior. After rebuilding, the Jenkins instance eventually crashes and stays down until I restart it, and it will eventually break again if I run the Multijob multiple times.
As a temporary workaround, we increased the JVM's memory to 2G and no longer run into the problem very often.
I do not have an update on this. Are you sure this is the correct ticket, matthew? This one is specific to the Build Pipeline plugin.
dalvizu, sorry, to be honest I'm not 100% sure, but it seems eerily similar. I'm going to watch my master's memory as the job runs tonight and see if it's a similar memory problem.
The issue in this ticket happens if you have a build pipeline window open for a long period while the status is refreshed: previous snippets are serialized to the session and can't be freed, eventually causing OOM. There isn't an easy fix. The options are either finding a way to free them (which is deep in Stapler, I believe) or using a different UI technology, and neither is quick or easy.
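To illustrate the growth pattern described here, the following is a purely hypothetical sketch and not how Stapler or the Build Pipeline plugin is implemented. It shows a per-session store that evicts its oldest entries, which is the general idea behind "finding a way to free them": without the eviction loop, every refresh would add an entry that is never released.

```python
from collections import OrderedDict


class BoundedSnippetStore:
    """Hypothetical per-session store that keeps only the newest snippets."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._snippets = OrderedDict()

    def put(self, key, snippet):
        # Re-inserting an existing key moves it to the newest position.
        self._snippets.pop(key, None)
        self._snippets[key] = snippet
        # Evict the oldest entries so the store cannot grow without bound.
        while len(self._snippets) > self.max_entries:
            self._snippets.popitem(last=False)

    def __len__(self):
        return len(self._snippets)

    def __contains__(self, key):
        return key in self._snippets
```

With a cap of 3, a thousand refreshes leave only the three most recent snippets in the store. Whether anything comparable can be hooked into the real session handling is exactly the open question raised in the comment.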
Attached is a different screen shot from the heap dump analysis that provides a little more context. The heap dump is 7GB making it difficult to share.