Bug
Resolution: Fixed
Major
None
Hi. We upgraded our old Hudson server to Jenkins 1.409.2 on September 20th. Since then our build times have roughly doubled compared with what they were under Hudson.
We're running Jenkins on a 32-bit RHEL 5.2 box with Java 1.6.0_04. The build agent runs on a 64-bit RHEL 5.4 box with Java 1.6.0_23.
I'm attaching the System_Information_Jenkins.html page which contains the configuration of our Jenkins server.
I'm also attaching the config.xml file of a particular job which shows the problem well.
I'm also attaching the etics-6.1.x-es_76_Console.html file with the console output of a particular build. The output lines are prefixed with the time of day (via the timestamper plugin) so that we can see where most of the time was spent. The build started at 10:47:17 and ended at 11:26:36, taking 39 minutes and 19 seconds in total. However, as you can see at the end of the log, Maven reports that the build took only 11 minutes and 3 seconds. Most of the remaining 28 minutes and 16 seconds was spent between the lines prefixed with times 10:47:54 and 11:15:21.
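The arithmetic behind these figures can be rechecked with a few lines of Python. This is only a sketch: the helper name `elapsed` is made up for illustration, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


wall_clock = elapsed("10:47:17", "11:26:36")   # whole build, per the console log
maven_time = timedelta(minutes=11, seconds=3)  # duration reported by Maven
quiet_gap = elapsed("10:47:54", "11:15:21")    # the silent stretch in the log

print(wall_clock)               # 0:39:19
print(wall_clock - maven_time)  # 0:28:16
print(quiet_gap)                # 0:27:27
```

So the silent stretch (27 minutes and 27 seconds) accounts for nearly all of the 28 minutes and 16 seconds that Maven does not report.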
I don't know what the Jenkins server or the agent were doing during this quiet period. I tried to see which processes were busy during it, and the only strange thing I noticed was one very busy process on the agent machine:
/l/disk0/ipscate/tools/java/jdk1.6.0_23/bin/java -jar slave.jar
I straced it for a few seconds and saw that it was busy opening lots of files under the /l/disk0/ipscate/ip/nexus/.m2/repository/br/com/cpqd directory. In fact, in a short period of about 15 or 20 seconds it opened more than 40000 files, some of them thousands of times. These were the most frequently opened ones:
# sort /tmp/trace | uniq -c | sort -n | tail -5
3502 32319 open("/l/disk0/ipscate/ip/nexus/.m2/repository/br/com/cpqd/public/cpqd-public-bom/6.5.2/_maven.repositories", O_RDONLY) = 16
3502 32319 stat("/l/disk0/ipscate/ip/nexus/.m2/repository/br/com/cpqd/public/cpqd-public-bom/6.5.2/cpqd-public-bom-6.5.2.pom", {st_mode=S_IFREG|0644, st_size=45211, ...}) = 0
3793 32319 open("/l/disk0/ipscate/ip/nexus/.m2/repository/br/com/cpqd/parent/cpqd-super-parent/6.5.2/cpqd-super-parent-6.5.2.pom", O_RDONLY) = 16
3793 32319 stat("/l/disk0/ipscate/ip/nexus/.m2/repository/br/com/cpqd/parent/cpqd-super-parent/6.5.2/cpqd-super-parent-6.5.2.pom", {st_mode=S_IFREG|0644, st_size=14792, ...}) = 0
7004 32319 stat("/l/disk0/ipscate/ip/nexus/.m2/repository/br/com/cpqd/public/cpqd-public-bom/6.5.2/_maven.repositories", {st_mode=S_IFREG|0644, st_size=163, ...}) = 0
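A similar tally can be made per system call and path, rather than per identical trace line as the shell pipeline does. The following is a minimal sketch, assuming strace lines shaped like the ones quoted here (an optional PID, the call name, then a quoted path as first argument); the function name `count_calls` and the sample paths are invented for illustration.

```python
import re
from collections import Counter

# Matches lines such as: 32319 open("/some/path", O_RDONLY) = 16
CALL = re.compile(r'^\s*(?:\d+\s+)?(\w+)\("([^"]+)"')


def count_calls(lines):
    """Count how often each (syscall, path) pair appears in strace output."""
    counts = Counter()
    for line in lines:
        match = CALL.match(line)
        if match:  # calls without a quoted path argument are ignored
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Hypothetical sample input, not taken from the real trace.
sample = [
    '32319 open("/repo/a.pom", O_RDONLY) = 16',
    '32319 stat("/repo/a.pom", {st_mode=S_IFREG|0644, st_size=10, ...}) = 0',
    '32319 open("/repo/a.pom", O_RDONLY) = 16',
    '32319 close(16) = 0',
]

for (syscall, path), n in count_calls(sample).most_common(5):
    print(n, syscall, path)
```

Grouping by path makes it easier to see that the same few POM files are being hit repeatedly, even when return values or file descriptors differ between calls.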
I'm not sure this behaviour is directly related to the delay I was seeing in the build, but it doesn't seem right to me.
Do you have any suggestions as to how I can go about investigating this?
- depends on: JENKINS-15935 Can't build using maven 3.1.0 (Resolved)
- is related to: JENKINS-8390 POMs parsing fails in m2 projects which has a wrong inheritence (m3 constraint) (Closed)
[JENKINS-11362] Strange delay during a maven job build
I also noticed that the /l/disk0/ipscate/ip/nexus/.m2 directory had 141GB of data below it!
I renamed it and created another empty .m2. Then I submitted another build, but the problem remains.
Sounds like this could be related to this maven issue https://jira.codehaus.org/browse/MNG-5125 (See http://vivin.net/2011/07/20/fixing-maven-3-0-3s-dependency-resolution-performance-regression/ for some more info)
Maven 3.0.3 is currently used internally by Jenkins to parse the POMs.
Correcting my previous comment: when I renamed the .m2 directory, the next build stalled forever just after the Parsing POMs log line. I waited a few hours and gave up. Now that I have restored the previous .m2 directory, it seems to behave as before.
Another thing I tried was to kill all slave.jar processes that were running on the agent machine, but it seemed to have no effect.
@kutzi: it makes sense to me. I'm not a Java guy, so please forgive me if this is a dumb question, but what would it take to have aether 1.12 used inside Jenkins? Would I have to rebuild Jenkins from scratch after editing some POM file?
How can I find out which version of Maven Jenkins uses? Perhaps if I go back to Jenkins 1.409.1, it uses an earlier version of Maven that doesn't have this problem.
Thank you very much.
That's not a dumb question at all, and I'm not 100% sure how to get it to work.
I think it could work if you replace the aether-*.jar files in the plugins/maven-plugin/WEB-INF/lib directory with the 1.12 versions.
You'll probably also have to 'pin' the maven-plugin so it won't be replaced on startup with the bundled one.
Going back to 1.409.1 (or any other 1.40x version) won't work, as AFAIK they use the same version of maven/aether.
Olivier, wouldn't it make sense to bundle the aether 1.12 jars with the maven embedder?
Sure, that should fix the issue.
But it may introduce some discrepancy between the POM parser and a build with mvn 3.x, unless users change aether in their mvn installation too.
There's always a potential discrepancy between the embedded Maven version used for parsing the POMs and the one used to build; that has always been the case in Jenkins. So we wouldn't make the situation worse, IMHO.
Bad news.
We've made a new installation of Jenkins 1.409.2 and recreated the single job we've been using to test. We've run it again to make sure the problem manifested itself, which it did.
Then we removed from the plugins/maven-plugin/WEB-INF/lib directory the following jars and put there the corresponding ones of version 1.12:
- aether-api-1.11.jar
- aether-connector-wagon-1.11.jar
- aether-impl-1.11.jar
- aether-spi-1.11.jar
- aether-util-1.11.jar
(There is a maven-aether-provider-3.0.3.jar there too, but we've left it as is.)
Then we started Jenkins again and used lsof to verify that it was using the 1.12 jars. Then we ran the build again, but it took the same amount of time to finish. The problem remains...
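As a complement to the lsof check, the jar swap can also be sanity-checked on disk by reading the versions off the file names. This is a rough sketch only: `jar_versions` is a made-up helper, the listing below is hypothetical, and in practice the names would come from something like `os.listdir()` on the plugins/maven-plugin/WEB-INF/lib directory. It says nothing about which jars the running JVM has actually loaded.

```python
import re

# artifact-name-<version>.jar, where the version starts with a digit
JAR = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def jar_versions(filenames, prefix="aether-"):
    """Map artifact name to version for jars whose name starts with prefix."""
    found = {}
    for name in filenames:
        match = JAR.match(name)
        if match and name.startswith(prefix):
            found[match.group("artifact")] = match.group("version")
    return found


# Hypothetical directory listing with one jar left at the old version.
listing = [
    "aether-api-1.12.jar",
    "aether-connector-wagon-1.12.jar",
    "aether-impl-1.11.jar",
    "maven-aether-provider-3.0.3.jar",
]

versions = jar_versions(listing)
stale = sorted(a for a, v in versions.items() if v != "1.12")
print(stale)  # ['aether-impl']
```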
(Next we're going to try to use another machine as build agent, just to change one more thing.)
Would be good if you could take some stack traces from the Jenkins process while it is in this delay period (https://wiki.jenkins-ci.org/display/JENKINS/Build+is+hanging)
Several (~3) traces with a couple of seconds delay between each would be best
> Next we're going to try to use another machine as build agent, just to change one more thing.
Building on another machine took nearly the same amount of time to finish.
I will try to take some stack traces.
The attached JENKINS-11362_stack-traces.zip contains 5 dumps of each page (client and server), taken with 'lynx -dump' 2 seconds apart.
The dumps already point in the direction that dependency parsing might be extremely slow. I thought that this should have been resolved by the aether update.
To verify this hypothesis, it would be helpful if you could take additional traces:
at the beginning, in the middle and at the end of the delay period (all times only roughly, of course).
I know that it may be difficult to time these, but these traces would be really helpful.
I created a small script to take a 'lynx -dump -nolist -width=1024' of these pages every 15 seconds. The resulting files are in the attached fine-stack-traces.zip.
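The original script was not posted. A periodic snapshot loop of this kind might look roughly like the sketch below. The lynx flags are the ones quoted above; everything else (the function names, the injectable `fetch` and `sleep` parameters, and the URL passed in) is an assumption made for illustration.

```python
import subprocess
import time


def lynx_command(url):
    """Build the lynx invocation with the flags quoted in the comment."""
    return ["lynx", "-dump", "-nolist", "-width=1024", url]


def fetch_with_lynx(url):
    """Return the rendered page as text; requires lynx on the PATH."""
    result = subprocess.run(
        lynx_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout


def take_snapshots(url, count, interval=15, fetch=fetch_with_lynx, sleep=time.sleep):
    """Fetch url `count` times, pausing `interval` seconds between fetches."""
    dumps = []
    for i in range(count):
        dumps.append(fetch(url))
        if i < count - 1:  # no pause needed after the last snapshot
            sleep(interval)
    return dumps
```

Called as `take_snapshots(thread_dump_url, 4)`, where `thread_dump_url` is a placeholder for whichever page is being dumped, this would collect four dumps over 45 seconds. Passing `fetch` and `sleep` as parameters keeps the loop testable without lynx or real waiting.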
From the console output, some notable times:
11:33:09 build starts
11:33:18 build hangs
12:01:31 build continues
12:13:01 build ends
TIA!
Have you tried to build the same project in a shell with Maven 3.0.3?
It's obvious that the POM parsing takes a lot of time (it seems to be related to dependencyManagement sections), but I cannot yet see where the time goes, and especially whether it's more of a Jenkins or a Maven problem.
I tried to build it using maven 3.0.3 but I got some errors with the maven-dependency-plugin. I'll need some help from the developers here in order to make it through.
Anyway, I'm attaching the file m3.log.gz, which shows the attempt. Perhaps you can see whether the dependency phase had already been passed.
The test took less than 20 seconds until the error message.
Still on the Maven 3 investigation track: building in Jenkins with the Timestamper plugin, we see different times:
Using Maven 2:
11:33:09 Started by user anonymous
[...]
11:33:14 Found mavenVersion 2.2.1 from ...
[...]
12:01:39 Executing Maven: -B -f
/home/ipscate/hudson-gti/workspace/teste-de-delay-de-build/components/pom.xml
-s /l/disk0/ipscate/ip/nexus/settings.xml -P es,win64 clean deploy
-Darch.classifier=win64 -Dmaven.test.skip=true
Using Maven 3:
21:28:52 Started by user anonymous
[...]
21:28:53 Found mavenVersion 3.0.3 from...
[...]
21:29:06 Executing Maven: -B -f
/home/ipscate/hudson-gti/workspace/teste-de-delay-de-build/components/pom.xml
-s /l/disk0/ipscate/ip/nexus/settings.xml -P es,win64 clean package
-Darch.classifier=win64 -Dmaven.test.skip=true
BTW, since our product is not ready for Maven 3, we can't complete the build cycle (install and deploy goals) and we get some warnings and errors. We will submit these errors to the dev/conf team tomorrow.
@kutzi: did you notice that using Maven 3 to build solves the delay problem? It seems that there is some strange conflict between the Maven 3.0.3 used internally by Jenkins and the Maven 2 invoked externally to build the package.
Unfortunately, it would be difficult for us to upgrade our build infrastructure to Maven 3 right now.
Do you have any other idea? Or any other information that we could provide to try to understand the root cause?
Yes, I've seen your comment. IMO it doesn't really fit, as the part where the delay seems to happen should be completely independent of the Maven version that is actually used to build: Maven 3.0.3 is used for parsing the POMs in any case.
So please check that you didn't change anything else when switching to Maven 3 for building.
Apart from that I'm sorry that I don't have any more ideas currently.
If you could provide a minimal test project which reproduces the issue, however, that would help a lot.
@kutzi: I'm attaching two files that may let you reproduce the problem:
- simplificado-config.xml
is the Jenkins configuration for the job.
- simplificado.zip
has the complete Jenkins workspace for the job, with the local repository included.
I've tried to eliminate every dependency on Subversion and on our internal Nexus instance. I've also pruned the workspace to contain just two submodules, so that the total delay is now about 25 seconds, as you can see in this log after 09:20:52:
09:20:52 Started by user anonymous
09:20:52 Building remotely on scate
09:20:52 ERROR: Ignore Problem expanding maven opts macros org.jenkinsci.plugins.tokenmacro.TokenMacro
09:20:52 Found mavenVersion 2.2.1 from file jar:file:/l/disk0/ipscate/tools/maven/maven-2.2.1/lib/maven-2.2.1-uber.jar!/META-INF/maven/org.apache.maven/maven-core/pom.properties
09:20:52 Parsing POMs
09:20:52 downloaded artifact file:///home/ipscate/hudson-gti/workspace/simplificado/.repository//br/com/cpqd/oss/commons/cpqd-etics-oss-commons-parent/6.1.0-SNAPSHOT/maven-metadata.xml
09:20:52 downloaded artifact file:///home/ipscate/hudson-gti/workspace/simplificado/.repository//br/com/cpqd/eai/cpqd-eai-parent/3.1.0-SNAPSHOT/maven-metadata.xml
09:20:52 downloaded artifact file:///home/ipscate/hudson-gti/workspace/simplificado/.repository//br/com/cpqd/landbase/cpqd-landbase-parent/2.13.0-SNAPSHOT/maven-metadata.xml
09:20:52 downloaded artifact file:///home/ipscate/hudson-gti/workspace/simplificado/.repository//br/com/cpqd/eng/cpqd-eng-parent/6.1.0-SNAPSHOT/maven-metadata.xml
09:20:52 downloaded artifact file:///home/ipscate/hudson-gti/workspace/simplificado/.repository//br/com/cpqd/osp/cpqd-osp-parent/6.1.0-SNAPSHOT/maven-metadata.xml
09:20:52 downloaded artifact file:///home/ipscate/hudson-gti/workspace/simplificado/.repository//br/com/cpqd/isp/cpqd-isp-parent/6.1.0-SNAPSHOT/maven-metadata.xml
09:21:16 ERROR: Ignore Problem expanding maven opts macros org.jenkinsci.plugins.tokenmacro.TokenMacro
09:21:16 [simplificado] $ /home/ipscate/tools/java/jdk1.6.0_23_32/bin/java -Xmx1200m -XX:MaxPermSize=256m -cp /home/ipscate/hudson-gti/maven-agent.jar:/home/ipscate/hudson-gti/classworlds.jar hudson.maven.agent.Main /l/disk0/ipscate/tools/maven/maven-2.2.1 /home/ipscate/hudson-gti/slave.jar /home/ipscate/hudson-gti/maven-interceptor.jar 9739 /home/ipscate/hudson-gti/maven2.1-interceptor.jar
09:21:17 <===[HUDSON REMOTING CAPACITY]===>channel started
<...>
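The silent stretch in a timestamped console log like this one can be located mechanically. Below is a minimal sketch, assuming each relevant line starts with an HH:MM:SS prefix as written by the timestamper plugin and that the log does not cross midnight; `largest_gap` is a name invented for this example and the sample lines are abbreviated from the log above.

```python
from datetime import datetime


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause
    between consecutive lines that start with an HH:MM:SS timestamp."""
    best = (0, None, None)
    prev_time = prev_line = None
    for line in lines:
        try:
            stamp = datetime.strptime(line[:8], "%H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev_time is not None:
            gap = int((stamp - prev_time).total_seconds())
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best


console = [
    "09:20:52 Started by user anonymous",
    "09:20:52 Parsing POMs",
    "09:21:16 ERROR: Ignore Problem expanding maven opts macros org.jenkinsci.plugins.tokenmacro.TokenMacro",
    "09:21:17 <===[HUDSON REMOTING CAPACITY]===>channel started",
]

seconds, before, after = largest_gap(console)
print(seconds)  # 24
print(before)   # 09:20:52 Parsing POMs
```

On the abbreviated sample the longest pause is the 24 seconds that follow the Parsing POMs line, which matches the delay of about 25 seconds described here.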
During this delay I straced the slave process and saw that it opened some files in the local repository thousands of times. It also didn't fetch anything from the network.
Please, tell me if you can reproduce the problem with these files.
Thank you a lot.
@kutzi: I forgot to mention (although I'm sure you would have figured this out by yourself) that you need to edit the .m2/settings.xml file and change the two "file://" URLs so that they point to the location where you expanded the workspace zip.
Maybe this is related to JENKINS-7535 (Rebuilding dependency graph slow on large installations)?
Gustavo: I've tried the sample project now and I cannot build it. Currently, it's complaining about a missing dependency:
Path to dependency:
1) br.com.cpqd.etics:cpqd-etics-parent:pom:6.1.0-SNAPSHOT
2) org.codehaus.plexus:plexus-utils:jar:1.5.15
I've checked that it's indeed not in the local repository you had attached.
There's one strange thing which I've seen already: you've specified to use a private repository in Jenkins AND you provide an alternate settings.xml in which you specify a repository - which on your computer probably evaluates to the same path as Jenkins private repo?
Is there a reason why you've chosen this configuration?
Hi, Kutzi. The build succeeds for us. Andreyev is going to move everything to another Jenkins instance on another machine to make sure we're working in a clean environment. We'll let you know how that goes.
Regarding the specification of a repository in the settings.xml, I simply don't know any better.
I tried to delete the repo specification in it but then the build failed with this error:
14:05:00 [FATAL] Non-resolvable parent POM: Could not find artifact br.com.cpqd.parent:cpqd-super-parent:pom:6.5.2 in central (http://repo1.maven.org/maven2) and 'parent.relativePath' points at wrong local POM @ line 2, column 13
It seems that even with the -o option Maven is trying to fetch this artifact from outside...
Now that I look more closely into it, the error occurs before the mvn invocation. So it seems that it is the Maven embedded in the Jenkins maven plugin that is fetching these artifacts, not the Maven used in the build.
How can I specify to the embedded maven to use the same local repository, so that I can make maven populate the repository?
> Now that I look more closely into it, the error occurs before the mvn invocation.
Yes, that's what I thought all along. Sorry if I didn't make it clear.
But I've now also seen that I was wrong in saying that the same code is used there regardless of whether it's a Maven 2 or a Maven 3 build. I'll post my initial suspicions shortly.
> How can I specify to the embedded maven to use the same local repository, so that I can make maven populate the repository?
That's what the private repository feature is made for
I've indeed seen longer POM parsing phases with Maven 2.2.1 than with Maven 3 (the build only fails later).
I've the suspicion that this may have been introduced with JENKINS-8390 and specifically https://github.com/jenkinsci/jenkins/commit/b84efb908f5c5d80959d16300d22fac312109ac3
There, a 'correct' but potentially much slower POM parsing was introduced for Maven 2. In older Hudson versions this was probably no problem, since the Maven embedder used was itself based on Maven 2.x.
I'd still like to point out that at my company we're also still using Maven 2.2.1 jobs in Jenkins 1.409.2 and we haven't seen any (at least no dramatic) slowdown compared to older versions. So there must still be something special about your project which worsens the situation.
I've chatted with olamy about your project setup and we've come to the conclusion that your settings.xml might be the cause of the whole problem. The repositories and pluginRepositories configs in the settings.xml are meant to point to an external repository (such as an in-house enterprise repository server) and definitely not to a local Maven repository. It seems that Maven has serious problems coping with your setup.
I'd advise you to get rid of the setting or (preferably) use an in-house enterprise repository server.
In our production environment the settings.xml configuration points to our internal Nexus instance. I changed it in an attempt to make sure nothing would be fetched from remote repositories. But the delay is the same either way.
Regarding the issue with the missing artifacts, I don't understand it. The job is configured with a private repository, but after a successful build I don't find those missing artifacts under $WORKSPACE/.repository. It seems that either the embedded Maven is placing them in another local repository or it's not saving them anywhere. Perhaps the only effect of the "private repository" option is to pass the -Dmaven.repo.local=/path/to/repo option to the Maven that will perform the build, having no effect on the embedded Maven invocation...
I've checked it again and it looks like the private repo setting is propagated everywhere.
Also: if it's passed successfully to the Maven that performs the build, the artifacts should end up in that repository, because that's the job which does the 'install', right?
Code changed in jenkins
User: Christoph Kutzinski
Path:
maven-plugin/pom.xml
http://jenkins-ci.org/commit/jenkins/f8774f0599850a1c920c4da9e058009b77219703
Log:
Using lib-jenkins-maven-embedder version with workaround for https://jira.codehaus.org/browse/MNG-5125.
I still hope this gives some ease regarding JENKINS-11362
Please use the artifact from this build http://ci.jenkins-ci.org/job/jenkins_main_trunk/1302/ (or later)
olamy fixed my fix and included aether 1.13 which includes some additional performance fixes.
Nothing changed.
I'm trying to make a self-contained job so you can try it there.
Last week we upgraded our instance to Jenkins 1.458 and confirmed that the problem persists.
So far we are still managing this by segregating our big maven jobs to an old Hudson instance in which the problem doesn't happen.
Found http://jira.codehaus.org/browse/MNG-5312 as the root cause. The hudson.maven.PomInfo constructor calls MavenProject.getParent and this can be exceedingly slow. Options for Jenkins include
- Wait for a Maven release (3.0.5?) with the performance problem fixed.
- Bundle a patched version of maven-core with the plugin.
- Stop calling getParent from PomInfo, at least at the user's option. But this might cause Jenkins to be missing dependency information needed for build triggers: an update to a parent POM (other than the relativePath default of ../pom.xml) would not be "noticed" by modules using it.
Provisionally going with option #1, hence closing this issue since we would anyway bundle any new Maven release with the plugin when it becomes available. But I could work on #2 or #3 instead if anyone thinks it would be a good idea. Unclear if many users are affected by this issue; the impact is on projects using lots of <scope>import</scope>.
Note: 3.1.0-alpha-1 with this fix has been proposed. Looks like it will take close to a year to get this simple fix into an official release, sigh.
Playing with a Maven 3.1.0 upgrade but it is not trivial, due to Aether and SLF4J changes. Created maven-3.1.0 branches in lib-jenkins-maven-embedder and jenkins. @olamy do you want to pick up the rest, specifically making Maven3Builder and EmbedderLoggerImpl compilable and testing the result?
Code changed in jenkins
User: Christoph Kutzinski
Path:
pom.xml
http://jenkins-ci.org/commit/maven-plugin/ebf86b4567dfec37c9e21a4491ff7ca27a77d0c9
Log:
Using lib-jenkins-maven-embedder version with workaround for https://jira.codehaus.org/browse/MNG-5125.
I still hope this gives some ease regarding JENKINS-11362
Originally-Committed-As: f8774f0599850a1c920c4da9e058009b77219703
Two weeks ago we upgraded the Jenkins instance that was causing us this problem to 1.530, and it seems to be working OK. The Maven 3.1.0 release finally incorporated the commits that solved the problem (e778ea6~3..e778ea6), and once that release was incorporated in Jenkins 1.526 the problem was finally solved.
I have the same issue when running a Maven build on a slave node
Jenkins ver. 1.580
maven 3.2.1
master node : AIX 7.1
Slave node windows 7 64bit
please advise
I'd say that unless it is a regression it must be a different problem, because the bug was fixed in Jenkins 1.526.
Are you using maven style or free style jobs?
A while ago we decided to ditch maven style and go for free style jobs due to other considerations. However, since only maven style jobs use the Maven bundled with Jenkins to preprocess POMs, it may be the source of your problems too. If you can, try a free style job instead and see if it solves the delay problem.
amgad: whatever issue you are seeing may or may not be related. It would need to be separately diagnosed, or filed with a reproducible test case. In any case, you need to specify the Maven plugin version at a bare minimum.
I noticed that I had the maven-plugin pinned at version 1.401. So I unpinned it and restarted the server, and it is now at version 1.409.2. Unfortunately, the problem remains.