java.lang.OutOfMemoryError exception when getting the remote log

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: cvs-plugin
    • Environment: Jenkins 1.464 on Windows 7 64-bit, 12GB RAM; CVS Plug-in 2.4-SNAPSHOT

      We are getting a java.lang.OutOfMemoryError exception when getting the remote log.

      Started by an SCM change
      Building in workspace C:\x\workspace\project_name_here
      cvs checkout -r BRANCH_NAME_HERE -D 16 May 2012 23:27:38 -0700 -d path\projects path/projects
      cvs checkout: Updating path\projects
      cvs checkout: Updating path\projects/dir1
      cvs checkout: Updating path\projects/dir2/dirA
      ... (about 7,224 directories) ...
      cvs checkout: Updating path\projects/dirN
      cvs rlog: Logging path\projects
      ... (about 7,224 directories) ...
      cvs rlog: Logging path\projects/dirN
      FATAL: Java heap space
      java.lang.OutOfMemoryError: Java heap space
      at java.lang.StringCoding$StringDecoder.decode(Unknown Source)
      at java.lang.StringCoding.decode(Unknown Source)
      at java.lang.StringCoding.decode(Unknown Source)
      at java.lang.String.<init>(Unknown Source)
      at java.io.ByteArrayOutputStream.toString(Unknown Source)
      at hudson.scm.CVSSCM.getRemoteLogForModule(CVSSCM.java:540)
      at hudson.scm.CVSSCM.calculateChangeLog(CVSSCM.java:415)
      at hudson.scm.CVSSCM.checkout(CVSSCM.java:825)
      at hudson.model.AbstractProject.checkout(AbstractProject.java:1218)
      at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:586)
      at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:475)
      at hudson.model.Run.run(Run.java:1434)
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:239)

      Some notes:

      • The path/ and path\ come from the "Remote Name"/"Local Name" settings solution from issue JENKINS-13264
      • We are using Jenkins 1.464 with the 2.4-SNAPSHOT version of the CVS plugin, but the problem may not be specific to this version
      • The arguments setting in the jenkins.xml file currently has -Xms1024m (anything larger and the Jenkins service won't start)
      • The CVS module has many legacy projects spread out, and thus, contains more than 7,000 directories and many more files
      • Things had worked with the 1.6 plug-in & CVSNT client, but it also occasionally timed out getting the changelog, hence our testing of the 2.x plug-ins

          [JENKINS-13814] java.lang.OutOfMemoryError exception when getting the remote log

          Michael Clarke added a comment -

          Gene: it looks like your build is failing because you're either using a non-Sun/Oracle JDK or too new a version of the JDK for the dependencies of the plugins for the dependent POM. You should be able to get around this by using a Sun/Oracle JDK (if you're not already using one) or by locally modifying your POM to use a newer version of Jenkins (don't check in this change though).

          If you look back through your job log you should find a line telling you the last RLOG command issued. Could you run it from your command line and attach the output here (obviously obscure any sensitive values in the output)?

          Gene Liu added a comment - - edited

          Michael: Thanks! You are right, I used jdk1.7 to build the plugin. The build succeeds with jdk1.6.

          I went back to 1.6 for my other testing. So my question is:
          Does the current 2.5-snapshot include your fix?

          I do not know how to run the RLOG command from a command line; could you please advise? I can do that with 2.4 once my current test on version 1.6 is finished.

          Thanks!

          Michael Clarke added a comment -

          Gene: if you look back in the job log you should find a statement of the form 'cvs rlog -S -d 00/00/00 00:00:00+0000<11/11/11 11:11:11+1111 moduleName', where the numbers are a correctly formatted timestamp and 'moduleName' is the name of the module currently being operated on. If you paste this command into a terminal, it should invoke your machine's CVS client to run rlog (you may want to change into the directory in your workspace that one of your modules is in, so that it picks up the correct authentication settings).
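When the rlog command from the job log is re-run outside Jenkins, the date range has to reach cvs as a single argument: it contains spaces and a `<`, which a shell would otherwise split on or treat as input redirection. A minimal Java sketch of building such an argument list is shown here. The class name `RlogCommandLine` and its method are hypothetical, and the timestamps are the placeholders from the comment above, not a real date format.

```java
import java.util.ArrayList;
import java.util.List;

public class RlogCommandLine {

    /**
     * Builds the argument list for the rlog invocation quoted in the comment:
     * cvs rlog -S -d START<END MODULE. The flags are copied from that comment.
     */
    static List<String> build(String start, String end, String module) {
        List<String> args = new ArrayList<>();
        args.add("cvs");
        args.add("rlog");
        args.add("-S");
        args.add("-d");
        // The whole range is ONE argument, so its spaces and '<' are never
        // interpreted by a shell.
        args.add(start + "<" + end);
        args.add(module);
        return args;
    }

    public static void main(String[] unused) {
        // Placeholder values taken from the comment; substitute the real ones
        // from the job log.
        List<String> cmd = build("00/00/00 00:00:00+0000",
                                 "11/11/11 11:11:11+1111",
                                 "moduleName");
        System.out.println(cmd);
        // The list can be handed to new ProcessBuilder(cmd) to launch the
        // locally installed cvs client without going through a shell.
    }
}
```

When typing the command into a terminal instead, the same effect is usually obtained by putting the date range in quotes.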

          Gene Liu added a comment -

          Michael: As you suggested, I ran cvs rlog from the command line and it works normally. The output contains values that we consider private data, so I am sorry that I cannot paste it here.

          I will try my newly built 2.5-SNAPSHOT version. I assume that your fix has been checked in.

          Gene Liu added a comment -

          Installed 2.5-SNAPSHOT (built from the latest source) and still get the following error. It seems the fix is not in yet.
          -----------
          11:23:51 FATAL: Java heap space
          11:23:51 java.lang.OutOfMemoryError: Java heap space
          11:23:51 at java.util.Arrays.copyOf(Arrays.java:2786)
          11:23:51 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
          11:23:51 at java.io.PrintStream.write(PrintStream.java:430)
          11:23:51 at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
          11:23:51 at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
          11:23:51 at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:85)
          11:23:51 at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:168)
          11:23:51 at java.io.PrintStream.write(PrintStream.java:477)
          11:23:51 at java.io.PrintStream.print(PrintStream.java:619)
          11:23:51 at java.io.PrintStream.println(PrintStream.java:756)
          11:23:51 at org.netbeans.lib.cvsclient.commandLine.BasicListener.messageSent(BasicListener.java:104)
          11:23:51 at org.netbeans.lib.cvsclient.event.MessageEvent.fireEvent(MessageEvent.java:161)
          11:23:51 at org.netbeans.lib.cvsclient.event.EventManager.fireCVSEvent(EventManager.java:170)
          11:23:51 at org.netbeans.lib.cvsclient.response.MessageResponse.process(MessageResponse.java:104)
          11:23:51 at org.netbeans.lib.cvsclient.Client.handleResponse(Client.java:648)
          11:23:51 at org.netbeans.lib.cvsclient.Client.processRequests(Client.java:598)
          11:23:51 at org.netbeans.lib.cvsclient.command.log.RlogCommand.execute(RlogCommand.java:402)
          11:23:51 at org.netbeans.lib.cvsclient.Client.executeCommand(Client.java:710)
          11:23:51 at hudson.scm.CVSSCM.getRemoteLogForModule(CVSSCM.java:524)
          11:23:51 at hudson.scm.CVSSCM.calculateChangeLog(CVSSCM.java:415)
          11:23:51 at hudson.scm.CVSSCM.checkout(CVSSCM.java:834)
          11:23:51 at hudson.model.AbstractProject.checkout(AbstractProject.java:1193)
          11:23:51 at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:565)
          11:23:51 at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:453)
          11:23:51 at hudson.model.Run.run(Run.java:1376)
          11:23:51 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          11:23:51 at hudson.model.ResourceController.execute(ResourceController.java:88)
          11:23:51 at hudson.model.Executor.run(Executor.java:175)

          Gene Liu added a comment -

          Michael: I have sent you the cvs rlog output via an email.

          Gene Liu added a comment -

          More information on cvs rlog getting stuck (forever):

          Executed method:
          java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)

          Executor #1 for sambstage : executing MAIN_DEV #13
          java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
          java.util.regex.Pattern$Curly.match(Pattern.java:3737)
          java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
          java.util.regex.Pattern$Curly.match1(Pattern.java:3797)
          java.util.regex.Pattern$Curly.match(Pattern.java:3746)
          java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
          java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
          java.util.regex.Pattern$Curly.match(Pattern.java:3744)
          java.util.regex.Pattern$Start.match(Pattern.java:3055)
          java.util.regex.Matcher.search(Matcher.java:1105)
          java.util.regex.Matcher.find(Matcher.java:535)
          hudson.scm.CvsChangeLogHelper.mapCvsLog(CvsChangeLogHelper.java:169)
          hudson.scm.CVSSCM.calculateChangeLog(CVSSCM.java:419)
          hudson.scm.CVSSCM.checkout(CVSSCM.java:831)
          hudson.model.AbstractProject.checkout(AbstractProject.java:1193)
          hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:565)
          hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:453)
          hudson.model.Run.run(Run.java:1376)
          hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
          hudson.model.ResourceController.execute(ResourceController.java:88)
          hudson.model.Executor.run(Executor.java:175)

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/scm/CVSChangeLogSet.java
          src/main/java/hudson/scm/CVSSCM.java
          src/main/java/hudson/scm/CvsChangeLogHelper.java
          src/main/java/hudson/scm/CvsLog.java
          src/test/java/hudson/scm/CvsChangeLogHelperTest.java
          http://jenkins-ci.org/commit/cvs-plugin/2565a8f550f61bbca9311aff60293806f4d6f67b
          Log:
          JENKINS-13814 don't buffer the entire rlog output in memory.

          The size of the data "cvs rlog" produces is roughly O(N*M) where
          N is the # of files in the directory and M is the amount of changes.

          So even when a delta is small, on a large repository this can produce
          significant amount of data. Use of ByteArrayOutputStream causes a large
          spike that can kill a VM, so spill the data over to disk if we are
          getting large output.
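The idea in the commit message, keeping small output in memory and spilling to disk once it grows, can be sketched as follows. This is an illustration only and not the code from the commit: the class name `SpillingOutputStream`, its methods, and the threshold values are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Buffers output in memory up to a threshold, then moves it to a temp file. */
public class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk;   // non-null once the data has been spilled
    private Path file;           // temp file backing 'disk'

    public SpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        if (disk == null && memory.size() + len > threshold) {
            spill();
        }
        (disk != null ? disk : memory).write(buf, off, len);
    }

    /** Copies what was buffered so far to a temp file and drops the buffer. */
    private void spill() throws IOException {
        file = Files.createTempFile("rlog", ".tmp");
        disk = Files.newOutputStream(file);
        memory.writeTo(disk);
        memory = null;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    public boolean isOnDisk() {
        return disk != null;
    }

    /** Call after close(); streams the data back without loading it all at once. */
    public InputStream openInput() throws IOException {
        return disk != null
                ? Files.newInputStream(file)
                : new ByteArrayInputStream(memory.toByteArray());
    }

    /** Removes the temp file, if one was created. */
    public void delete() throws IOException {
        if (file != null) {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        SpillingOutputStream out = new SpillingOutputStream(1024);
        byte[] line = "cvs rlog: Logging path/projects\n".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 100; i++) {
            out.write(line);
        }
        out.close();
        System.out.println("spilled to disk: " + out.isOnDisk());
        out.delete();
    }
}
```

A consumer would then read from `openInput()` as a stream, for example line by line, rather than converting the whole output to a single String as the `ByteArrayOutputStream.toString` frame in the first stack trace does.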

          Kohsuke Kawaguchi added a comment -

          I went ahead and made some performance improvements. I hope this didn't step on Michael's toes.

          My CVS knowledge is rusty, but there's an inherent problem in retaining the entire "cvs rlog" output in memory. CVS rlog returns some data for every file in the repository, so on a large repository it'll produce a huge amount of data even if not many files have changed. I fixed this.

          I'm also a bit disappointed that the optimization we had in the CVS plugin 1.x appears to be gone, where we observed which files had changed during update and used that list to reduce the target of the log command. This was very useful for the typical CI situation where the changes between builds are small. I suspect this also has a performance implication.

          Finally, I'd like to advise against relying on regular expression matching over the entire "cvs log" output block, as it is both error prone and fragile.

          When I look at https://github.com/jenkinsci/cvs-plugin/blob/master/src/main/java/hudson/scm/CvsLog.java#L273 I spot a number of problems right away. For example,

          [\r|\n]+

          is a mistake for

          [\r\n]+

          and there are various

          .+?

          that can incorrectly match multiple lines when the commit message contains stuff that looks suspiciously like cvs log output.

          And in some cases regular expression matching will result in very inefficient backtracking, which is what I suspect to be the cause when people report "cvs rlog stuck".

          I thought we used to bundle package-renamed CVS log parsing code taken from Ant, which I thought did the job better. I'm curious what the motivation is for replacing that with custom parsing code.
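The two regular expression pitfalls named in the comment can be reproduced in isolation. The sketch below is not taken from CvsLog.java: the class name, the sample text, and the `revision ... date` pattern are made up for illustration, and the use of `Pattern.DOTALL` is an assumption about how a pattern would have to be compiled for `.+?` to cross line breaks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CvsLogRegexDemo {
    // Inside a character class '|' is a literal, so this also accepts pipes.
    static final Pattern WRONG = Pattern.compile("[\\r|\\n]+");
    // Matches runs of carriage returns and line feeds only.
    static final Pattern RIGHT = Pattern.compile("[\\r\\n]+");

    // With DOTALL, a lazy '.+?' keeps expanding across line breaks until the
    // rest of the pattern matches somewhere later in the input.
    static final Pattern LAZY_DOTALL =
            Pattern.compile("revision (.+?)\\ndate", Pattern.DOTALL);
    // Without DOTALL, '.' stops at the end of the line.
    static final Pattern LAZY_LINE = Pattern.compile("revision (.+?)\\ndate");

    public static void main(String[] args) {
        System.out.println("wrong class matches '|': " + WRONG.matcher("|").matches());
        System.out.println("right class matches '|': " + RIGHT.matcher("|").matches());

        // Hypothetical input: an unexpected line sits between the two fields.
        String text = "revision 1.2\nnot a date line\ndate: 2012";
        Matcher m = LAZY_DOTALL.matcher(text);
        if (m.find()) {
            // The captured "revision" swallows the unrelated middle line.
            System.out.println("captured: " + m.group(1).replace("\n", "\\n"));
        }
        System.out.println("line-bound pattern matches: " + LAZY_LINE.matcher(text).find());
    }
}
```

Processing the output one line at a time, for example by splitting on `[\r\n]+` and inspecting each line's prefix, keeps any single match confined to one line, which limits both kinds of surprise and the amount of backtracking a pattern can do.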

          Gene Liu added a comment -

          We expected the upgraded cvs plugin to take less time to compute the changelog. However, the upgraded version (2.x) takes much longer. If we want to use the upgraded one, we have to turn off the changelog feature.

            mc1arke Michael Clarke
            jimg888 James Gustafson
            Votes: 0
            Watchers: 5