Jenkins / JENKINS-33809

Find out what's wrong with the tests on the 2.0 branch requiring so much RAM


    Details


      Description

      Tests on the 2.0 branch require an unreasonable amount of RAM, or else the VM must be killed after every test. This appears to be a regression from 1.x.

      Running the full test suite (on 2.0, Maven 3.3.3 and Java 8):

      Surefire forked booter 1:

      Surefire forked booter 2 (there was a brief jconsole disconnection of ~4 minutes in my local environment, but nothing remarkable happened in the meantime):

      Maven launcher process:

      Running the full test suite (on 2.0, Maven 3.3.3 and Java 7 and -Xmx600m - lowered from the current value of -Xmx1g):

      Surefire forked booter 1:

      Surefire forked booter 2:

      Maven launcher process:

      Running the full test suite (on 1.x, Maven 3.3.3 and Java 7 and -Xmx256m):

      Surefire forked booter 1:

      Surefire forked booter 2:

      Maven launcher process:
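
      For context, the -Xmx values above are the heap caps passed to the Surefire forks. In the Jenkins test module they are set through the maven-surefire-plugin configuration; the following is only a sketch of that shape (the exact argLine in test/pom.xml carries additional flags, and the values here are illustrative):

      ```xml
      <!-- Sketch of a maven-surefire-plugin heap setting; not the literal 2.0 pom. -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <forkCount>2</forkCount>
          <argLine>-Xmx1g</argLine>
        </configuration>
      </plugin>
      ```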

        Attachments

        1. forked-booter-1.png (354 kB)
        2. forked-booter-2.png (331 kB)
        3. java7-1.x-maven-launcher.png (20 kB)
        4. java7-1.x-surefire-booter1.png (56 kB)
        5. java7-1.x-surefire-booter2.png (50 kB)
        6. java7-2.0-maven-launcher.png (21 kB)
        7. java7-2.0-surefire-booter1.png (70 kB)
        8. java7-2.0-surefire-booter2.png (75 kB)
        9. jenkins1-maven-launcher.png (253 kB)
        10. jenkins1-surefire-booter.png (261 kB)
        11. jenkins2-maven-launcher.png (230 kB)
        12. jenkins2-surefire-booter.png (259 kB)
        13. maven-laucher.png (152 kB)

          Activity

          amuniz Antonio Muñiz added a comment -

          The attached screenshots show that the major difference in RAM usage is in the Maven launcher process, not the Surefire booter. Investigating why...

          amuniz Antonio Muñiz added a comment -

          It seems this issue was not introduced in 2.0 but in 1.x, where it was worked around by artificially limiting the memory to 800 MB (https://github.com/jenkinsci/jenkins/commit/1015b395c155774d8c9c31d52c5b2821c27825c3).

          Bisecting on 1.x to see the commit that introduced the problem...

          amuniz Antonio Muñiz added a comment -

          Well, this seems to be Maven behavior (I'm not sure I should call it an issue), not a Jenkins build process issue.

          I just tried to build a one-year-old commit, and Maven allocates the same amount of memory (around 1.5 GB) during the build. The memory analysis shows that Maven allocates a large amount of memory at the very beginning of the build, which is quickly freed by the GC (while the objects are still in Eden space); that's why memory quickly drops to ~400-500 MB and stays there for the rest of the build.

          As demonstrated by this fix, the real amount of required memory is much lower than 1.5 GB; we just have to make the GC work a bit harder during the initial moments of the build.

          amuniz Antonio Muñiz added a comment -

          Daniel Beck My recommendation is to pick 1015b395c155774d8c9c31d52c5b2821c27825c3 into the 2.0 branch and close this ticket.

          danielbeck Daniel Beck added a comment -

          Antonio Muñiz

          The comment at https://github.com/jenkinsci/jenkins/pull/2161#issuecomment-199963693 prompted me to file this issue.

          What concerns me is that https://github.com/jenkinsci/jenkins/blob/2.0/test/pom.xml#L205 was needed for 2.0 only (PR 2033).

          If, for the same Surefire execution, I configure `reuseForks=false`, it also doesn't go OOM. This suggests something is leaking on 2.0 that does not leak on 1.x.
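
          The `reuseForks=false` setting mentioned here tells Surefire to start a fresh JVM per test class instead of reusing forks, so any per-fork leak dies with the fork. A hedged sketch of how that looks in a Surefire configuration (not the literal test/pom.xml):

          ```xml
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
              <forkCount>2</forkCount>
              <!-- Fresh JVM per test class: slower, but leaked memory cannot accumulate. -->
              <reuseForks>false</reuseForks>
            </configuration>
          </plugin>
          ```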

          amuniz Antonio Muñiz added a comment -

          Daniel Beck Ok. I'm running the full test suite to collect memory data; let's make a decision with real data on the table.

          amuniz Antonio Muñiz added a comment - - edited

          Description updated with monitoring data of a full test suite run.

          Nothing makes me think there is a memory leak anywhere. The only suspicious process is the Maven launcher, which reaches 886 MB at the end of the build, but no process surpasses the current limit of 1 GB. By the way, I don't think -Xmx1g is required in the Surefire configuration, as neither fork 1 nor fork 2 consumes more than 500 MB.

          In conclusion, I cannot reproduce the memory issue locally, and nothing points to a memory leak. Why are some builds throwing OOM? I don't know; there are other builds finishing successfully (close in time), like this one. Could it be related to the infrastructure of https://ci.jenkins-ci.org? What is the configuration of the celery node?

          amuniz Antonio Muñiz added a comment -

          Running the suite again, monitoring with JDK 7.

          amuniz Antonio Muñiz added a comment -

          Description updated with reports of a full run on JDK 7. Memory usage does not show anything abnormal. The forked Surefire booters use 400-600 MB at most, and the Maven launcher process doesn't go above 300 MB.

          I still think -Xmx1g is not really required.

          Maybe we could activate -XX:+HeapDumpOnOutOfMemoryError on both the agent and master JVMs on https://ci.jenkins-ci.org so we get some hint the next time it happens.

          I don't see anything else we can do here.
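
          A sketch of how that flag could be enabled for a local run (the dump directory is an assumption; on ci.jenkins-ci.org the flags would instead go into the master/agent JVM options):

          ```shell
          # Write a heap dump on OOM so the next failure leaves evidence behind.
          mkdir -p /tmp/jenkins-heapdumps
          export MAVEN_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/jenkins-heapdumps"
          mvn test
          ```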

          amuniz Antonio Muñiz added a comment -

          In the last run I set -Xmx600m (it was the full test suite on Jenkins 2.0 and JDK 7, which seems to be the problematic configuration).
          I didn't get any OOM (monitoring results added to the description).

          danielbeck Daniel Beck added a comment -

          Behavior looks absolutely sane. Maybe something environment-dependent?

          amuniz Antonio Muñiz added a comment - - edited

          Today I was looking at OldDataMonitorTest for an unrelated reason and noticed this call: MemoryAssert.assertGC(ref). This call fills all available memory until it triggers an OOM, which is then caught and the memory freed so the JVM continues working. But it means this test will always consume all available memory at some point.

          In Java 7, an OOM can happen for two main reasons: the JVM does not see any more available virtual memory, or the OS is not able to give the JVM more physical memory (even when it hasn't reached the maximum allowed virtual memory).

          When running the full test suite there are 3 Java processes running, currently with -Xmx800m, -Xmx1g and -Xmx1g. Given that at some point one of the processes will consume 1 GB, perhaps the sum with the other two reaches the maximum amount of physical memory (it really depends on which tests are running concurrently at that point), which would explain why the OOM only happens sometimes.

          Daniel Beck How much physical memory does celery have?
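
          The assertGC technique described above can be sketched as follows. This is not the actual MemoryAssert implementation from the Jenkins test harness, only an illustration of the idea (class name, allocation sizes and loop are assumptions): allocation pressure forces the GC to clear the weak reference, and the JVM guarantees that weak references are cleared by a full GC before an OutOfMemoryError is thrown.

          ```java
          import java.lang.ref.WeakReference;
          import java.util.ArrayList;
          import java.util.List;

          public class AssertGCSketch {

              /**
               * Illustrative re-implementation of the assertGC idea: apply allocation
               * pressure until either the weak reference is cleared by the GC or the
               * heap is exhausted, then report whether the referent was collected.
               */
              static boolean assertGC(WeakReference<?> ref) {
                  List<byte[]> hog = new ArrayList<>();
                  try {
                      int size = 1 << 20; // start at 1 MB and grow
                      while (ref.get() != null) {
                          hog.add(new byte[size]);
                          if (size < (1 << 28)) {
                              size <<= 1;
                          }
                      }
                  } catch (OutOfMemoryError expected) {
                      // Heap filled up; fall through and release the ballast below
                      // so the JVM can keep running, as the test harness does.
                  }
                  hog.clear();
                  System.gc();
                  return ref.get() == null;
              }

              public static void main(String[] args) {
                  Object o = new Object();
                  WeakReference<Object> ref = new WeakReference<>(o);
                  o = null; // drop the only strong reference so the object is collectable
                  System.out.println("collected: " + assertGC(ref));
              }
          }
          ```

          This also shows why the test momentarily consumes the whole heap by design: the ballast list keeps growing until the allocator fails.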

          danielbeck Daniel Beck added a comment -

          Antonio Muñiz I have 16GB on my laptop that also had problems with tests. Celery has 8GB.

          amuniz Antonio Muñiz added a comment -

          Daniel Beck Ok, so my theory is... wrong

          danielbeck Daniel Beck added a comment -

          Antonio Muñiz Do we need to reopen this?

          amuniz Antonio Muñiz added a comment -

          Daniel Beck I don't think so. I could not reproduce the OOM locally (after many full runs). The issues we saw yesterday on https://ci.jenkins-ci.org did not reappear after cleaning up all the zombie processes on the celery node.

          So there is not much more to do: just merge https://github.com/jenkinsci/jenkins/pull/2220 and wait for the OOM to happen again (or perhaps it won't happen anymore).

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Daniel Beck
          Path:
          test/pom.xml
          http://jenkins-ci.org/commit/jenkins/0d0314ddee72b227a770d294549d12ded3b3fbac
          Log:
          JENKINS-33809 Don't reuse forks

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Daniel Beck
          Path:
          test/pom.xml
          http://jenkins-ci.org/commit/jenkins/574f83a962a62021cc60b6af1e3de5c0f1f008b8
          Log:
          Merge pull request #2264 from daniel-beck/reuseForks=false

          JENKINS-33809 Don't reuse forks

          Compare: https://github.com/jenkinsci/jenkins/compare/4f51944cf1f9...574f83a962a6


            People

            Assignee:
            amuniz Antonio Muñiz
            Reporter:
            danielbeck Daniel Beck
            Votes:
            0
            Watchers:
            2

              Dates

              Created:
              Updated:
              Resolved: