• Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • remoting
    • None
    • Platform: All, OS: Windows XP

      We have experienced for some time now that Slaves die with PermGen space
      OutOfMemoryError.

      We are currently using Hudson 1.285, but it has been a problem with a number of
      releases. Possibly all releases we have ever used - not entirely sure.

      Our Master is restarted (hard via its Windows Service, without preparing it for
      shutdown) at least once every day.

      After a week or so, the Slaves (4 of them) start dropping like flies with
      exhausted PermGen space.
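
      One way to watch this build up before a slave dies is to log the class-metadata memory pool from inside the slave JVM. The following is a minimal sketch, not something Hudson ships: the class name `ClassMetadataUsage` is hypothetical, and the pool-name matching assumes HotSpot naming ("Perm Gen" on older JVMs, "Metaspace" on Java 8 and later).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class ClassMetadataUsage {

    /** Returns one line per memory pool whose name suggests it holds class metadata. */
    public static List<String> report() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot pool names ("... Perm Gen" or "Metaspace").
            if (name.contains("Perm Gen") || name.contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage();
                if (usage == null) {
                    continue; // the pool is no longer valid
                }
                // getMax() returns -1 when no maximum is defined.
                lines.add(name + ": used=" + usage.getUsed() + " max=" + usage.getMax());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

      A steadily rising `used` value across builds, with no drop after garbage collection, would be consistent with leaked class loaders.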

      All projects run via Ant. The builder setup looks like this:

      <hudson.tasks.Ant>
      <targets>ci.build ci.validate</targets>
      <antOpts>-Xmx512m</antOpts>
      <buildFile>build.ear.xml</buildFile>
      <properties></properties>
      </hudson.tasks.Ant>

      I have changed the slaves to run with -XX:+HeapDumpOnOutOfMemoryError. I have
      the dump available if anyone wants to have a look - it is 5MB zipped.
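
      To confirm that a slave JVM actually picked up the flag, the option can be read back at runtime. This is a sketch under assumptions: it requires a HotSpot JVM and Java 7 or later (for `ManagementFactory.getPlatformMXBean`), and the class name `HeapDumpFlagCheck` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
public class HeapDumpFlagCheck {

    /** Returns true if this JVM runs with -XX:+HeapDumpOnOutOfMemoryError. */
    public static boolean heapDumpOnOomEnabled() {
        // Assumption: HotSpot JVM; other JVMs may not expose this MXBean.
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        String value = bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError=" + heapDumpOnOomEnabled());
    }
}
```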

      Looking at the dump with Eclipse's Memory Analyzer, the prime suspect (according
      to the tool - I'm a rookie on the subject) is:

      46 instances of "hudson.remoting.RemoteClassLoader", loaded by
      "sun.misc.Launcher$AppClassLoader @ 0x3007630" occupy 3.606.384 (60,70%) bytes.

          [JENKINS-3406] PermGen space outofmemory on slaves

          Krystian Nowak added a comment:
          adding myself as CC

          Kohsuke Kawaguchi added a comment:
          Yes, that's a lot of classloaders. Please send the dump to me.

          Also, after you let the slave run for a while, go to
          http://server/hudson/computer/YOURSLAVENAME/dumpExportTable and capture that
          page, too, which should show us which classloaders from the master are exposed to
          this slave.
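
          For slave names containing spaces or other characters that are not legal in a URL path, the address above needs percent-encoding. The following is a minimal sketch of building it; the class `ExportTableUrl` and method `forSlave` are hypothetical names, and only the `/computer/<name>/dumpExportTable` layout comes from the comment above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
public class ExportTableUrl {

    /** Builds <base>/computer/<slave name>/dumpExportTable with the name percent-encoded. */
    public static String forSlave(String hudsonBaseUrl, String slaveName) {
        String base = hudsonBaseUrl.endsWith("/")
                ? hudsonBaseUrl.substring(0, hudsonBaseUrl.length() - 1)
                : hudsonBaseUrl;
        try {
            // The multi-argument URI constructor quotes characters that are
            // illegal in a path, such as spaces. A '/' in the name is not escaped.
            String encoded = new URI(null, null, "/" + slaveName, null)
                    .getRawPath().substring(1);
            return base + "/computer/" + encoded + "/dumpExportTable";
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode slave name: " + slaveName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forSlave("http://server/hudson", "YOURSLAVENAME"));
    }
}
```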

          jskovjyskebankdk added a comment:
          Created an attachment (id=668)
          Dump from a slave after a few builds

          jskovjyskebankdk added a comment:
          I had to restart Hudson to get the dump (to disable security), so it may not be
          the rich dump you wished for.
          If not, let me know, and I'll let it run for longer next time.

          SCM/JIRA link daemon added a comment:
          Code changed in hudson
          User: kohsuke
          Path:
          trunk/www/changelog.html
          http://fisheye4.cenqua.com/changelog/hudson/?cs=24470
          Log:
          [FIXED JENKINS-3406] I fixed one leak of the class loader caused by JNLP slaves that reconnect to the master. So I tentatively mark this bug as closed. This fix will be in 1.337.

          jskovjyskebankdk added a comment:
          I have been using the new build for the past two weeks and have disabled nightly restart of the slaves. Still running, so I guess you nailed it.
          Thanks!

            Assignee: Unassigned
            Reporter: jskovjyskebankdk
            Votes: 0
            Watchers: 1
