Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-4045

hudson.remoting.Channel.unexport attempts to unexport oid instead of object

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Critical
    • Resolution: Done
    • Component/s: core
    • Labels:
      None
    • Environment:
      Platform: All, OS: All
    • Similar Issues:

      Description

      We came across an issue where we were seeing persistent memory leaks in Hudson.
      It appeared to be similar if not the same as the latter part of this thread:

      http://www.nabble.com/Possible-memory-leak-in-hudson.remoting.ExportTable-td12000299.html

      It was extremely subtle to track this thing down, but I've found the cause of it.

      It looks like in the course of handling Unexport and EOF commands, Hudson calls
      hudson.remoting.Channel.unexport(oid). This in turn calls
      hudson.remoting.ExportTable.unexport(oid), but unfortunately the latter method
      expects to be passed the exported object itself, not the oid. It dutifully looks
      to see if anyone has exported the Integer oid and when it finds nobody has, it
      returns without releasing the reference from the intended Entry.

      The fix is simple; introduce an unexportByOid method onto
      hudson.remoting.ExportTable and have hudson.remoting.Channel call that instead.
      Patch forthcoming.

        Attachments

          Activity

          Hide
          mdillon mdillon added a comment -

          Created an attachment (id=783)
          Patch to add unexportByOid to ExportTable and call it from Channel

          Show
          mdillon mdillon added a comment - Created an attachment (id=783) Patch to add unexportByOid to ExportTable and call it from Channel
          Hide
          statlor Scott Tatum added a comment -

          Adding myself to the cc list - I plan to confirm the fix on my local installs.

          Show
          statlor Scott Tatum added a comment - Adding myself to the cc list - I plan to confirm the fix on my local installs.
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: : kohsuke
          Path:
          trunk/hudson/main/remoting/src/main/java/hudson/remoting/Channel.java
          trunk/hudson/main/remoting/src/main/java/hudson/remoting/ExportTable.java
          http://fisheye4.cenqua.com/changelog/hudson/?cs=19879
          Log:
          JENKINS-4045 The actual change was not committed in the previous one.

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in hudson User: : kohsuke Path: trunk/hudson/main/remoting/src/main/java/hudson/remoting/Channel.java trunk/hudson/main/remoting/src/main/java/hudson/remoting/ExportTable.java http://fisheye4.cenqua.com/changelog/hudson/?cs=19879 Log: JENKINS-4045 The actual change was not committed in the previous one.
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: : kohsuke
          Path:
          trunk/www/changelog.html
          http://fisheye4.cenqua.com/changelog/hudson/?cs=19878
          Log:
          [FIXED JENKINS-4045] Thanks for a great detective work! If you are interested in other bug fixes and you are not a committer yet, please let me know so that I can add you as a committer.

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in hudson User: : kohsuke Path: trunk/www/changelog.html http://fisheye4.cenqua.com/changelog/hudson/?cs=19878 Log: [FIXED JENKINS-4045] Thanks for a great detective work! If you are interested in other bug fixes and you are not a committer yet, please let me know so that I can add you as a committer.
          Hide
          statlor Scott Tatum added a comment -

          I built from trunk and deployed a war with this fix. Before that, I had enabled
          my slaves for a couple of days. I was getting OOME after about 12 hours or so
          consistently, and visualvm showed the issue with the ExportTable entries was the
          culprit, taking up ~500 megs.

          After running the patched version for most of a day, I can see that used heap is
          staying low, and a heap dump shows nothing out of the ordinary. So, fix
          confirmed here. I can finally use slaves again. Thank you mdillon!!

          Show
          statlor Scott Tatum added a comment - I built from trunk and deployed a war with this fix. Before that, I had enabled my slaves for a couple of days. I was getting OOME after about 12 hours or so consistently, and visualvm showed the issue with the ExportTable entries was the culprit, taking up ~500 megs. After running the patched version for most of a day, I can see that used heap is staying low, and a heap dump shows nothing out of the ordinary. So, fix confirmed here. I can finally use slaves again. Thank you mdillon!!
          Hide
          kohsuke Kohsuke Kawaguchi added a comment -
              • Issue 4008 has been marked as a duplicate of this issue. ***
          Show
          kohsuke Kohsuke Kawaguchi added a comment - Issue 4008 has been marked as a duplicate of this issue. ***
          Hide
          torbent torbent added a comment -

          Just wanted to note that this fix didn't solve all of the leaks in (marked
          duplicate) issue 4008, but it has certainly helped.

          Show
          torbent torbent added a comment - Just wanted to note that this fix didn't solve all of the leaks in (marked duplicate) issue 4008, but it has certainly helped.

            People

            Assignee:
            Unassigned Unassigned
            Reporter:
            md5 Mike Dillon
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: