[JENKINS-71970] Memory leak due to channel listeners that are never cleared

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: workflow-api-plugin
    • Environment: Tested with Jenkins 2.375.4 and 2.414.1.
      workflow-api-plugin version: 1267.vd9b_a_ddd9eb_47

      Nodes have started experiencing memory leaks after updating to workflow-api-plugin 1267.vd9b_a_ddd9eb_47. For example, running a sh step in a pipeline allocates memory on the node that is never released. Eventually the node's JVM runs out of memory as it fills up, apparently with BufferedBuildListener$Replacement instances, each of which reserves a 65 KB buffer. See screenshot.png and screenshot2.png.

      Stopping the job or performing a GC does not release the reserved memory. The only solution currently is to disconnect the node from Jenkins and reconnect it, which clears its memory.

       

      Possible root cause

      Release https://github.com/jenkinsci/workflow-api-plugin/releases/tag/1248.v4b_91043341d2 includes https://github.com/jenkinsci/workflow-api-plugin/pull/294. This change might have introduced a problem where a listener is added to the Channel but never removed.
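
      To illustrate the suspected pattern, here is a minimal, self-contained sketch. It is not the plugin's code: FakeChannel, BufferedStream, and simulateSteps are hypothetical stand-ins, and the buffer size is an assumption chosen to roughly match the 65 KB figure above. It only shows how a long-lived object that holds listeners can keep every per-step buffer reachable when listeners are added but never removed.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerLeakSketch {

    /** Hypothetical stand-in for a long-lived channel; it holds strong references to its listeners. */
    static class FakeChannel {
        private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

        void addListener(Runnable listener) {
            listeners.add(listener);
        }

        int listenerCount() {
            return listeners.size();
        }
    }

    /** Hypothetical per-step stream: owns a buffer and registers a listener that refers back to it. */
    static class BufferedStream {
        // Assumed size, picked to roughly match the ~65 KB per instance reported in this issue.
        private final byte[] buffer = new byte[65_536];

        BufferedStream(FakeChannel channel) {
            // The method reference captures `this`, so the channel now keeps this
            // stream (and its buffer) reachable. Nothing ever removes the listener.
            channel.addListener(this::reset);
        }

        private void reset() {
            buffer[0] = 0;
        }
    }

    /** Simulates `steps` pipeline steps, each creating a stream that is dropped without cleanup. */
    static int simulateSteps(FakeChannel channel, int steps) {
        for (int i = 0; i < steps; i++) {
            new BufferedStream(channel); // no reference kept by the caller
        }
        return channel.listenerCount();
    }

    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel();
        // Prints "listeners still registered after 10 steps: 10"
        System.out.println("listeners still registered after 10 steps: " + simulateSteps(channel, 10));
    }
}
```

      In this sketch a GC cannot reclaim the buffers, because each one is still reachable through the channel's listener list. Only discarding the channel itself frees them, which would be consistent with the disconnect and reconnect behaviour described above.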

       

      Reproduction steps

      Start a loop job that opens up a shell:

      timestamps {
        node("nodeName") {
          int i = 0
          while (true) {
            sh "echo ${i}"
            i++
            // Optionally use jmap to see the BufferedBuildListener count growing:
            //sh "jmap -histo <remoting-jar-pid>|egrep '(BufferedBuildListener|[[]B)'"
            //sleep time: 1, unit: "SECONDS"     
          }
        }
      } 

      Once the run has started, open [jenkinsUrl]/manage/computer/[nodeName]/dumpExportTable. The dump will show an increasing number of entries of the form object=hudson.CloseProofOutputStream@ABC123 type=hudson.CloseProofOutputStream interfaces=[java.io.OutputStream]. These entries are not cleared until the node is either rebooted or disconnected from and reconnected to Jenkins.
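
      To track the growth over time, the dump can be saved as text at intervals and the matching entries counted. The helper below is a rough sketch: ExportTableCount and countEntries are hypothetical names, the "type=<class>" layout is assumed from the single entry quoted above, and the sample dump in main is made up for illustration.

```java
import java.util.Arrays;

public class ExportTableCount {

    /**
     * Counts the lines of a saved dump that contain "type=" followed by typeName.
     * This is a plain substring match, so a longer class name with the same prefix
     * would also be counted.
     */
    static long countEntries(String dump, String typeName) {
        String needle = "type=" + typeName;
        return Arrays.stream(dump.split("\n"))
                .filter(line -> line.contains(needle))
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample; a real dump would be copied from the dumpExportTable page.
        String sample = String.join("\n",
                "object=hudson.CloseProofOutputStream@ABC123 type=hudson.CloseProofOutputStream interfaces=[java.io.OutputStream]",
                "object=hudson.CloseProofOutputStream@DEF456 type=hudson.CloseProofOutputStream interfaces=[java.io.OutputStream]",
                "object=java.lang.Object@1 type=java.lang.Object interfaces=[]");
        // Prints "2"
        System.out.println(countEntries(sample, "hudson.CloseProofOutputStream"));
    }
}
```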

       

      Affected component: tested only with workflow-api-plugin version 1267.vd9b_a_ddd9eb_47, but the problem could already be present in version 1248.v4b_91043341d2 if the root cause analysis is correct.

       

      Workaround: Disconnect the node and reconnect it; this clears its memory.

       

      Possible similar issue: https://issues.jenkins.io/browse/JENKINS-70388

          Florent added a comment -

          We have the same issue on Jenkins 2.414.1 after upgrading the plugin to 1647.

          Jesse Glick added a comment -

          https://github.com/jenkinsci/workflow-api-plugin/pull/309

          Jesse Glick added a comment -

          Looks like this would only affect static agents reused for a large number of builds.

          Alex added a comment - - edited

          mattik flok

          Did updating workflow-api-plugin (Pipeline: API) to 1275.veb_e0969ddf9e fix the issue for you?

          Alex added a comment - - edited

          I marked it "Open" since it doesn't appear to be fixed with my limited testing

          Matti added a comment -

          We're also still seeing the same problem after updating to 1275.veb_e0969ddf9e.

          Devin Nusbaum added a comment -

          You can use a test like this to observe the behavior. There is one org.jenkinsci.plugins.workflow.log.BufferedBuildListener$Replacement$1@... entry for each sh step that ran.

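          import hudson.model.Node;
          import hudson.remoting.Channel;
          import java.lang.reflect.Field;
          import java.util.List;
          import jenkins.security.MasterToSlaveCallable;
          import org.jenkinsci.plugins.workflow.cps.CpsFlowDefinition;
          import org.jenkinsci.plugins.workflow.job.WorkflowJob;
          import org.junit.Rule;
          import org.junit.Test;
          import org.jvnet.hudson.test.BuildWatcher;
          import org.jvnet.hudson.test.JenkinsRule;

          public class BufferedBuildListenerTest {
              @Rule
              public JenkinsRule r = new JenkinsRule();
              @Rule
              public BuildWatcher watcher = new BuildWatcher();
              @Test
              public void memoryLeak() throws Exception {
                  // Run ten sh steps on a freshly created agent.
                  Node n = r.createOnlineSlave();
                  WorkflowJob p = r.createProject(WorkflowJob.class);
                  p.setDefinition(new CpsFlowDefinition(
                          "node('" + n.getNodeName() + "') {\n" +
                          "  for (int i = 0; i < 10; i++) {\n" +
                          "    sh(/echo ${i}/)\n" +
                          "  }\n" +
                          "}"));
                  r.buildAndAssertSuccess(p);
                  // After the build, print the listeners still registered on the agent side of the channel.
                  System.out.println(n.getChannel().call(new CheckChannelListeners()));
              }

              // Runs on the agent and reads Channel's private "listeners" field via reflection.
              private static class CheckChannelListeners extends MasterToSlaveCallable<String, Exception> {
                  @Override
                  public String call() throws Exception {
                      Channel channel = getChannelOrFail();
                      Field listenersField = Channel.class.getDeclaredField("listeners");
                      listenersField.setAccessible(true);
                      List<?> listeners = (List<?>) listenersField.get(channel);
                      StringBuilder builder = new StringBuilder();
                      for (Object listener : listeners) {
                          builder.append(listener);
                          builder.append('\n');
                      }
                      return builder.toString();
                  }
              }
          }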
          

          As for a fix, I think the most natural approach would be for closing the remote stream to remove the listener, but I am not sure whether the remote stream ever actually gets closed in this case. Perhaps making the channel listener hold the CloseableOutputStream through a weak reference would be good enough.
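
          The two ideas above can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the change made in the plugin: FakeChannel, SelfRemovingStream, WeakFlushListener, and runSteps are hypothetical names, and the real Channel and stream classes are replaced by simple stand-ins.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.ref.WeakReference;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerCleanupSketch {

    /** Hypothetical stand-in for the channel's listener list. */
    static class FakeChannel {
        private final List<Runnable> listeners = new CopyOnWriteArrayList<>();
        void addListener(Runnable l) { listeners.add(l); }
        void removeListener(Runnable l) { listeners.remove(l); }
        int listenerCount() { return listeners.size(); }
    }

    /** Option 1: closing the stream also deregisters its listener. */
    static class SelfRemovingStream extends ByteArrayOutputStream {
        private final FakeChannel channel;
        // Kept in a field so that close() removes the same object that was added.
        private final Runnable listener = this::flushQuietly;

        SelfRemovingStream(FakeChannel channel) {
            super(65_536); // assumed buffer size
            this.channel = channel;
            channel.addListener(listener);
        }

        private void flushQuietly() {
            try { flush(); } catch (IOException ignored) { }
        }

        @Override
        public void close() throws IOException {
            channel.removeListener(listener);
            super.close();
        }
    }

    /**
     * Option 2: the listener holds the stream only weakly. The small listener object
     * stays registered, but it no longer prevents the stream and its buffer from
     * being garbage collected once nothing else refers to them.
     */
    static class WeakFlushListener implements Runnable {
        private final WeakReference<OutputStream> ref;

        WeakFlushListener(OutputStream stream) {
            this.ref = new WeakReference<>(stream);
        }

        @Override
        public void run() {
            OutputStream stream = ref.get();
            if (stream != null) {
                try { stream.flush(); } catch (IOException ignored) { }
            }
        }
    }

    /** Opens and closes `steps` streams, then reports how many listeners remain registered. */
    static int runSteps(FakeChannel channel, int steps) {
        for (int i = 0; i < steps; i++) {
            try (SelfRemovingStream s = new SelfRemovingStream(channel)) {
                s.write(i);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return channel.listenerCount();
    }

    public static void main(String[] args) {
        // Prints "listeners after 10 closed streams: 0"
        System.out.println("listeners after 10 closed streams: " + runSteps(new FakeChannel(), 10));
    }
}
```

          Option 1 only helps if close() is actually called, which is the open question raised above. Option 2 does not depend on close(), at the cost of leaving one small listener object registered per stream.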

          Jesse Glick added a comment -

          https://github.com/jenkinsci/workflow-api-plugin/pull/311

          Alex added a comment -

          mattik please test if the fix 1281 works

          Matti added a comment -

          Tested with 1281.vca_5fddb_3fceb_ version, could not reproduce the problem anymore. Thank you for the fix jglick!

            jglick Jesse Glick
            mattik Matti
            Votes: 0
            Watchers: 5