  1. Jenkins
  2. JENKINS-28962

Memory leak on slaves when using Jnlp startup, listeners are registered but not removed any more

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • core, remoting
    • None
    • Jenkins ver. 1.617

    Description

      When I start slaves on Windows or Linux using the JNLP startup method with the setting "take this slave offline when not needed", the slaves start with low memory usage, but usage grows quickly until the slave node itself goes OOM.

      Analyzing the slave's memory with Eclipse MAT, I found the following (see below).

      This indicates to me that the class JnlpSlaveRestarterInstaller adds a listener, but never removes it.

      As a result, data referenced from the JnlpSlaveRestarter is never freed, because the old listeners still hold on to it even when new restarters have been added in the meantime. This quickly eats up the available memory on the slave.
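      The pattern described above can be illustrated with a minimal, self-contained sketch. The classes here (`Engine`, `Listener`, `listenersAfterInstalls`) are hypothetical stand-ins written for this illustration, not the real `hudson.remoting` or `JnlpSlaveRestarterInstaller` code. The sketch only assumes what the report states: each install call adds a listener to a long-lived object and nothing removes it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-ins for illustration only; not the real Jenkins classes.
class ListenerLeakSketch {
    interface Listener { void onDisconnect(); }

    /** Long-lived object that plays the role of the engine on the slave. */
    static class Engine {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        void addListener(Listener l) { listeners.add(l); }
        int listenerCount() { return listeners.size(); }
    }

    /** Simulates an installer that runs once per (re)connect. */
    static int listenersAfterInstalls(int installs) {
        Engine engine = new Engine();
        for (int i = 0; i < installs; i++) {
            byte[] payload = new byte[1024]; // data captured by the listener
            engine.addListener(() -> { int unused = payload.length; });
            // No matching remove call: every earlier listener, and the
            // payload it captured, stays reachable through the engine.
        }
        return engine.listenerCount();
    }

    public static void main(String[] args) {
        System.out.println(listenersAfterInstalls(5)); // prints 5
    }
}
```

      In this sketch the listener count equals the number of install calls, so whatever each listener references accumulates for as long as the engine lives.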

          Activity

            centic centic added a comment -

            The stacktrace of the periodic call is

            Daemon Thread [pool-1-thread-87 for channel] (Suspended (breakpoint at line 156 in hudson.remoting.Engine))	
            	hudson.remoting.Engine.addListener(hudson.remoting.EngineListener) line: 156	
            	jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call() line: 70	
            	jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2.call() line: 52	
            	hudson.remoting.UserRequest<RSP,EXC>.perform(hudson.remoting.Channel) line: 121	
            	hudson.remoting.UserRequest<RSP,EXC>.perform(hudson.remoting.Channel) line: 49	
            	hudson.remoting.Request$2.run() line: 325	
            	hudson.remoting.InterceptingExecutorService$1.call() line: 68	
            	java.util.concurrent.FutureTask<V>.run() line: not available	
            	java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) line: not available	
            	java.util.concurrent.ThreadPoolExecutor$Worker.run() line: not available	
            	hudson.remoting.Engine$1$1.run() line: 69	
            	java.lang.Thread.run() line: not available	
            
            centic centic added a comment -

            The invocation of the JnlpSlaveRestarterInstaller is triggered by this stacktrace:

            Thread [Channel reader thread: channel] (Suspended (breakpoint at line 66 in hudson.remoting.InterceptingExecutorService))	
            	hudson.remoting.InterceptingExecutorService.wrap(java.lang.Runnable, V) line: 66	
            	hudson.remoting.InterceptingExecutorService.submit(java.lang.Runnable, T) line: 42	
            	hudson.remoting.InterceptingExecutorService.submit(java.lang.Runnable) line: 37	
            	hudson.remoting.UserRequest<RSP,EXC>(hudson.remoting.Request<RSP,EXC>).execute(hudson.remoting.Channel) line: 304	
            	hudson.remoting.Channel$2.handle(hudson.remoting.Command) line: 484	
            	hudson.remoting.SynchronousCommandTransport$ReaderThread.run() line: 60	
            
            danielbeck Daniel Beck added a comment -

            listeners are registered but not removed any more

            Has this ever worked? It looks like this has been the behavior since 1.55x, when this feature was first introduced.

            centic centic added a comment -

            I think this has been in there from the beginning. It only manifests itself if connections break repeatedly or are reestablished often, which is likely why it usually goes unnoticed.
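            One possible way to keep the listener count constant across reconnects is to remember the previously registered listener and remove it before adding a new one. The following is only a sketch under that assumption. `Engine`, `Installer`, and `removeListener` are hypothetical names for this illustration, and this is not a fix taken from the Jenkins code base.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a replace-on-reinstall guard; not actual Jenkins code.
class ReplaceListenerSketch {
    interface Listener { void onDisconnect(); }

    /** Long-lived object that plays the role of the engine on the slave. */
    static class Engine {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();
        void addListener(Listener l) { listeners.add(l); }
        void removeListener(Listener l) { listeners.remove(l); }
        int listenerCount() { return listeners.size(); }
    }

    /** Remembers the listener it registered last and swaps it out on re-install. */
    static class Installer {
        private Listener current;

        synchronized void install(Engine engine, Listener fresh) {
            if (current != null) {
                engine.removeListener(current); // drop the stale listener first
            }
            engine.addListener(fresh);
            current = fresh;
        }
    }

    /** Simulates the given number of reconnects, each triggering one install. */
    static int listenersAfterReconnects(int reconnects) {
        Engine engine = new Engine();
        Installer installer = new Installer();
        for (int i = 0; i < reconnects; i++) {
            installer.install(engine, () -> { });
        }
        return engine.listenerCount();
    }

    public static void main(String[] args) {
        System.out.println(listenersAfterReconnects(100)); // prints 1
    }
}
```

            With this guard the count stays at one no matter how often the connection is reestablished, which is the case where the leak becomes visible.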

            oleg_nenashev Oleg Nenashev added a comment -

            The issue still seems to be relevant. Tentatively assigning it to myself.

            oleg_nenashev Oleg Nenashev added a comment -

            Unfortunately, I have no capacity to work on Remoting in the medium term, so I will unassign myself and let others take it. If somebody is interested in submitting a pull request, I will be happy to help get it reviewed and released.


            People

              Assignee: Unassigned
              Reporter: centic
              Votes: 4
              Watchers: 6
