Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-30823

Exception in thread "Channel reader thread: <node-name>" java.lang.OutOfMemoryError: Java heap space



    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Environment:
      Jenkins 1.609.1 LTS. Attaching support plugin bundle with additional details.
    • Similar Issues:


      Two days ago we encountered an issue where all builds attempted on two of our nodes failed. Console output for each failed build was the following:

      ERROR: SEVERE ERROR occurs 
      org.jenkinsci.lib.envinject.EnvInjectException: hudson.remoting.ChannelClosedException: channel is already closed 
      at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:75)
      at org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:81) 
      at org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:39) 
      at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:574) 
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:494) 
      at hudson.model.Run.execute(Run.java:1741) 
      at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) 
      at hudson.model.ResourceController.execute(ResourceController.java:98) 
      at hudson.model.Executor.run(Executor.java:374) 
      Caused by: hudson.remoting.ChannelClosedException: channel is already closed 
      at hudson.remoting.Channel.send(Channel.java:550) 
      at hudson.remoting.Request.call(Request.java:129) 
      at hudson.remoting.Channel.call(Channel.java:752) 
      at hudson.FilePath.act(FilePath.java:1073) 
      at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:44)
      ... 8 more 
      Caused by: java.io.IOException 
      at hudson.remoting.Channel.close(Channel.java:1109) 
      at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118) 
      at hudson.remoting.PingThread.ping(PingThread.java:126) 
      at hudson.remoting.PingThread.run(PingThread.java:85) 
      Caused by: java.util.concurrent.TimeoutException: Ping started at 1444039585493 hasn't completed by 1444039825494 
      ... 2 more

      These are windows slaves (JNLP) which run the slave jar as a service. We attempted to restart the slave services first, then rebooted the slaves and ultimately has to restart the Jenkins master to recover from the problem.

      Our after action root cause analysis turned up the following in the Jenkins masters's log file:

      Exception in thread "Channel reader thread: DC-JENWIN-001" java.lang.OutOfMemoryError: Java heap space
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
      Exception in thread "Channel reader thread: DC-JENWIN-003" java.lang.OutOfMemoryError: Java heap space
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)

      These entries are remarkable in that they don't have any date-time stamps. It seems as if they were not emitted by the normal logger. None the less by looking at the log entries immediately preceding and following them we were able to localize the entries to within a minute accuracy and they are clearly the cause of the problem.

      Moreover, after these two entires occurred, entries similar to the following appears in the master log a regular intervals:

      Oct 05, 2015 10:08:54 AM com.cloudbees.jenkins.support.AsyncResultCache run
      INFO: Could not retrieve metrics data from DC-JENWIN-001 for caching
      at hudson.remoting.Request$1.get(Request.java:272)
      at hudson.remoting.Request$1.get(Request.java:206)
      at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
      at com.cloudbees.jenkins.support.AsyncResultCache.run(AsyncResultCache.java:95)
      at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      These errors appeared for both of and only the affected slave nodes for retrieving "metrics" retrieving the "java version", retrieving the "environment", and all the various other statuses.

      Bottom Line

      Connectivity to these slaves suffered a critical and apparently unrecoverable failure. This failure originated on the the Master side of the connection due apparently to a heap space issue. Yet the master did not properly recognizing the nodes as off line and continued to send builds to them. Given that jobs tend to be "sticky" to a certain slave as long as it is available, this effectively rendered several critical jobs un-buildable.

      Recommended Resolution.

      Though exact reproduction of this problem may be impossible, it is hopped that some steps can be taken to enable Jenkins to correctly identify this problem and take corrective or at least mitigating action, such as re-initializing whatever thread/component was inoperable or marking the node as off-line.


          Issue Links


            kbaltrinic Kenneth Baltrinic created issue -
            danielbeck Daniel Beck made changes -
            Field Original Value New Value
            Component/s support-core-plugin [ 18146 ]
            Component/s core [ 15593 ]
            Assignee Steven Christou [ schristou ]
            oleg_nenashev Oleg Nenashev made changes -
            Labels jenkins slave jenkins remoting slave
            oleg_nenashev Oleg Nenashev made changes -
            Component/s remoting [ 15489 ]
            Component/s support-core-plugin [ 18146 ]
            rtyler R. Tyler Croy made changes -
            Workflow JNJira [ 165950 ] JNJira + In-Review [ 182221 ]
            schristou Steven Christou made changes -
            Assignee Steven Christou [ schristou ]
            oleg_nenashev Oleg Nenashev made changes -
            Link This issue is related to JENKINS-28492 [ JENKINS-28492 ]
            oleg_nenashev Oleg Nenashev made changes -
            Epic Link JENKINS-38833 [ 175240 ]
            oleg_nenashev Oleg Nenashev made changes -
            Assignee rsandell [ rsandell ]
            rsandell rsandell made changes -
            Assignee rsandell [ rsandell ]
            cloudbees CloudBees Inc. made changes -
            Remote Link This issue links to "CloudBees Internal OSS-809 (Web Link)" [ 18822 ]


              Unassigned Unassigned
              kbaltrinic Kenneth Baltrinic
              0 Vote for this issue
              6 Start watching this issue