On ci.jenkins-ci.org I noticed that there are a lot of channel reader threads claiming to be connected to remote-slave-6.

      The ps output shows these threads consuming CPU, even though the method names suggest they should be blocked waiting for input.

      "Channel reader thread: remote-slave-6" prio=10 tid=0x0000000004be5800 nid=0x1f59 runnable [0x00007fb6722e1000]
         java.lang.Thread.State: RUNNABLE
              at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:77)
              at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
              at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:68)
              at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
              at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      
      "Channel reader thread: remote-slave-6" prio=10 tid=0x0000000004be4000 nid=0x71d8 runnable [0x00007fb673ffe000]
         java.lang.Thread.State: RUNNABLE
              at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
              at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
              at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:68)
              at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
              at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      
      "Channel reader thread: remote-slave-6" prio=10 tid=0x0000000004e48800 nid=0x4edc runnable [0x00007fb6719d8000]
         java.lang.Thread.State: RUNNABLE
              at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
              at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
              at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:67)
              at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
              at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
      
      "Channel reader thread: remote-slave-6" prio=10 tid=0x0000000004df2000 nid=0x406d runnable [0x00007fb67b432000]
         java.lang.Thread.State: RUNNABLE
              at com.trilead.ssh2.channel.ChannelInputStream.read(ChannelInputStream.java:79)
              at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:82)
              at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:68)
              at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:93)
              at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
              at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
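
One general way a thread can sit inside a read() method and still burn CPU: InputStream.read() returns -1 immediately, on every call, once a stream has reached end-of-file. A loop that treats -1 as "no data yet" rather than "stream closed" therefore never blocks. The sketch below is a hypothetical illustration of that anti-pattern (the class and method names are invented), not code taken from remoting.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class EofSpinDemo {
    // Hypothetical helper illustrating the anti-pattern: it treats -1 as
    // "no data yet" and retries. At EOF, read() returns -1 immediately on
    // every call, so the loop never blocks; it only burns CPU.
    // maxAttempts exists solely so this demo terminates.
    static int spinUntilByte(InputStream in, int maxAttempts) throws IOException {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            if (in.read() != -1) {
                return attempts; // real data arrived
            }
            // Intentional bug: -1 means "end of stream", not "try again".
        }
        return attempts;
    }

    public static void main(String[] args) throws IOException {
        InputStream atEof = new ByteArrayInputStream(new byte[0]);
        long start = System.nanoTime();
        int calls = spinUntilByte(atEof, 1_000_000);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(calls + " read() calls hit EOF without blocking, in "
                + micros + " microseconds");
    }
}
```

Without the cap, such a loop would show up in a thread dump exactly like the traces here: RUNNABLE, with the top frame inside a read method.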
      
      

          [JENKINS-23471] Channel reader thread consumes all CPUs

          Kohsuke Kawaguchi added a comment -

          Not sure exactly what triggers this, but based on the symptom, these threads appear to accumulate over time.

          Kohsuke Kawaguchi added a comment -

          Full thread dump attached for later investigation.

          Kohsuke Kawaguchi added a comment -

          As a stop-gap measure until we investigate, I've set up a cron job on my account on ci.jenkins-ci.org to restart Jenkins every night.

          Olaf Lenz added a comment -

          I also see this happening sometimes. Each of these threads adds 1 to the load average, so they can easily push the Jenkins server's load above 20.
          It does not seem to be connected to anything happening on the node, such as a job starting, but it does seem to coincide with some action in the GUI.

          BTW, I am running Jenkins LTS, i.e. Jenkins 1.565.1.

          If I can help find the bug, let me know. My experience with Java debugging is limited.
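
To check whether reader threads are piling up for a single node, a small diagnostic along these lines could be run inside the master's JVM. It is a hypothetical sketch: the class and method names are invented, and only the thread-name prefix comes from the dump in this issue. Note that a Java thread blocked in native socket I/O also reports RUNNABLE, so a RUNNABLE count is a hint to cross-check against per-thread CPU usage (on Linux, the hex nid values in a thread dump correspond to the decimal thread IDs shown by ps -L), not proof of spinning.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ReaderThreadCensus {
    // Thread-name prefix as it appears in the thread dump on this issue.
    static final String PREFIX = "Channel reader thread: ";

    // Counts live threads in *this* JVM named "Channel reader thread: <node>"
    // whose state is RUNNABLE. Several for one node suggests accumulation.
    static int countRunnableReaders(String node) {
        int count = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(PREFIX + node)
                    && t.getState() == Thread.State.RUNNABLE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean(false);
        // Simulate two leaked reader threads that spin instead of blocking.
        for (int i = 0; i < 2; i++) {
            Thread t = new Thread(() -> {
                while (!stop.get()) {
                    // busy loop, standing in for a read() stuck at EOF
                }
            }, PREFIX + "remote-slave-6");
            t.setDaemon(true);
            t.start();
        }
        Thread.sleep(100);
        System.out.println("RUNNABLE readers for remote-slave-6: "
                + countRunnableReaders("remote-slave-6"));
        stop.set(true);
    }
}
```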

          Olaf Lenz added a comment -

          A further observation that may give a hint: I just saw this happen when starting Jenkins. In the Jenkins log, I noticed that it tried to launch the slave on a node twice, and that was apparently the thread causing the high load. From the log:

          Aug 21, 2014 5:07:27 PM hudson.slaves.SimpleScheduledRetentionStrategy check
          INFO: Trying to launch computer cip20 as schedule says it should be on-line at this point in time
          ...
          [clip]
          ...
          Effective SlaveRestarter on master: null
          Aug 21, 2014 5:07:29 PM hudson.WebAppMain$3 run
          INFO: Jenkins is fully up and running
          [08/21/14 17:07:30] SSH Launch of cip12 on cip12 failed in 3,013 ms
          ...
          [clip]
          ...
          INFO: Trying to launch computer cip20 as schedule says it should be on-line at this point in time
          Aug 21, 2014 5:08:17 PM hudson.node_monitors.AbstractNodeMonitorDescriptor$Record <init>
          

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          src/main/java/hudson/remoting/ChunkedInputStream.java
          http://jenkins-ci.org/commit/remoting/83da718ab06b2f881aa08434684a02ea95ff5135
          Log:
          [FIXED JENKINS-23471]

          if readUntilBreak hits stream EOF, it would end up hanging in a busy loop. EOF is a break boundary,
          so it should cause the method to return gracefully.
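
The commit log above describes the fix in one sentence. Below is a minimal sketch of that idea, not the actual remoting source: the class, its methods, and the framing (a 2-byte big-endian header whose top bit means "more chunks follow" and whose low 15 bits give the payload length) are assumptions made for illustration. The point is the EOF branch: readUntilBreak returns when the header read hits end-of-stream instead of retrying.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedReaderSketch {
    private final InputStream in;
    private boolean eof;

    ChunkedReaderSketch(InputStream in) { this.in = in; }

    boolean isEof() { return eof; }

    // Returns the 16-bit header, or -1 if the stream ended cleanly before it.
    private int readHeader() throws IOException {
        int hi = in.read();
        if (hi < 0) return -1; // EOF at a chunk boundary
        int lo = in.read();
        if (lo < 0) throw new EOFException("truncated chunk header");
        return (hi << 8) | lo;
    }

    // Reads chunks until one without the assumed "more" bit, or until EOF.
    // EOF is treated as a break boundary: we record it and return, rather
    // than calling readHeader() again, which would return -1 forever.
    byte[] readUntilBreak() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            int header = readHeader();
            if (header < 0) {
                eof = true;
                return out.toByteArray();
            }
            boolean more = (header & 0x8000) != 0;
            int len = header & 0x7FFF;
            byte[] buf = new byte[len];
            int off = 0;
            while (off < len) {
                int n = in.read(buf, off, len - off);
                if (n < 0) throw new EOFException("truncated chunk payload");
                off += n;
            }
            out.write(buf, 0, len);
            if (!more) return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // One message in two chunks ("ab" with the more bit set, then "c"), then EOF.
        byte[] wire = {(byte) 0x80, 0x02, 'a', 'b', 0x00, 0x01, 'c'};
        ChunkedReaderSketch r = new ChunkedReaderSketch(new ByteArrayInputStream(wire));
        while (true) {
            byte[] block = r.readUntilBreak();
            if (block.length > 0) {
                System.out.println("block: " + new String(block, StandardCharsets.US_ASCII));
            }
            if (r.isEof()) {
                System.out.println("EOF reached; reader loop exits instead of spinning");
                break;
            }
        }
    }
}
```

Exposing EOF through isEof() is one choice for this sketch so the caller's loop can stop; how the real remoting code surfaces EOF to ChunkedCommandTransport is not shown here.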

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          pom.xml
          http://jenkins-ci.org/commit/jenkins/8fc609fe0952b285d5b26a59fd5ff4c29704d33d
          Log:
          [JENKINS-23471 JENKINS-24050]

          Integrated the fix in remoting to Jenkins 1.580.

          Olaf Lenz added a comment -

          Will the patch also be included in the LTS?

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          pom.xml
          http://jenkins-ci.org/commit/jenkins/9c82fc42eb08b89047c544aaa586291ad1485472
          Log:
          [JENKINS-23471 JENKINS-24050]

          Integrated the fix in remoting to Jenkins 1.580.

          (cherry picked from commit 8fc609fe0952b285d5b26a59fd5ff4c29704d33d)

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          pom.xml
          http://jenkins-ci.org/commit/jenkins/91c5551d4c7682d4adba28fe591fa7772eee62e0
          Log:
          [JENKINS-23471 JENKINS-24050]

          Integrated the fix in remoting to Jenkins 1.580.

          (cherry picked from commit 8fc609fe0952b285d5b26a59fd5ff4c29704d33d)

            Assignee: Unassigned
            Reporter: Kohsuke Kawaguchi
            Votes: 2
            Watchers: 4