• Bug
    • Resolution: Unresolved
    • Major
    • cli
    • None

      There's a report from users that indicates a CLI client can hang at the following spot:

      java.lang.Thread.State: RUNNABLE 
      at java.net.SocketInputStream.socketRead0(Native Method) 
      at java.net.SocketInputStream.read(Unknown Source) 
      at java.io.FilterInputStream.read(Unknown Source) 
      at java.io.FilterInputStream.read(Unknown Source) 
      at javax.crypto.CipherInputStream.a(DashoA13*..) 
      at javax.crypto.CipherInputStream.read(DashoA13*..) 
      at java.io.DataInputStream.readFully(Unknown Source) 
      at java.io.DataInputStream.readFully(Unknown Source) 
      at hudson.cli.Connection.readByteArray(Connection.java:132) 
      at hudson.cli.CLI.connectViaCliPort(CLI.java:243) 
      at hudson.cli.CLI.<init>(CLI.java:134) 
      at hudson.cli.CLIConnectionFactory.connect(CLIConnectionFactory.java:72) 
      at hudson.cli.CLI._main(CLI.java:469) 
      at hudson.cli.CLI.main(CLI.java:384)
      

      At the same time the server has already progressed to the following state:

         java.lang.Thread.State: RUNNABLE
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:150)
          at java.net.SocketInputStream.read(SocketInputStream.java:121)
          at java.io.FilterInputStream.read(FilterInputStream.java:133)
          at java.io.FilterInputStream.read(FilterInputStream.java:107)
          at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:103)
          at javax.crypto.CipherInputStream.read(CipherInputStream.java:224)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
          - locked <0x000000041fa9e560> (a java.io.BufferedInputStream)
          at hudson.remoting.ClassicCommandTransport.create(ClassicCommandTransport.java:98)
          at hudson.remoting.Channel.<init>(Channel.java:392)
          at hudson.remoting.Channel.<init>(Channel.java:388)
          at hudson.cli.CliProtocol$Handler.runCli(CliProtocol.java:48)
          at hudson.cli.CliProtocol2$Handler2.run(CliProtocol2.java:73)
          at hudson.cli.CliProtocol2.handle(CliProtocol2.java:32)
          at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:150)
      

      The issue appears to be that the server believes it has already sent its identity (CliProtocol2.Handler2.run(), line 62), but the client is still waiting to receive it. The problem is reported as reproducible, but only sporadically.

      I'm failing to reproduce this problem locally. If other people see this problem, please report that here.
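For anyone trying to catch the hang in the act, one generic way to make a stuck read visible is a socket read timeout. The sketch below uses plain `java.net` and is not the Jenkins CLI code: the class and method names are hypothetical, and the silent peer only stands in for a server whose bytes never arrive. It shows a `readFully` call, like the one in the client stack trace, giving up with `SocketTimeoutException` instead of blocking indefinitely.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of hudson.cli.
class ReadTimeoutDemo {

    /** Returns true if the read gave up with a timeout instead of blocking forever. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Bound every blocking read on the client side.
            client.setSoTimeout(timeoutMillis);
            DataInputStream in = new DataInputStream(client.getInputStream());
            try {
                // The accepted peer never writes, so this read cannot complete.
                in.readFully(new byte[4]);
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

A timeout like this does not fix the underlying problem; it only converts a silent hang into an error that can be logged and retried.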

          [JENKINS-20709] CLI client hangs

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path: core/src/main/java/hudson/cli/CliProtocol2.java
          http://jenkins-ci.org/commit/jenkins/2fc99b337af5774c4028e0735de309cfaca78c0a
          Log: suspecting bytes not getting flushed, see JENKINS-20709
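The "bytes not getting flushed" suspicion can be illustrated with a small self-contained sketch. It is not the code from `CliProtocol2`. The class name, the in-memory pipe standing in for the socket, and the 4-byte payload are all assumptions made for illustration. It shows that data written through a buffered stream is invisible to the reader until `flush()` is called, which would match a server that believes it has sent its identity while the client keeps waiting.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Hypothetical demo class; the pipe stands in for the TCP connection.
class FlushDemo {

    /** Returns {bytes visible to the reader before flush, bytes visible after flush}. */
    static int[] visibleBytes() throws IOException {
        PipedInputStream clientSide = new PipedInputStream();
        PipedOutputStream wire = new PipedOutputStream(clientSide);
        DataOutputStream serverSide = new DataOutputStream(new BufferedOutputStream(wire));

        // Length-prefixed byte array, a placeholder for the identity payload.
        byte[] identity = {1, 2, 3, 4};
        serverSide.writeInt(identity.length);
        serverSide.write(identity);

        // The writer has "sent" 8 bytes, but they still sit in the buffer.
        int before = clientSide.available();

        // Only flush() pushes the buffered bytes onto the wire.
        serverSide.flush();
        int after = clientSide.available();

        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] r = visibleBytes();
        System.out.println("before flush: " + r[0] + ", after flush: " + r[1]);
    }
}
```

Whether this is the actual cause here is unconfirmed; the commit above only records it as a suspicion.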

          Richard Mortimer added a comment -

          Possibly related to JENKINS-18058, which has the opposite problem: spurious output corrupting the stream.

          Marco Miller added a comment -

          Kohsuke,
          I also work on this issue on our side (Ericsson).
          Thanks for your help so far, btw =)
          Since we disabled the pingers listed below, the hanging no longer seems to happen.
          Of course I cannot guarantee that 100%, but I wanted you to be aware of it:

          -Dhudson.remoting.Launcher.pingIntervalSec=0
          -Dhudson.remoting.Launcher.pingTimeoutSec=0
          -Dhudson.slaves.ChannelPinger.pingInterval=0

          We haven't yet tried disabling only one of those 2 pingers; so far, all we did was disable both of them.
          Your thoughts on this hypothesis are welcome! =)
          Thanks again!
          PS: should disabling the pingers turn out to be a "solution", what would be the likely consequences of doing so in prod?
          PPS: another hypothesis: the master waits forever for the slave to send a response, while the master's channel is artificially kept alive/open thanks to the pinger(s).

          Marco Miller added a comment -

          (Please note that the above pingers-disabling hypothesis was inconclusive as we were trying it out.)

          Jeremy Van Haren added a comment - - edited

          We are still seeing this issue where cli commands can hang. We are getting this on 1.580.2. We kick off about 100-200 cli commands nightly, and we'll get a hang on one of them about 1-2 times a week. So, 1/1000 times approximately.

            kohsuke Kohsuke Kawaguchi
            Votes: 9
            Watchers: 8