JENKINS-12037: CLI - I/O error in channel Chunked connection/Unexpected termination of the channel - still occurring in Jenkins 1.449

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component: cli
    • Labels: None
    • Environment:
      * Running on SLES9 Linux server with 4 CPUs and plenty of disk space.
      * Tomcat 7.0.22
      * JDK 1.6.0_14
      * Only ONE Master configuration - no slaves are configured
      * 3 Executors - (one less than the max number of CPUs)

      We reported an issue some time back that was also listed as fixed in Jenkins 1.441:
      Log:
      [FIXED JENKINS-11130] SEVERE: I/O error in channel Chunked connection when using jenkins-cli.jar

      Perhaps a new bug should NOT be submitted, so I have added the comments below the separator line to the comments section of the original defect, JENKINS-11130, in case that issue can be reviewed/re-opened.

      We did NOT try to make any adjustments to the Tomcat configuration:

      Tomcat Connector connectionUploadTimeout

      but we are also now seeing the same problem with Winstone at this same 1.441 level. We did revert to the 1.438 version of the CLI (leaving the WAR at 1.441 running in Winstone) and that is serving as the current workaround.

      ================================================================================================

      We have downloaded and installed the LATEST 1.441 release that lists the fix for this problem. Previously we were running 1.438 on Winstone only (with Tomcat 6 or 7 we had experienced the error, but under Winstone it worked OK, so that was our workaround while running 1.438).

      Now with Jenkins 1.441 we are getting the ERROR again, and NOW WITH BOTH the Winstone and Tomcat configurations. We have left the Jenkins 1.441 WAR file in place running on Winstone, and reverted the CLI jar file back to the 1.438 version for now; that appears to work again with Winstone.

      Checked Manifest of CLI jar downloaded with the 1.441 WAR installation:

      Manifest-Version: 1.0
      Archiver-Version: Plexus Archiver
      Created-By: Apache Maven
      Built-By: kohsuke
      Build-Jdk: 1.6.0_26
      Main-Class: hudson.cli.CLI
      Jenkins-CLI-Version: 1.441

      Under Tomcat 7, we get this stacktrace:

      Started by command line
      [workspace] $ /bin/bash -xe /opt/apache-tomcat-7.0.22_jenkins/temp/hudson32817888834817830.sh
      + /opt/Sun/jdk1.6.0_14/bin/java -jar /opt/Sun/jdk1.6.0_14/lib/jenkins-cli.jar -s http://11.22.33.44:8082/jenkins/ build XYZ_Project-SharedLibs -s -p SVN_PATH=trunk
      Dec 5, 2011 12:59:11 PM hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel Chunked connection to http://11.22.33.44:8082/jenkins/cli
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Exception in thread "main" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy2.main(Unknown Source)
      at hudson.cli.CLI.execute(CLI.java:200)
      at hudson.cli.CLI._main(CLI.java:330)
      at hudson.cli.CLI.main(CLI.java:245)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:732)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
      Caused by: java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1115)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Build step 'Execute shell' marked build as failure
      Notifying upstream projects of job completion
      Finished: FAILURE

      Under Winstone, we get this stacktrace - it's somewhat different:

      Started by command line
      [workspace] $ /bin/bash -xe /tmp/hudson10791816374444704.sh
      + /opt/Sun/jdk1.6.0_14/bin/java -jar /opt/Sun/jdk1.6.0_14/lib/jenkins-cli.jar -s http://11.22.33.44:8082/jenkins/ build XYZ_Project-SharedLibs -s -p SVN_PATH=trunk
      Dec 5, 2011 1:18:22 PM hudson.remoting.Channel$ReaderThread run
      SEVERE: I/O error in channel Chunked connection to http://11.22.33.44:8082/jenkins/cli
      java.io.IOException: Premature EOF
      at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
      at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
      at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
      at java.io.FilterInputStream.read(FilterInputStream.java:116)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2504)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2499)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2488)
      at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
      at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Exception in thread "main" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
      at hudson.remoting.Request.call(Request.java:149)
      at hudson.remoting.Channel.call(Channel.java:681)
      at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
      at $Proxy2.main(Unknown Source)
      at hudson.cli.CLI.execute(CLI.java:200)
      at hudson.cli.CLI._main(CLI.java:330)
      at hudson.cli.CLI.main(CLI.java:245)
      Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
      at hudson.remoting.Request.abort(Request.java:273)
      at hudson.remoting.Channel.terminate(Channel.java:732)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1139)
      Caused by: java.io.IOException: Premature EOF
      at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
      at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
      at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
      at java.io.FilterInputStream.read(FilterInputStream.java:116)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2504)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2499)
      at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2488)
      at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
      at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
      at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
      at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
      at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
      at hudson.remoting.Channel$ReaderThread.run(Channel.java:1109)
      Build step 'Execute shell' marked build as failure
      Notifying upstream projects of job completion
      Finished: FAILURE


          Richard Mortimer added a comment -

          For completeness, can you get the associated logs from the Jenkins server side? That should confirm that the server side is closing the downstream socket because it has detected inactivity/timeout on the upstream side.

          How long does the cli command take to fail? A rough (to the nearest second or so) timing is all that is needed. The issue was due to a webserver connection timeout before, so it is likely to be something like 60, 30, 20 or 15 seconds.

          Are you able to get a packet trace of the network traffic between the cli and Jenkins? Capturing from the cli side with Wireshark or tcpdump should be fine. If you are able to get that and can share it then drop me an email at oldelvet@java.net so that we can arrange to transfer it in private.

          I don't have a lot of time at the moment so if anyone else wants to take a look instead of me please feel free to do so!


          mark streit added a comment -

          I will get logs from the Tomcat server. It also failed with Winstone this time, whereas before it worked with Winstone when using the 1.441 CLI. I will check on those logs, assuming they are somewhere in the location where we put the WAR. The failure hit within 10 seconds each time.

          Thanks


          mark streit added a comment -

          Have attached logs from the Tomcat server where we are running this. We have downloaded the latest 1.449 WAR and the corresponding CLI jar for 1.449. We repeated the test and are seeing this I/O error in chunked connection again. (Right now only 1.441 running in Winstone works without this error.)

          Console output from the Jenkins web page is also included, as is the shell command text from the Execute Shell section of the build job configuration page.

          This all works with 1.441 as long as the container is Winstone.


          Richard Mortimer added a comment -

          I'm still max'd out with $dayjob and have not had time to look in much detail at this.

          I just did a quick test on my test system with 1.451 using tomcat 6 and do not see any timeouts there with the tomcat http data upload timeout set to 20 seconds. That rules out any basic brokenness (I hope!).

          Looking at your log files (see below) there is one instance where an error occurs 12 seconds after the previous timeout. That suggests that tomcat 7 is timing out in around 10 seconds. The jenkins channel keepalive ping is set for 15 seconds so it could be that the ping is still not aggressive enough.

          We probably just need to test with a 5 second ping time and see if that fixes things.
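          For illustration, here is a minimal sketch of how such a more aggressive keep-alive could be attached to the CLI channel. This is not the actual hudson.cli.CLI code; it assumes the hudson.remoting.PingThread(Channel, long timeout, long interval) constructor and its onDead() hook, and the AggressiveCliPing class name is made up for the example.

          import hudson.remoting.Channel;
          import hudson.remoting.PingThread;
          import java.io.IOException;

          class AggressiveCliPing {
              static void install(final Channel channel) {
                  final long interval = 5 * 1000L;        // proposed 5 second ping period
                  final long timeout = interval * 3 / 4;  // give up on a ping well before the next one is due
                  new PingThread(channel, timeout, interval) {
                      @Override
                      protected void onDead() {
                          // The remote side stopped answering pings; close the channel so
                          // the CLI fails fast instead of hanging on a dead connection.
                          try {
                              channel.close();
                          } catch (IOException e) {
                              // best effort - the channel is already unusable
                          }
                      }
                  }.start();
              }
          }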

          First error reported in ConsoleOutput.log

          Feb 6, 2012 2:47:38 PM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel Chunked connection to http://99.99.99.94:8082/jenkins/cli
          java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
          Caused by: java.io.EOFException
          

          This corresponds to the following two connections in localhost_access_log.2012-02-06.txt

          10.25.50.94 - - [06/Feb/2012:14:47:38 -0500] "POST /jenkins/cli HTTP/1.1" 200 13935
          10.25.50.94 - - [06/Feb/2012:14:47:38 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
          

          The former is the downstream link and the latter is the upstream (sending from cli to jenkins) hence there being no output in that direction.

          catalina.out (equivalent to the standalone jenkins.log) has the other side of the error.
          This confirms that the upstream reader sees the input data closed by tomcat.

          Feb 6, 2012 2:47:38 PM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel HTTP full-duplex channel b625b58a-d9d7-4a42-856c-287e953edb47
          java.net.SocketTimeoutException: Read timed out
          	at java.net.SocketInputStream.socketRead0(Native Method)
          

          The next error in the console output occurs 12 seconds later.

          Feb 6, 2012 2:48:00 PM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel Chunked connection to http://99.99.99.94:8082/jenkins/cli
          java.io.IOException: Unexpected termination of the channel
          	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
          
          	
          10.25.50.94 - - [06/Feb/2012:14:48:00 -0500] "POST /jenkins/cli HTTP/1.1" 200 13935
          10.25.50.94 - - [06/Feb/2012:14:48:00 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
          

          The access log has a couple of extra cli commands in the meantime. It isn't clear whether these are for the same job or for something else. Assuming they are for something else that means that the 2nd command timed out roughly 10 seconds after the previous cli command.

          10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 10156
          10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
          
          10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 10156
          10.25.50.94 - - [06/Feb/2012:14:47:39 -0500] "POST /jenkins/cli HTTP/1.1" 200 -
          

          I wonder if the http data upload inactivity timeout is 10 seconds in tomcat7.


          Richard Mortimer added a comment - (edited)

          Oops comment added to wrong issue!


          mark streit added a comment -

          with regard to your question about the cli commands showing in the logs -

          "The access log has a couple of extra cli commands in the meantime. It isn't clear whether these are for the same job or for something else. Assuming they are for something else that means that the 2nd command timed out roughly 10 seconds after the previous cli command."

          This test was done with no other jobs running; only the command shell that runs the jobs with the -s parameter was active. It "appears" to us that the -s parameter gets ignored and other jobs start running before we expect them to.


          Richard Mortimer added a comment -

          If I build a test version for you with a 5 second heartbeat timeout, are you able to test it? If so, is there any particular Jenkins version that you would be testing with, so that I can build a matching cli jar file?

          Regarding the -s parameter being ignored, I vaguely remember that there is a bug reported in JIRA where it does not wait if there is already a queued instance of the same job. From what you describe this may not be applicable to your circumstances, but it could be.


          brianharris added a comment -

          Stack trace looks much like the popular JENKINS-6817.


          mcs130 added a comment -

          Richard, we obtained your 1.450 version of the JAR with the heartbeat time interval change. We installed the 1.450 WAR into Tomcat 7.0.22 and now run everything with jdk1.6.0_30 - both the Tomcat startup and the JDK from which we run the CLI commands. We even raised the connectionTimeout in the Tomcat server.xml from 20 seconds to as high as 60 seconds. Nothing is working - we still get the I/O chunked connection/Unexpected termination of channel error. For now we have had to revert back to 1.441 running on Winstone only...and the original 1.441 CLI jar that came with it. When using this combination it works, but it is not ideal. We can't get above 1.441 because even using Winstone with a later version fails in the same manner. And to your point, the -s parameter appears to get ignored when used. I will get the similar logs from the last experiment and send them as before.

          Thanks

          Mark


          Richard Mortimer added a comment -

          Well, I think that pretty much rules out the ping interval directly. Obviously there may be something that is blocking the channel/pings. Alternatively, there may be something external that is breaking the connection. The logs will definitely be useful to confirm that it is still the upstream (CLI to Jenkins) connection that is being closed.

          I'm not sure there is much more I can do without a reproducible test case that I can run here, or unless someone who can reproduce this can get a network packet dump (that they are willing to share) that will hopefully allow us to piece together what is going on at the network level.


          Richard Mortimer added a comment -

          @brianharris agree that the stacktrace looks similar. The one difference that I'm aware of is that the connections in JENKINS-12037 are made over the cli channel using two HTTP connections rather than to the JNLP port. In this case the HTTP container can break connections if it thinks that there is a "bad" connection, and we could reproduce this particular failure and trace it back to that.
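          To illustrate that arrangement, here is a minimal conceptual sketch of two chunked HTTP POSTs to the /cli endpoint, one used only for reading (download) and one only for writing (upload). It is not the actual hudson.cli.CLI / FullDuplexHttpStream code, and the session-pairing details the real protocol uses to match the two requests are omitted.

          import java.io.InputStream;
          import java.io.OutputStream;
          import java.net.HttpURLConnection;
          import java.net.URL;

          class FullDuplexHttpSketch {
              static void connect(URL cliUrl) throws Exception {
                  // Downstream (server -> client): the response body of this request is
                  // held open and streamed back for the lifetime of the CLI session.
                  HttpURLConnection down = (HttpURLConnection) cliUrl.openConnection();
                  down.setRequestMethod("POST");
                  down.setDoOutput(true);
                  down.getOutputStream().close();           // empty request body
                  InputStream fromServer = down.getInputStream();

                  // Upstream (client -> server): chunked streaming keeps the request body
                  // open so commands can be written as they are produced. If the servlet
                  // container times out this mostly idle upload, the whole channel dies,
                  // which is the failure reported in this issue.
                  HttpURLConnection up = (HttpURLConnection) cliUrl.openConnection();
                  up.setRequestMethod("POST");
                  up.setDoOutput(true);
                  up.setChunkedStreamingMode(0);
                  OutputStream toServer = up.getOutputStream();

                  // A hudson.remoting.Channel would then be layered over the pair
                  // (fromServer as its input, toServer as its output); omitted here.
              }
          }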


          Miguel Almeida added a comment -

          Hi,

          I've just posted on the mailing list (http://groups.google.com/group/jenkinsci-users/browse_frm/thread/7f96cdf57ffed6cf# ) - perhaps this is the same issue.

          I have a multi-module project where I'm running the Maven goal "clean surefire-report:report". While the tests pass on our local machines, we're randomly having failures in one of the modules in Jenkins. The console shows:

           
          mavenExecutionResult exceptions not empty
          message : Failed to execute goal
          org.apache.maven.plugins:maven-surefire-plugin:2.12:test
          (default-test) on project cm-web: Error occurred in starting fork,
          check output in log
          cause : Error occurred in starting fork, check output in log
          Stack trace :
          org.apache.maven.lifecycle.LifecycleExecutionException: Failed to
          execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test
          (default-test) on project cm-web: Error occurred in starting fork,
          check output in log
          

          We have Jenkins running in Tomcat, and by running the build and inspecting Tomcat's log I see:

           
          SEVERE: I/O error in channel Channel to Maven [java, -cp,
          /usr/share/tomcat6/.jenkins/plugins/maven-plugin/WEB-INF/lib/maven3-agent-1.2.jar:/usr/share/tomcat6/.jenkins/tools/Jenkins_maven/boot/plexus-classworlds-2.4.jar,
          org.jvnet.hudson.maven3.agent.Maven3Main,
          /usr/share/tomcat6/.jenkins/tools/Jenkins_maven,
          /usr/share/tomcat6/webapps/jenkins/WEB-INF/lib/remoting-2.12.jar,
          /usr/share/tomcat6/.jenkins/plugins/maven-plugin/WEB-INF/lib/maven3-interceptor-1.2.jar,
          40284]
          java.io.IOException: Unexpected termination of the channel
              at hudson.remoting.Channel$ReaderThread.run(Channel.java:1133)
          Caused by: java.io.EOFException
              at
          java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2553)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
              at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127) 
          

          Restarting Tomcat makes the problem go away (until it "randomly" appears again after a few days).


          George Panayotov added a comment -

          I'm running version 1.454 on OSX Lion via the .pkg install. I believe that means I have it running via Winstone. I have a similar issue to this (the same one, I think). I call build -s JOB_NAME via the cli. The channel times out after exactly 1 minute.

          Server log:

          Mar 13, 2012 6:53:59 PM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel HTTP full-duplex channel fe769a28-6f68-4595-b207-be5ac016c678
          java.net.SocketTimeoutException: Read timed out
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(SocketInputStream.java:129)
          at java.net.SocketInputStream.read(SocketInputStream.java:182)
          at winstone.WinstoneInputStream.read(WinstoneInputStream.java:49)
          at hudson.util.ChunkedInputStream.getChunkSizeFromInputStream(ChunkedInputStream.java:223)
          at hudson.util.ChunkedInputStream.nextChunk(ChunkedInputStream.java:197)
          at hudson.util.ChunkedInputStream.read(ChunkedInputStream.java:120)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)

          Client Log:

          Mar 13, 2012 6:53:59 PM hudson.remoting.Channel$ReaderThread run
          SEVERE: I/O error in channel Chunked connection to http://10.1.11.19:8080/cli
          java.io.IOException: Premature EOF
          at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
          at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
          at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
          at java.io.FilterInputStream.read(FilterInputStream.java:116)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2672)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2667)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2656)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)
          Exception in thread "main" hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
          at hudson.remoting.Request.call(Request.java:149)
          at hudson.remoting.Channel.call(Channel.java:681)
          at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:158)
          at $Proxy2.main(Unknown Source)
          at hudson.cli.CLI.execute(CLI.java:271)
          at hudson.cli.CLI._main(CLI.java:417)
          at hudson.cli.CLI.main(CLI.java:322)
          Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Premature EOF
          at hudson.remoting.Request.abort(Request.java:273)
          at hudson.remoting.Channel.terminate(Channel.java:732)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:1157)
          Caused by: java.io.IOException: Premature EOF
          at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:538)
          at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:582)
          at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:669)
          at java.io.FilterInputStream.read(FilterInputStream.java:116)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2672)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2667)
          at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2656)
          at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2248)
          at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2541)
          at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2551)
          at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1296)
          at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
          at hudson.remoting.Channel$ReaderThread.run(Channel.java:1127)

          I also connected Wireshark to see if I could get any useful info from there, and I noticed one thing among the mess of serialized data:

          "Execution of hudson.remoting.PingThread$Ping@2be548d is prohibited because the channel is restricted"

          I'm pretty sure my issue is that the server times out at 60 seconds, and the line above seems to suggest the reason: the timeout is set to 60 seconds and the PingThread that should keep the channel alive is not running. Hope this helps. Also, has anyone found a workaround for this yet?

          -George


          Richard Mortimer added a comment -

          @Miguel Almeida thanks for the information. At the moment that seems sufficiently different from the situation/symptoms reported in this JIRA that I think it would be best if you could report it as a separate JIRA issue, to avoid confusion over symptoms. It may turn out to be related once the root cause is known, as @brianharris points out.


          Richard Mortimer added a comment -

          @George Panayotov thanks for the data. This does indeed look like similar symptoms to JENKINS-12037.

          Thank you for taking a look using Wireshark; that might be a big clue as to what is going on. Looking at the source, FullDuplexHttpChannel.java sets isRestricted based on user authentication, so this could give a clue as to why it might occur for some and not others.
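          To illustrate that hypothesis, here is a small self-contained sketch (hypothetical names, not the FullDuplexHttpChannel source) of how a channel flagged as restricted would refuse to run a submitted command such as the keep-alive Ping, leaving the caller's Future to fail with an ExecutionException:

          import java.util.concurrent.Callable;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          // Hypothetical stand-in for a restricted remoting channel: commands submitted
          // while 'restricted' is true are rejected rather than executed, so the client
          // side ping Future completes with an ExecutionException instead of a result.
          class RestrictedChannelSketch {
              private final boolean restricted;
              private final ExecutorService executor = Executors.newSingleThreadExecutor();

              RestrictedChannelSketch(boolean authenticated) {
                  this.restricted = !authenticated; // assumption: unauthenticated CLI => restricted
              }

              <V> Future<V> callAsync(final Callable<V> command) {
                  return executor.submit(new Callable<V>() {
                      public V call() throws Exception {
                          if (restricted) {
                              throw new SecurityException("Execution of " + command
                                      + " is prohibited because the channel is restricted");
                          }
                          return command.call();
                      }
                  });
              }
          }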

          That said, if you still have the Wireshark output around, can you take a look to see if you can answer the following:

          • Does the "is prohibited because the channel is restricted" message get repeated or does it just occur once at the start of the CLI session?
          • Do you see any other activity on either of the two http streams (upstream and downstream) during the 60 seconds after the initial activity?

          Richard


          George Panayotov added a comment -

          This is the Wireshark capture of the failing transaction. Hope this helps.


          George Panayotov added a comment -

          I think the "Channel is restricted" only happened once. I don't think there was any other activity during the 60 seconds. You should check jenkins-timeout to see if you can find anything.

          Thanks,
          George


          Richard Mortimer added a comment -

          @George Panayotov many thanks, that is really good data.

          I've only had time for a very quick look at this but to answer my questions:

          • The ping thread only attempts to ping once at the start of the CLI session.
          • There is no activity on the thread at all after the ping thread fails until the 60 second timeout.

          What authentication schemes do you have enabled on your Jenkins instance? That might be key to why I don't see this issue.

          I'm not going to have time to look at this further this week due to other commitments. But it's clear that the ping thread is definitely failing in this case, so hopefully fixing that will make things work.

          For the record I noticed the following in the packet capture file.

          Stream 1 62335 -> 8080

          • Starts at offset 0.0000
          • Download stream request started at packet 17 0.157159
          • Finish of stacktrace at packet 83 4.177468
          • No activity until 60.751270 when the server side closes the stream

          Stream 2 62336 -> 8080

          • Upload stream starts at packet 21 0.198207
          • Last upload activity at packet 73 0.591000
          • No activity until packet 86 60.751422
          • Winstone cleanly closes the upload stream.


          George Panayotov added a comment -

          I'm using internal Jenkins users with matrix-based security (global, not project-based). I have 2 users: one is a "superuser" that has all rights, and the other is a least-privileged user who can basically just create builds and read info. I get this issue with either user. I'm not running through any sort of Apache or Tomcat proxy, so I don't have SSL running. I plan to add this eventually, but for now I wanted to get this working in the basic case.


          mark streit added a comment -

          Based on the number of comments and information about the ping thread, is it safe to assume that it will still be some time before a fix is available? I understand that folks are buried with other work. We are trying to figure out what to do with a planned Jenkins upgrade in our environment because we are stuck with:

          Jenkins App WAR 1.441
          Jenkins CLI JAR 1.438
          Container: Winstone only / Tomcat 7 will not work

          Any attempt to use a combination involving later versions (and we ultimately know the CLI JAR and WAR versions need to align) does not seem to work - this I/O error in channel chunked connection exception shows up.

          Thanks


          Richard Mortimer added a comment -

          @mark streit I think that I have enough clues now to say what the root cause is, and I am pretty sure that I have a workaround (although hack may be a better description).

          Progress since my last report.

          • I can reproduce this issue by forcing a restricted channel in the constructor for FullDuplexHttpChannel.java mentioned in an earlier comment.
          • From this I can see that the PingThread is getting stuck: its f.get call fails with an ExecutionException instead of completing.
            • Note: I cannot see why the loop does not terminate once remaining <= 0, but it does not seem to.
          • I have been able to make the PingThread work by adding an additional return after the onDead call, and this makes things work without the timeout (see the sketch after this list).
          • It could be argued that the extra return just papers over the problem because whilst the timeout works the PingThread does not actually perform a successful ping because of the restricted channel.
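          For illustration, here is a paraphrased, self-contained sketch of that ping loop (it is not the verbatim remoting source; pingCommand() and onDead() stand in for the channel call and the termination hook). Without the marked return, f.get() on the already-failed future throws the same ExecutionException again immediately on every pass, so the loop spins until the full timeout expires.

          import java.util.concurrent.ExecutionException;
          import java.util.concurrent.Future;
          import java.util.concurrent.TimeUnit;
          import java.util.concurrent.TimeoutException;

          abstract class PingLoopSketch {
              private final long timeout = 4 * 60 * 1000L;  // default 4 minute ping timeout

              abstract Future<?> pingCommand();  // submit a Ping over the channel
              abstract void onDead();            // declare the channel dead

              void ping() throws InterruptedException {
                  Future<?> f = pingCommand();
                  long end = System.currentTimeMillis() + timeout;
                  long remaining;
                  do {
                      remaining = end - System.currentTimeMillis();
                      try {
                          f.get(Math.max(1, remaining), TimeUnit.MILLISECONDS);
                          return;   // ping answered: channel is alive
                      } catch (ExecutionException e) {
                          onDead();
                          return;   // the added return; without it the loop retries
                                    // f.get() on the failed future and busy waits
                      } catch (TimeoutException e) {
                          // not answered yet; loop and wait out the remaining time
                      }
                  } while (remaining > 0);
                  onDead();
              }
          }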

          From the evidence above it seems that the problem was introduced by the fixes made for JENKINS-11097 which were delivered in 1.433.

          The fix that I have only requires changes in the cli-with-dependencies.jar that is used on the client side. If anyone can test in the next 24 hours I can build a version to match your Jenkins version. Please let me know what version is useful.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Richard Mortimer
          Path:
          changelog.html
          cli/src/main/java/hudson/cli/CLI.java
          http://jenkins-ci.org/commit/jenkins/ea1b80aebe85a9414d5befd58e976d85818a258d
          Log:
          [FIXED JENKINS-12037] CLI - I/O error in channel Chunked connection

          Use a ping timeout of 3/4 of the ping interval instead of the default
          4 minute timeout.
          This happens to workaround an issue in the remoting PingThread
          implementation which causes the ping to busy wait for the timeout
          period instead of keeping the channel alive.


          Richard Mortimer added a comment -

          I've committed a change to the CLI code that brings the execution timeout in line with the ping period. This has the side effect of fixing the problem in my test environment (however, the cli ping thread will busy wait roughly 75% of the time).

          In addition I've submitted pull request https://github.com/jenkinsci/remoting/pull/3 which will get rid of the busy waiting.


          dogfood added a comment -

          Integrated in jenkins_main_trunk #1624
          [FIXED JENKINS-12037] CLI - I/O error in channel Chunked connection (Revision ea1b80aebe85a9414d5befd58e976d85818a258d)

          Result = SUCCESS
          richm : ea1b80aebe85a9414d5befd58e976d85818a258d
          Files :

          • cli/src/main/java/hudson/cli/CLI.java
          • changelog.html


          Richard Mortimer added a comment -

          A snapshot build of 1.459 with the fix can be downloaded from cli-1.459-SNAPSHOT-jar-with-dependencies.jar; this should allow testing against recent versions of Jenkins.


          mark streit added a comment -

          Richard

          THANK YOU for this... it appears to be WORKING NOW. We are using your snapshot build below and the GA Jenkins 1.457 version (WAR), running on Tomcat 7.0.22 (not Winstone anymore). The connection is not dropped and the job sequence runs to completion.

          Is it safe to assume the fixed version will be part of the official 1.459 release?

          We really appreciate your time and help with getting this fixed.


          Richard Mortimer added a comment -

          Yes, it will be part of 1.459. The snapshot build that you tested is direct from Jenkins on Jenkins builds.

          I mentioned it earlier but the current fix actually busy-waits during the ping cycle so you may see high cpu usage from the cli command. The busy-wait is fixed by the change that I have submitted to the underlying "remoting" library. That might take a release or two to filter through.

          Finally we should really say thank you to George who managed to get a packet trace with the failure so that we could work out what was going on.


          dogfood added a comment -

          Integrated in jenkins_ui-changes_branch #21
          [FIXED JENKINS-12037] CLI - I/O error in channel Chunked connection (Revision ea1b80aebe85a9414d5befd58e976d85818a258d)

          Result = SUCCESS
          richm : ea1b80aebe85a9414d5befd58e976d85818a258d
          Files :

          • cli/src/main/java/hudson/cli/CLI.java
          • changelog.html


          Moshe Sucaz added a comment -

          Hi,
          We have just downloaded Jenkins 1.461 (because we had this problem with 1.443), and I still see such failures in the catalina logs. We are using Tomcat 6.0.29. Will upgrading to Tomcat 7.0.22 help?


          mark streit added a comment -

          A couple of things with this:

          1) You need to make sure you download and install the latest jenkins-cli.jar that comes with the 1.460 or higher WAR download, and have it on your classpath wherever you are using the CLI jar.
          2) It's the CLI jar code that was fixed as part of this. Simply updating the WAR file on your Tomcat instance is only part of the upgrade.
          3) I checked our catalina.out logs on our Tomcat 7.0.22 installation and we don't see these anymore since the fix was delivered in 1.460 which is what we now run. We made sure wherever we use CLI commands that the 1.460 version of the jenkins-cli.jar is on the classpath.

          Hope this helps.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Richard Mortimer
          Path:
          src/main/java/hudson/remoting/PingThread.java
          http://jenkins-ci.org/commit/remoting/463b5427d272f5d0c2c97eb6c41ef95757eb6c15
          Log:
          Do not attempt to retry a channel get after an ExecutionException.
          Without this the pinger busy waits retrying the same operation for the
          timeout period. The timeout/retry mechanism was only intended to retry
          pings that were interrupted.

          This is part of the failure seen in JENKINS-12037 where the PingThread
          busy waits for 4 minutes when the Ping class fails to execute on a restricted
          channel.


            Assignee: Unassigned
            Reporter: mark streit (mcs13099)
            Votes: 0
            Watchers: 4