• Bug
    • Resolution: Fixed
    • Major
    • core
    • None
    • Hudson 1.378-1.383

      Since 1.378, the remoting functions of Hudson have become unstable. When a Launcher is used to execute commands remotely on a slave, it sometimes fails to return any data. The following stack trace appears in the Hudson log when this occurs.

      Oct 15, 2010 10:12:17 AM hudson.remoting.ProxyOutputStream$Chunk$1 run
      WARNING: Failed to write to stream
      java.io.IOException: Pipe closed
              at java.io.PipedInputStream.checkStateForReceive(PipedInputStream.java:244)
              at java.io.PipedInputStream.receive(PipedInputStream.java:210)
              at java.io.PipedOutputStream.write(PipedOutputStream.java:132)
              at java.io.OutputStream.write(OutputStream.java:58)
              at hudson.util.DelegatingOutputStream.write(DelegatingOutputStream.java:51)
              at hudson.remoting.ProxyOutputStream$Chunk$1.run(ProxyOutputStream.java:185)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      

      This is currently causing issues with the perforce plugin (JENKINS-7664), and I've been able to reproduce it using the hudson_main_trunk#331 build on ci.hudson-labs.org as well.
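The "Pipe closed" message in the trace above can be reproduced with plain JDK pipes. The following is a minimal sketch, assuming only that the local stream is closed before a late write arrives; the class and method names are hypothetical and are not part of Hudson.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only uses java.io pipes, not Hudson remoting.
class PipeClosedDemo {
    /** Writes to a pipe after it has been closed and returns the exception message. */
    static String writeAfterClose() throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);

        out.close(); // the caller closes the stream, e.g. right after joining the process

        try {
            // A chunk that was still in flight is written after the close.
            out.write("late chunk".getBytes(StandardCharsets.UTF_8));
            return null; // not reached: the write is rejected
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAfterClose()); // prints "Pipe closed"
    }
}
```

The write travels through `PipedOutputStream.write` into `PipedInputStream.receive`, the same frames that appear at the top of the logged trace.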

          [JENKINS-7809] Remote Launcher randomly returns no data.

          Kohsuke Kawaguchi added a comment -

          I wonder if this was a different manifestation of JENKINS-7745, which we fixed in 1.397. I encourage people to upgrade to 1.397 and report their results.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          remoting/src/main/java/hudson/remoting/PipeWindow.java
          http://jenkins-ci.org/commit/core/7b40e2e091db073df18a27d27e2019a290fa3bee
          Log:
          being more defensive, in seeing JENKINS-7809 and the following stack trace:

          java.lang.NegativeArraySizeException
          at hudson.remoting.ProxyOutputStream$Chunk.<init>(ProxyOutputStream.java:175)
          at hudson.remoting.ProxyOutputStream._write(ProxyOutputStream.java:123)
          at hudson.remoting.ProxyOutputStream.write(ProxyOutputStream.java:103)

          Rob Petti added a comment -

          I was able to reproduce JENKINS-7664 with that fix, so that doesn't seem to be the cause.

          In the perforce plugin, we are joining the RemoteProc created by the RemoteLauncher, and assuming that all the data has made it through the channel at that point. Unfortunately this doesn't seem to be the case, and occasionally you can run into a race condition where the process is joined, but there are still Chunks on their way through the Channel. Given that we close the stream right after the join, this generates exceptions and missing/truncated data.

          IMO, joining a RemoteProc should ensure that ALL data generated by that process has successfully made it through the channel before terminating. In the meantime, I've reimplemented RemoteLauncher in the perforce plugin to close the pipes on the slave end when its LocalLauncher is finished. This will ensure that all the data makes it through (since the EOF is the last thing to be sent).

          What I can't figure out is why this doesn't seem to be affecting things like build task logs... We're closing the stream as fast as possible, but perhaps the build listener normally stays open for longer periods?
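The race described in this comment can be modelled without Jenkins at all. The sketch below is an assumption-laden stand-in: a single-threaded executor plays the role of the channel, a latch plays the role of network delay, and every name in it is hypothetical. It shows only that closing the local stream as soon as the join returns discards a chunk that is still in transit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the race; none of these names come from Jenkins.
class JoinRaceDemo {
    /** Returns what the local side captured when it closes its stream right after join(). */
    static String captureWithEarlyClose() throws Exception {
        ExecutorService channel = Executors.newSingleThreadExecutor(); // stand-in for the channel
        StringBuilder received = new StringBuilder();
        AtomicBoolean streamClosed = new AtomicBoolean(false);
        CountDownLatch inTransit = new CountDownLatch(1);

        // The remote process has exited, but its final chunk is still travelling.
        Future<Void> lastChunk = channel.submit(() -> {
            inTransit.await();
            if (!streamClosed.get()) {
                received.append("tail of output");
            }
            // else: the real code would fail with "Pipe closed" here
            return null;
        });

        // join() returns at this point, so the caller closes its stream.
        streamClosed.set(true);

        inTransit.countDown(); // the chunk finally arrives, too late
        lastChunk.get();
        channel.shutdown();
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("captured: '" + captureWithEarlyClose() + "'");
    }
}
```

In this model the captured output is empty, which corresponds to the missing or truncated data reported in the issue.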


          Kohsuke Kawaguchi added a comment -

          OK, I think I understand what's going on here.

          In Jenkins, process I/O goes in the opposite direction: we provide an OutputStream that the process writes its stdout to, as opposed to draining the stream via read. And it doesn't close the stream that it writes to when the process has closed its stdout to indicate EOF. So a plugin like perforce that needs to read from stdout gets no EOF signal in the stream.

          The only way out in the current design is to close the output stream after joining the process, which is what you do in HudsonPipedOutputStream. But this only works if the data has indeed been written to the output stream before the join successfully returns.

          This is the case for local execution, as Proc.join() internally waits for the stream pumping threads to complete. For remote execution, however, this isn't enough of a guarantee: it merely means that the last bits of data have left the remote end to start their journey, and additional steps remain before the data actually gets written to the locally exported OutputStream, hence the race condition.

          So the short-term fix is to make RemoteProc.join() ensure that all the data has arrived locally before returning. The long-term proper fix is to allow code to read directly from stdout/stderr as an InputStream, thereby eliminating this pseudo-EOF business.
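One way to picture the short-term fix is a join that sends a no-op marker through the same ordered path as the data and waits for it to come out the other end. This is a sketch under stated assumptions, not the code from the actual commit: the single-threaded executor stands in for the channel, and `sendChunk`, `joinAndSync`, and the class itself are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model: an ordered executor stands in for the remoting channel.
class SyncJoinDemo {
    private final ExecutorService channel = Executors.newSingleThreadExecutor();
    private final ByteArrayOutputStream local = new ByteArrayOutputStream();

    /** Queues a chunk for delivery; the sleep simulates transit time. */
    void sendChunk(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        channel.execute(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            local.write(data, 0, data.length);
        });
    }

    /** Returns only after every chunk queued before this call has been written locally. */
    void joinAndSync() throws Exception {
        // The marker is processed after all earlier chunks because delivery is ordered.
        channel.submit(() -> { }).get();
    }

    String captured() {
        return new String(local.toByteArray(), StandardCharsets.UTF_8);
    }

    static String demo() throws Exception {
        SyncJoinDemo d = new SyncJoinDemo();
        d.sendChunk("head ");
        d.sendChunk("tail");
        d.joinAndSync(); // safe to close the local stream after this point
        d.channel.shutdown();
        return d.captured();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The approach relies on in-order delivery: if chunks could overtake the marker, waiting for the marker would prove nothing.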

          Rob Petti added a comment -

          No longer a blocking issue, as a work-around has been found.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/Launcher.java
          core/src/main/java/hudson/Proc.java
          remoting/src/main/java/hudson/remoting/Channel.java
          test/src/test/java/hudson/ProcTest.java
          http://jenkins-ci.org/commit/core/3a3690b28135287ed78d6a966a5632a16277f05b
          Log:
          [FIXED JENKINS-7809] fixed a race condition in obtaining the tail of the output from remote process.

          dogfood added a comment -

          Integrated in jenkins_main_trunk #514
          [FIXED JENKINS-7809] fixed a race condition in obtaining the tail of the output from remote process.

          Kohsuke Kawaguchi :
          Files :

          • test/src/test/java/hudson/ProcTest.java
          • core/src/main/java/hudson/Proc.java
          • changelog.html
          • core/src/main/java/hudson/Launcher.java
          • remoting/src/main/java/hudson/remoting/Channel.java


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/Launcher.java
          core/src/main/java/hudson/Proc.java
          core/src/main/java/hudson/os/windows/WindowsRemoteLauncher.java
          core/src/test/java/hudson/LauncherTest.java
          remoting/src/main/java/hudson/remoting/RemoteInputStream.java
          test/src/main/java/org/jvnet/hudson/test/FakeLauncher.java
          test/src/test/java/hudson/ProcTest.java
          http://jenkins-ci.org/commit/core/96dd84b5d0d37e1765c3615bcba048eb62e38ad6
          Log:
          [FIXED JENKINS-7809] expose child process stdout as InputStream.

          I added a mode where one can launch a child process without the built-in pumping support, making it trivial to read until EOF in the distributed environments.

          This concludes the fix.
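Reading until EOF is straightforward once the consumer holds an InputStream, because the end of the data and the end of the stream are the same event. The sketch below uses plain JDK pipes with a thread standing in for the child process; it does not use the Jenkins API added by this commit, and all names in it are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo; a thread plays the child process writing to its stdout.
class ReadUntilEofDemo {
    /** Drains the stream until read() reports EOF, so no output can be cut short. */
    static String readAll(InputStream stdout) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = stdout.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    static String demo() throws Exception {
        PipedInputStream stdout = new PipedInputStream();
        PipedOutputStream processSide = new PipedOutputStream(stdout);

        Thread process = new Thread(() -> {
            // Closing the stream on exit is what delivers EOF to the reader.
            try (OutputStream out = processSide) {
                out.write("line 1\n".getBytes(StandardCharsets.UTF_8));
                out.write("line 2\n".getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        process.start();

        String all = readAll(stdout); // returns only once EOF has been seen
        process.join();
        return all;
    }

    public static void main(String[] args) throws Exception {
        System.out.print(demo());
    }
}
```

Here the join happens after the read completes, so the order of "process finished" and "data delivered" no longer matters to the caller.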

          dogfood added a comment -

          Integrated in jenkins_main_trunk #517

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          test/src/main/java/org/jvnet/hudson/test/FakeLauncher.java
          http://jenkins-ci.org/commit/jenkins-test-harness/d4fe2dcee80c02264fa5b687f72e2ea0e8af3aeb
          Log:
          [FIXED JENKINS-7809] expose child process stdout as InputStream.

          I added a mode where one can launch a child process without the built-in pumping support, making it trivial to read until EOF in the distributed environments.

          This concludes the fix.

          Originally-Committed-As: 96dd84b5d0d37e1765c3615bcba048eb62e38ad6

            kohsuke Kohsuke Kawaguchi
            rpetti Rob Petti
            Votes: 37
            Watchers: 39
