Jenkins / JENKINS-62999

INFO: Failed to synchronize IO streams on the channel


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels: None
    • Environment:
      Jenkins 2.235.1, AWS EC2 Plugin, Java 8

      Description

      I'm seeing the following error on our agents. When this occurs, the job hangs and makes no further progress; we have to disconnect and reconnect the agent for the job to continue. On the agent we see the following in the logs.

       

      Jul 07, 2020 6:45:12 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
      WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.pipeline.utility.steps.fs.TeeStep$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/

      Jul 07, 2020 7:45:42 PM hudson.Launcher$RemoteLaunchCallable$1 join
      INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@ed17bee:channel
      java.lang.InterruptedException
          at java.lang.Object.wait(Native Method)
          at hudson.remoting.Request.call(Request.java:177)
          at hudson.remoting.Channel.call(Channel.java:997)
          at hudson.remoting.Channel.syncIO(Channel.java:1730)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1328)
          at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:931)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:905)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:857)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:369)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

      Jul 07, 2020 7:46:12 PM hudson.Launcher$RemoteLaunchCallable$1 join
      INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@ed17bee:channel
      java.lang.InterruptedException
          at java.lang.Object.wait(Native Method)
          at hudson.remoting.Request.call(Request.java:177)
          at hudson.remoting.Channel.call(Channel.java:997)
          at hudson.remoting.Channel.syncIO(Channel.java:1730)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1328)
          at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:931)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:905)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:857)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:369)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      

       

      There are several more occurrences of the last error message in the logs, all with the same stack trace.

        Attachments

          Activity

          simaspenser Spenser Gilliland added a comment -

          So, after removing the "tee" step from my Jenkinsfile, I'm no longer seeing this problem.
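
          For reference, the shape of the usage that was hanging is roughly the following (the agent label, file name, and command are placeholders, not our actual pipeline):

          node('linux') {              // placeholder agent label
            tee('build.log') {         // tee step from the Pipeline Utility Steps plugin
              sh './run_tests.sh'      // placeholder for a step with very verbose console output
            }
          }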

          jthompson Jeff Thompson added a comment -

          Interesting. I'm glad you were able to figure something out, to at least start to troubleshoot this. Often things that look like Remoting failures are caused by plugins (or various environmental, system, or configuration issues). Sometimes it's an interaction between different plugins or plugin operations that causes problems. These can be very difficult to diagnose and reproduce.

          If you can generate a simple reproduction case, someone might be able to take a look at it. Or you can dig into the details yourself.

          My guess is that something is going on with the tee operation and it's getting hung up. That then leads to other problems.

          jthompson Jeff Thompson added a comment -

          [I've edited the original description to update to the accepted terminology. Please use "agent" as the accepted term.]

          simaspenser Spenser Gilliland added a comment -

          Thanks for taking a look at this. I haven't verified this yet, but we have a shared library function called record which looks like this:

           

          import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

          /**
           * Record a function and archive its log.
           *
           * @param filename         file to store the log in
           * @param func             closure to record
           * @param abort_on_failure propagate the error on failure
           */
          def record(filename, func, abort_on_failure = false) {
            try {
              //tee(filename) {
              func()
              //}
            } catch (FlowInterruptedException e) {
              // If the build timed out or was aborted, rethrow
              throw e
            } catch (e) {
              unstable(e.toString())
              if (abort_on_failure) throw e
            } finally {
              //archiveArtifacts artifacts: filename
            }
          }

           

          To work around the problem, you can see that we commented out the "tee" call (and the archiveArtifacts). It was hanging only when "func()" pushed a lot of console output (this happened during a failing test with very verbose logging and no limiter). After removing the "tee", this function was able to push an 8 GB log file without issue. Typically this is the lowest-level function in a stack that looks something like this:

          try {
            node(...) {
              image.inside(...) {
                utils.record('test-0.log') {
                  sh "pytest"
                }
              }
            }
          } catch(e) {
            emailext(...)
            throw e
          }
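
          If it helps, the direction we may take to keep the log capture without "tee" (an untested sketch; the file name is just a placeholder) is to let the shell step redirect its own output and then re-enable the archiveArtifacts call in record()'s finally block:

            utils.record('test-0.log') {
              // the shell captures its own output instead of relying on tee;
              // record()'s finally block would then run
              // archiveArtifacts artifacts: 'test-0.log'
              sh 'pytest > test-0.log 2>&1'
            }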
              

           

           


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            spenser309 Spenser Gilliland
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated: