Jenkins / JENKINS-62999

INFO: Failed to synchronize IO streams on the channel


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Jenkins 2.235.1, AWS EC2 Plugin, Java 8

      Description

      I'm seeing the following error on our agents. When this occurs, the job hangs and makes no further progress; we have to disconnect and reconnect the agent for the job to continue. On the agent we see the following in the logs.

       

      Jul 07, 2020 6:45:12 PM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
      WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.pipeline.utility.steps.fs.TeeStep$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
      Jul 07, 2020 7:45:42 PM hudson.Launcher$RemoteLaunchCallable$1 join
      INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@ed17bee:channel
      java.lang.InterruptedException
          at java.lang.Object.wait(Native Method)
          at hudson.remoting.Request.call(Request.java:177)
          at hudson.remoting.Channel.call(Channel.java:997)
          at hudson.remoting.Channel.syncIO(Channel.java:1730)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1328)
          at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:931)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:905)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:857)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:369)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

      Jul 07, 2020 7:46:12 PM hudson.Launcher$RemoteLaunchCallable$1 join
      INFO: Failed to synchronize IO streams on the channel hudson.remoting.Channel@ed17bee:channel
      java.lang.InterruptedException
          at java.lang.Object.wait(Native Method)
          at hudson.remoting.Request.call(Request.java:177)
          at hudson.remoting.Channel.call(Channel.java:997)
          at hudson.remoting.Channel.syncIO(Channel.java:1730)
          at hudson.Launcher$RemoteLaunchCallable$1.join(Launcher.java:1328)
          at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.perform(RemoteInvocationHandler.java:931)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:905)
          at hudson.remoting.RemoteInvocationHandler$RPCRequest.call(RemoteInvocationHandler.java:857)
          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
          at hudson.remoting.Request$2.run(Request.java:369)
          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      

       

      There are several more instances of the last error message in the logs, all with the same backtrace.

        Attachments

          Activity

          simaspenser Spenser Gilliland added a comment -

          After removing the "tee" step from my Jenkinsfile, I'm no longer seeing this problem.
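          For reference, the "tee" step here is the pipeline step provided by the Pipeline Utility Steps plugin (the org.jenkinsci.plugins.pipeline.utility.steps.fs.TeeStep seen in the log above), which duplicates a block's console output into a workspace file. A minimal sketch of that kind of usage, with an illustrative file name and command rather than the reporter's actual job, would be:

          node {
              // Everything printed inside the block is also written to test.log
              // in the agent's workspace, in addition to the normal build console.
              tee('test.log') {
                  sh 'pytest'   // illustrative long-running, chatty command
              }
              // Keep the captured log with the build
              archiveArtifacts artifacts: 'test.log'
          }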

          jthompson Jeff Thompson added a comment -

          Interesting. I'm glad you were able to figure something out, to at least start to troubleshoot this. Often things that look like Remoting failures are caused by plugins (or various environmental, system, or configuration issues). Sometimes it's an interaction between different plugins or plugin operations that causes problems. These can be very difficult to diagnose and reproduce.

          If you can generate a simple reproduction case, someone might be able to take a look at it. Or you can dig into the details yourself.

          My guess is that something is going on with the tee operation and it's getting hung up. That then leads to other problems.

          jthompson Jeff Thompson added a comment -

          [I've edited the original description to update to the accepted terminology. Please use "agent" as the accepted term.]

          simaspenser Spenser Gilliland added a comment -

          Thanks for taking a look at this. I haven't verified this, but we have a shared library function called record that looks like this:

           

          import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

          /**
           * Record a function and archive its log
           *
           * @param filename         filename to store the log in
           * @param func             function to record
           * @param abort_on_failure propagate the error on failure
           */
          def record(filename, func, abort_on_failure = false) {
              try {
                  //tee(filename) {
                  func()
                  //}
              } catch (FlowInterruptedException e) {
                  // If the build was interrupted (e.g. a timeout), rethrow
                  throw e
              } catch (e) {
                  unstable(e.toString())
                  if (abort_on_failure) throw e
              } finally {
                  //archiveArtifacts artifacts: filename
              }
          }

           

          To work around the problem, you can see how we commented out the "tee" step (and archiveArtifacts). This was hanging only when we were pushing a lot of console output from "func()" (it happened during a failing test with overly verbose logging and no limiter). After removing the "tee", this function was able to push an 8 GB log file without issue. Typically this is the lowest-level function in a stack that looks something like this:

          try {
            node(...) {
              image.inside(...) {
                utils.record('test-0.log') {
                  sh "pytest"
                }
              }
            }
          } catch(e) {
            emailext(...)
            throw e
          }
              

           

           

          reddwarf94 Cristian added a comment -

          FWIW I was able to reproduce the issue. Simply replacing the tee Jenkins step with the tee shell command made the problem go away. A few things I have noticed, which may be real or in my imagination:

          • It seems to happen when there is a lot of data to "tee"
          • It may happen more in low bandwidth situations
          • I could see the agent was starting to send ~13 Mbps of data constantly, until the job was stopped.
          • "Attempt to (de-)serialize anonymous class" is the only thing I see at first. I don't see the whole "Failed to synchronize IO streams on the channel" thing until I stop the job
          • Other jobs started after this happened to show strange behaviour. For example, the last X lines of a step log starting to appear in a loop (so the jobs never finishing)

          My setup was two non-virtualized machines connected via SSH, with the latest LTS Jenkins and up-to-date plugins, using Java 11 on both the controller and the agent.

           

          My impression is that the tee step has problems when it receives data faster than it can send it.
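          A minimal sketch of that substitution, using the record helper from the earlier comment as a starting point (names and commands are illustrative, and a bash-compatible shell on the agent is assumed): the body becomes a shell command string rather than a closure, and the log file is written directly on the agent by the shell's own tee, so the pipeline TeeStep is not involved at all.

          import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException

          // Hypothetical variant of record() that pipes the command through the
          // shell's tee instead of wrapping it in the pipeline tee step.
          def record(filename, cmd, abort_on_failure = false) {
              try {
                  // pipefail preserves the command's exit status rather than tee's
                  sh "set -o pipefail; ${cmd} 2>&1 | tee ${filename}"
              } catch (FlowInterruptedException e) {
                  // Aborts and timeouts still propagate
                  throw e
              } catch (e) {
                  unstable(e.toString())
                  if (abort_on_failure) throw e
              } finally {
                  archiveArtifacts artifacts: filename
              }
          }

          // Usage, mirroring the stack shown earlier:
          //   utils.record('test-0.log', 'pytest')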


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            spenser309 Spenser Gilliland
             Votes:
             0
             Watchers:
             6
