JENKINS-62576

Websockets connection unstable since remoting 4.2.1 (LTS 2.222.4)

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • remoting
    • None

      Hi,

      Since we upgraded to Jenkins Core 2.222.4 (to include the fix for JENKINS-61409) and remoting 4.2.1, we have been facing many more stability issues with the WebSocket connection. This was not the case with remoting 4.2 (the only issue we faced then was with large payloads).

      We now observe disconnections in the middle of builds.

      For example, the connection breaks after a simple git checkout:

      [Pipeline] { (Git Checkout)
      [Pipeline] dir
      10:08:31  Running in /home/jenkins/agent/workspace/workspace/*****
      [Pipeline] {
      [Pipeline] checkout (hide)
      [Pipeline] }
      [Pipeline] // dir
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      10:08:45  ********* was marked offline: Connection was broken: java.nio.channels.ClosedChannelException
      10:08:45  	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      10:08:45  	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      10:08:45  	at com.sun.proxy.$Proxy91.onWebSocketClose(Unknown Source)
      10:08:45  	at 
      

      On the agent (multiple exceptions):

      Jun 05, 2020 8:07:10 AM org.jenkinsci.plugins.workflow.log.GCFlushedOutputStream$FlushRef lambda$static$0
      WARNING: null
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cca5774:*****": channel is already closed
              at hudson.remoting.Channel.send(Channel.java:760)
              at hudson.remoting.ProxyOutputStream.flush(ProxyOutputStream.java:155)
              at hudson.remoting.RemoteOutputStream.flush(RemoteOutputStream.java:112)
              at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
              at org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream.flush(DelayBufferedOutputStream.java:131)
              at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
              at org.jenkinsci.plugins.workflow.log.GCFlushedOutputStream$FlushRef.lambda$static$0(GCFlushedOutputStream.java:77)
              at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cca5774:******": channel is already closed
              at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
              at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
              at sun.nio.ch.Invoker$2.run(Invoker.java:218)
              at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
              ... 3 more
      
      
      Jun 05, 2020 8:07:28 AM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
      SEVERE: Connection error has occurred
      java.io.IOException: Connection reset by peer
              at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
              at sun.nio.ch.IOUtil.read(IOUtil.java:197)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:388)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
              at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
              at java.lang.Thread.run(Thread.java:748)
      
      WARNING: LinkageError while performing UserRequest:hudson.FilePath$IsDirectory@4d6ab49
      java.lang.NoClassDefFoundError: hudson/util/io/Archiver
              at java.lang.Class.getDeclaredFields0(Native Method)
              at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
              at java.lang.Class.getDeclaredFields(Class.java:1916)
              at java.io.ObjectStreamClass.getDefaultSerialFields(ObjectStreamClass.java:1851)
              at java.io.ObjectStreamClass.getSerialFields(ObjectStreamClass.java:1773)
              at java.io.ObjectStreamClass.access$800(ObjectStreamClass.java:79)
              at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:508)
              at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
              at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
              at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
              at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942)
              at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
              at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
              at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
              at hudson.remoting.UserRequest.deserialize(UserRequest.java:290)
              at hudson.remoting.UserRequest.perform(UserRequest.java:189)
              at hudson.remoting.UserRequest.perform(UserRequest.java:54)
              at hudson.remoting.Request$2.run(Request.java:369)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.ClassNotFoundException: hudson.util.io.Archiver
              at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:173)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
              ... 36 more
      

      Both the master and the agent are running on JDK 8.

      Agent (A VM)

      openjdk version "1.8.0_252"
      OpenJDK Runtime Environment (build 1.8.0_252-b09)
      OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
      

      Master (Official docker container)

      openjdk version "1.8.0_242"
      OpenJDK Runtime Environment (build 1.8.0_242-b08)
      OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
      

      Please let me know if you have any idea what is causing the issue.

      PS: I haven't had time to test with the latest Jenkins version and remoting 4.3, so I don't know whether that changes anything.
      PS2: I'm fully aware that WebSocket support is still in beta.

      Thanks!

          [JENKINS-62576] Websockets connection unstable since remoting 4.2.1 (LTS 2.222.4)

          Vincent Latombe added a comment -

          jonesbusy Does the same git checkout work if you put it at the beginning of the pipeline? I'm trying to determine whether this is a timing or a payload issue.

          It would be great to try to reproduce the problem in a different environment (for example, running the checkout on an agent that is connected directly to your master) so that we can determine whether this could be a load balancer problem (or an interaction with one), or a general problem.
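One rough way to look at the timing side of this question is to measure how long a step ran before the channel dropped. The following is only a minimal sketch, assuming console timestamps in HH:mm:ss form like those in the description (10:08:31 for the start of the checkout, 10:08:45 for the "marked offline" line). The class and method names are hypothetical and not part of Jenkins or remoting.

```java
import java.time.Duration;
import java.time.LocalTime;

public class DisconnectTiming {

    // Seconds elapsed between two HH:mm:ss console timestamps.
    // Assumes both timestamps fall on the same day (no midnight rollover).
    static long secondsBetween(String start, String end) {
        return Duration.between(LocalTime.parse(start), LocalTime.parse(end)).getSeconds();
    }

    public static void main(String[] args) {
        // Timestamps copied from the console output in the description.
        System.out.println(secondsBetween("10:08:31", "10:08:45")); // prints 14
    }
}
```

Collecting this figure over several failed builds may help show whether the drops cluster around a fixed interval (which would point toward a timeout) or vary with the amount of data transferred.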

          Valentin Delaye added a comment - - edited

          The git checkout is already the first stage in the pipeline. After more testing, it seems to happen quite randomly. If I'm lucky, I can get past the git checkout and reach the next stage, but it always fails after a few seconds.

          I've tried removing Traefik from the infrastructure and the result is the same (K8s NodePort instead of LoadBalancer).

          For example, it fails as shown below when executing a script or a Maven phase.

          Tomorrow I will try to remove the Azure Gateway, but I'm not sure it will resolve the issue, as it works with the previous version.

          Here are more logs (30 seconds after the agent connected, I just tried to run an "echo hello" on the console, and it simply kills the connection...):

          Jun 08, 2020 7:09:10 PM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
          SEVERE: Connection error has occurred
          java.io.IOException: Connection reset by peer
                  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
                  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
                  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
                  at sun.nio.ch.IOUtil.read(IOUtil.java:197)
                  at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:388)
                  at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)
                  at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
                  at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
                  at java.lang.Thread.run(Thread.java:748)
          
          Jun 08, 2020 7:09:10 PM hudson.remoting.Engine$1AgentEndpoint onClose
          FINE: onClose: CloseReason[1006,Closed abnormally.]
          Jun 08, 2020 7:09:10 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Read side closed
          Jun 08, 2020 7:09:10 PM io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint close
          FINE: Close public void close(CloseReason cr): CloseReason[1000]
          Jun 08, 2020 7:09:10 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Jun 08, 2020 7:09:10 PM hudson.remoting.UserRequest perform
          WARNING: LinkageError while performing UserRequest:hudson.util.RemotingDiagnostics$Script@20db97f8
          java.lang.ExceptionInInitializerError
                  at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66)
                  at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34)
                  at groovy.lang.Binding.<init>(Binding.java:35)
                  at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:136)
                  at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
                  at hudson.remoting.UserRequest.perform(UserRequest.java:211)
                  at hudson.remoting.UserRequest.perform(UserRequest.java:54)
                  at hudson.remoting.Request$2.run(Request.java:369)
                  at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at java.lang.Thread.run(Thread.java:748)
          Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                  at hudson.remoting.Request.abort(Request.java:340)
                  at hudson.remoting.Channel.terminate(Channel.java:1081)
                  at hudson.remoting.Channel$1.terminate(Channel.java:619)
                  at hudson.remoting.AbstractByteBufferCommandTransport.terminate(AbstractByteBufferCommandTransport.java:315)
                  at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
                  at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
                  at sun.nio.ch.Invoker$2.run(Invoker.java:218)
                  at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  ... 1 more
                  Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to **************
                          at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1788)
                          at hudson.remoting.Request.call(Request.java:202)
                          at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:288)
                          at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
                          at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:211)
                          at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
                          at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
                          at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createNormalMetaClass(MetaClassRegistry.java:171)
                          at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createWithCustomLookup(MetaClassRegistry.java:161)
                          at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.create(MetaClassRegistry.java:144)
                          at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.<init>(MetaClassRegistryImpl.java:121)
                          at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.<init>(MetaClassRegistryImpl.java:74)
                          at groovy.lang.GroovySystem.<clinit>(GroovySystem.java:36)
                          at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66)
                          at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34)
                          at groovy.lang.Binding.<init>(Binding.java:35)
                          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:136)
                          at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
                          at hudson.remoting.UserRequest.perform(UserRequest.java:211)
                          at hudson.remoting.UserRequest.perform(UserRequest.java:54)
                          at hudson.remoting.Request$2.run(Request.java:369)
                          at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
                          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                          at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                          ... 1 more
          Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                  ... 19 more
          
          Jun 08, 2020 7:09:10 PM hudson.slaves.ChannelPinger$2 onClosed
          FINE: Terminating ping thread for **************
          Jun 08, 2020 7:09:10 PM hudson.remoting.PingThread run
          FINE: Ping thread for channel hudson.remoting.Channel@34d32620:************** is interrupted. Terminating
          Jun 08, 2020 7:09:10 PM sun.net.www.protocol.http.HttpURLConnection plainConnect0
          FINEST: ProxySelector Request for **************/login
          Jun 08, 2020 7:09:10 PM sun.net.www.protocol.https.HttpsClient New
          FINEST: Looking for HttpClient for URL **************/login and proxy value of DIRECT
          Jun 08, 2020 7:09:10 PM sun.net.www.protocol.https.HttpsClient <init>
          FINEST: Creating new HttpsClient with url:**************/login and proxy:DIRECT with connect timeout:-1
          Jun 08, 2020 7:09:10 PM sun.net.www.protocol.http.HttpURLConnection plainConnect0
          FINEST: Proxy used: DIRECT
          Jun 08, 2020 7:09:10 PM hudson.remoting.Request$2 run
          WARNING: Failed to send back a reply to the request hudson.remoting.Request$2@4a8c475c
          hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                  at hudson.remoting.Channel.send(Channel.java:760)
                  at hudson.remoting.Request$2.run(Request.java:382)
                  at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
                  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                  at java.lang.Thread.run(Thread.java:748)
          Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                  at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
                  at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
                  at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
                  at sun.nio.ch.Invoker$2.run(Invoker.java:218)
                  at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                  ... 1 more
          

          Regards,
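The `CloseReason[1006,Closed abnormally.]` and `CloseReason[1000]` entries in the agent log above are WebSocket close status codes as defined in RFC 6455: 1000 is a normal closure, while 1006 is reserved to report that the connection ended without a close frame being received. That matches the "Connection reset by peer" error logged just before it. As a minimal sketch, assuming log lines formatted like the ones above, the codes could be pulled out of a larger agent log as follows. The class and method names are hypothetical and not part of remoting or Tyrus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CloseReasonScan {

    // Matches entries such as "CloseReason[1006,Closed abnormally.]" or "CloseReason[1000]".
    private static final Pattern CLOSE_REASON =
            Pattern.compile("CloseReason\\[(\\d{4})(?:,([^\\]]*))?\\]");

    // Short descriptions for a few status codes from RFC 6455, section 7.4.1.
    static String describe(int code) {
        switch (code) {
            case 1000: return "normal closure";
            case 1001: return "endpoint going away";
            case 1006: return "abnormal closure (no close frame received)";
            case 1009: return "message too big";
            default:   return "other";
        }
    }

    // Returns every close code found in the log text, in order of appearance.
    static List<Integer> closeCodes(String logText) {
        List<Integer> codes = new ArrayList<>();
        Matcher m = CLOSE_REASON.matcher(logText);
        while (m.find()) {
            codes.add(Integer.parseInt(m.group(1)));
        }
        return codes;
    }

    public static void main(String[] args) {
        // Two lines copied from the agent log above.
        String log = "FINE: onClose: CloseReason[1006,Closed abnormally.]\n"
                   + "FINE: Close public void close(CloseReason cr): CloseReason[1000]\n";
        for (int code : closeCodes(log)) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```

A 1006 on the agent side only says that the TCP connection went away underneath the WebSocket session. It does not by itself show which hop (controller, proxy, or gateway) closed it.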


          Jesse Glick added a comment -

          Unfortunately Remoting disconnections are not generally diagnosable from logs alone, and there could be any number of root causes. It would be very helpful if there were a known way to reproduce the problem from scratch in a clean environment.


          Ok thanks, we will try to reproduce the issue on a clean infra


          We were able to compare WebSocket and TCP connections through the same network component (Traefik 2 on Kubernetes, which supports both HTTP and TCP routes).

          Without any activity (no job running):

          1) The TCP connection is completely stable after many days

          INFO: Using Remoting version: 4.3 
          Aug 07, 2020 5:31:51 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir 
          INFO: Using /home/jenkins/agent/workspace2/remoting as a remoting work directory 
          Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Locating server among [***************] 
          Aug 07, 2020 5:31:51 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve 
          INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping] 
          Aug 07, 2020 5:31:51 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve 
          INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check 
          Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Agent discovery successful 
           Agent address: ***********
           Agent port:    **** 
           Identity:      **************
          Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Handshaking 
          Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Connecting to ****************
          Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Trying protocol: JNLP4-connect 
          Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Remote identity confirmed: *************
          Aug 07, 2020 5:31:52 PM hudson.remoting.jnlp.Main$CuiListener status 
          INFO: Connected
          

          2) The WebSocket connection fails after a few minutes (ping timeout)

          Aug 07, 2020 5:31:58 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
          INFO: Using /home/jenkins/agent/workspace1/remoting as a remoting work directory
          Aug 07, 2020 5:31:58 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
          INFO: Both error and output logs will be printed to /home/jenkins/agent/workspace1/remoting
          Aug 07, 2020 5:31:58 PM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up agent: ws-agent-agent-01
          Aug 07, 2020 5:31:58 PM hudson.remoting.jnlp.Main$CuiListener <init>
          INFO: Jenkins agent is running in headless mode.
          Aug 07, 2020 5:31:58 PM hudson.remoting.Engine startEngine
          INFO: Using Remoting version: 4.3
          Aug 07, 2020 5:31:58 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
          INFO: Using /home/jenkins/agent/workspace1/remoting as a remoting work directory
          Aug 07, 2020 5:31:59 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: WebSocket connection open
          Aug 07, 2020 5:31:59 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          Aug 07, 2020 5:51:02 PM hudson.slaves.ChannelPinger$1 onDead
          INFO: Ping failed. Terminating the channel ws-agent-agent-01.
          java.util.concurrent.TimeoutException: Ping started at 1596815222809 hasn't completed by 1596815462809
                  at hudson.remoting.PingThread.ping(PingThread.java:133)
                  at hudson.remoting.PingThread.run(PingThread.java:89)
          

          Sadly, JNLP4-connect doesn't support SNI, which prevents us from using a single port to connect our external agents.
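For anyone reading the ping timeout message above: the two numbers in the `TimeoutException` look like epoch timestamps in milliseconds, so the window the ping was given can be worked out directly. The following is a minimal sketch under that assumption; the helper name `ping_window` is hypothetical and not part of Remoting.

```python
from datetime import datetime, timedelta, timezone


def ping_window(started_ms: int, deadline_ms: int):
    """Return (ping start time in UTC, allowed window in seconds).

    Assumes both values are Unix epoch timestamps in milliseconds,
    as they appear to be in the TimeoutException message.
    """
    # Integer arithmetic avoids float rounding on the millisecond part.
    started = datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc)
    started += timedelta(milliseconds=started_ms % 1000)
    window_s = (deadline_ms - started_ms) / 1000
    return started, window_s


started, window_s = ping_window(1596815222809, 1596815462809)
print(started.isoformat(), window_s)
```

With the values from the log, the window comes out at 240 seconds, i.e. the ping had been outstanding for four minutes before the channel was terminated.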


          Jesse Glick added a comment -

          Cannot guess why JENKINS-61409 would have caused issues with Traefik. That fix changed the details of how Remoting commands are encoded in WS—from one command = one WS frame to a more complex chunked framing implementation shared with TCP agents—but nothing essential about how the connection is started, or the outbound WS ping every 30s, etc. If you manage to find the root cause here it would be great.
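To make the difference between "one command = one frame" and chunked framing concrete, here is a small illustrative sketch. The header layout used here (a 1-byte "more follows" flag plus a 2-byte big-endian length) is an assumption made up for the example and is not Remoting's actual wire format; the function names are hypothetical as well.

```python
import struct
from typing import List


def encode_chunked(payload: bytes, max_chunk: int = 4) -> List[bytes]:
    """Split one command payload into several small frames.

    Hypothetical header per frame: 1-byte 'more follows' flag,
    then a 2-byte big-endian chunk length.
    """
    if not 0 < max_chunk <= 0xFFFF:
        raise ValueError("max_chunk must fit in 16 bits")
    # An empty payload still produces one (empty, final) frame.
    pieces = [payload[i:i + max_chunk]
              for i in range(0, len(payload), max_chunk)] or [b""]
    frames = []
    for index, piece in enumerate(pieces):
        more = index < len(pieces) - 1
        frames.append(struct.pack(">BH", int(more), len(piece)) + piece)
    return frames


def decode_chunked(frames: List[bytes]) -> bytes:
    """Reassemble the payload, stopping at the first frame without the flag."""
    out = bytearray()
    for frame in frames:
        more, length = struct.unpack(">BH", frame[:3])
        out += frame[3:3 + length]
        if not more:
            break
    return bytes(out)


frames = encode_chunked(b"abcdefghij", max_chunk=4)
print(len(frames), decode_chunked(frames))
```

The point of the sketch is only that a large command no longer has to fit in a single frame: the receiver reassembles it from several bounded-size pieces.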


          Jesse Glick added a comment -

          Maybe like JENKINS-64598?


          Prudhvi Godithi added a comment - - edited

           

          Hey, we are using Jenkins version 2.249.2, Remoting version 4.9, and Kubernetes plugin version 1.27.5.
          We connect agents over the WebSocket protocol and occasionally see the following error. We have nginx running in front of our Jenkins master, and we see this only intermittently. If there is anything we can change to fix this error, please let us know.
          Our agents run Java versions from OpenJDK 8, 12, and 15.

           

          INFO: Connected
           WARNING: An illegal reflective access operation has occurred
           WARNING: Illegal reflective access by com.thoughtworks.xstream.core.util.Fields to field java.util.TreeMap.comparator
           WARNING: Please consider reporting this to the maintainers of com.thoughtworks.xstream.core.util.Fields
           WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
           WARNING: All illegal access operations will be denied in a future release
           Oct 04, 2021 7:15:20 AM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
           SEVERE: Connection error has occurred
           java.io.IOException: Connection reset
                at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:421)
                at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:193)
                at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:215)
                at java.base/sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:306)
                at java.base/java.lang.Thread.run(Thread.java:832)
          Oct 04, 2021 7:15:20 AM hudson.remoting.jnlp.Main$CuiListener status
           INFO: Read side closed
           Oct 04, 2021 7:15:20 AM hudson.remoting.jnlp.Main$CuiListener status
           INFO: Terminated
           Oct 04, 2021 7:15:21 AM hudson.remoting.Engine lambda$new$1
           SEVERE: Uncaught exception in Engine thread Thread[Thread-0,5,main]
           java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
                  at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:92)
                  at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:54)
                  at hudson.remoting.Engine.runWebSocket(Engine.java:687)
                  at hudson.remoting.Engine.run(Engine.java:496)
           Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
                  at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:435)
                  at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:215)
                  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
                 at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
          ... 4 more
          

           

           


          dor s added a comment - - edited

          I got the same error:

          02:43:59  jenkins-agent-***** was marked offline: Connection was broken: java.nio.channels.ClosedChannelException
          02:43:59  	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:142)
          02:43:59  	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
          02:43:59  	at com.sun.proxy.$Proxy101.onWebSocketClose(Unknown Source)
          02:43:59  	at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:149)
          02:43:59  	at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:394)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:316)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280)
          02:43:59  	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293)
          02:43:59  	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193)
          02:43:59  	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
          02:43:59  	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:510)
          02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
          02:43:59  	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
          02:43:59  	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
          02:43:59  	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
          02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
          02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
          02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
          02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
          02:43:59  	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
          02:43:59  	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
          02:43:59  	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
          02:43:59  	at java.base/java.lang.Thread.run(Thread.java:829)
          
          Error when executing always post condition:
          java.io.IOException: Unable to create live FilePath for jenkins-agent-*****
          	at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:64)
          	at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:47)
          	at org.jenkinsci.plugins.workflow.steps.DynamicContext$Typed.get(DynamicContext.java:94)
          	at org.jenkinsci.plugins.workflow.cps.ContextVariableSet.get(ContextVariableSet.java:139)
          	at org.jenkinsci.plugins.workflow.cps.CpsThread.getContextVariable(CpsThread.java:135)
          	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:297)
          	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:75)
          	at org.jenkinsci.plugins.credentialsbinding.impl.BindingStep$Execution2.doStart(BindingStep.java:123)
          	at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution.lambda$run$0(GeneralNonBlockingStepExecution.java:77)
          	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          	at java.base/java.lang.Thread.run(Thread.java:829)
          

           

          In my case, I have a declarative pipeline that starts on a k8s agent-pod, then jumps to a Win10 agent, and then returns to the k8s agent-pod:

          pipeline {
              agent {
                  label 'agent-pod'
              }
              ...
              stages {
                  stage('Init') {
                      steps {
                          echo "init"
                      }
                  }
                  stage('Tests'){
                      agent { 
                          label 'Win10'
                      }
                      steps {
                          script {
                              echo "run test"
                              stash name: 'test_report', includes: "test_report.zip"
                          }
                      }
                  }
              }
              post {
                  always {
                      script { 
                          unstash "test_report"
                      }
                  }
              }
          }

          The task on the Win10 agent takes at least 4 hours, and when the pipeline returns to the agent-pod it gets the error above.

          Any idea why?

           

           


          Timm Korte added a comment - - edited

          We had the same issue (broken connections to agents), and in the end updating Traefik from v1 to a current v2 seems to have fixed it.

          On that note, there seems to be a deadlock in the current agent connection handling: the agent never joins the thread of the broken connection, and so it does not automatically reconnect after a broken connection like the one originally described in this ticket.

          In case this is relevant for someone else:

          The WebSocket connection seems to be created at: https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L547

          On connect, we get the "WebSocket connection open" message from https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L593

          On connection failure (apparently initiated by Traefik towards both the Jenkins master and the agent), we get the "onClose" from https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L616

          ... but after that, the log just stops and no further messages are added to the log on the agent side.

          The source should then call "transport.terminate" (as I understand it, under some sort of atomic/lock; I'm not a Java dev by trade) https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L617 but that doesn't happen or doesn't complete.

          Those are the "inner" callbacks of the WebSocket connection. On the outer level, we have the control loop, which printed "Connected" in the first place: https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L662

          After that, the main loop waits for the connection to be terminated via "ch.get().join()" so it can print "Terminated", but that never happens either, as the WebSocket itself seems to be locked up at this point.
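The hang described above has a general shape: an outer loop blocks indefinitely on a signal that the close path is supposed to deliver, and the close path never finishes. A rough sketch of that pattern follows, with a bounded wait as one possible way to notice the situation. This is not Remoting's code; all names (`wait_for_close`, `run_demo`, the returned strings) are hypothetical.

```python
import threading


def wait_for_close(closed: threading.Event, timeout_s: float) -> str:
    """Outer loop side: wait for the close signal, but only for timeout_s."""
    if closed.wait(timeout_s):
        return "Terminated"
    # An unbounded wait here would block forever if the close callback
    # never completes, which is the behaviour described in the comment above.
    return "close not observed"


def run_demo(close_completes: bool, timeout_s: float = 0.2) -> str:
    closed = threading.Event()

    def on_close() -> None:
        # Stand-in for the inner close callback; when close_completes is
        # False it simulates a callback that never finishes its cleanup.
        if close_completes:
            closed.set()

    threading.Thread(target=on_close, daemon=True).start()
    return wait_for_close(closed, timeout_s)


print(run_demo(True))
print(run_demo(False))
```

Whether a bounded wait is appropriate for the real agent is a separate question; the sketch only shows why the "Terminated" message never appears when the inner callback stalls.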

           

           


            Assignee: Unassigned
            Reporter: Valentin Delaye (jonesbusy)
            Votes: 3
            Watchers: 14

              Created:
              Updated: