JENKINS-62576

Websockets connection unstable since remoting 4.2.1 (LTS 2.222.4)

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: remoting
    • Labels: None

      Description

      Hi,

      Since we upgraded to Jenkins Core 2.222.4 (to include the fix for JENKINS-61409) and remoting 4.2.1, we have been facing many more stability issues on the WebSocket connection. This was not the case with remoting 4.2 (the only issue we faced then was with large payloads).

      We now observe disconnections in the middle of builds.

      For example, the connection breaks after a simple git checkout:

      [Pipeline] { (Git Checkout)
      [Pipeline] dir
      10:08:31  Running in /home/jenkins/agent/workspace/workspace/*****
      [Pipeline] {
      [Pipeline] checkout
      [Pipeline] }
      [Pipeline] // dir
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      10:08:45  ********* was marked offline: Connection was broken: java.nio.channels.ClosedChannelException
      10:08:45  	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      10:08:45  	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      10:08:45  	at com.sun.proxy.$Proxy91.onWebSocketClose(Unknown Source)
      10:08:45  	at 
      

      On the agent (multiple exceptions):

      Jun 05, 2020 8:07:10 AM org.jenkinsci.plugins.workflow.log.GCFlushedOutputStream$FlushRef lambda$static$0
      WARNING: null
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cca5774:*****": channel is already closed
              at hudson.remoting.Channel.send(Channel.java:760)
              at hudson.remoting.ProxyOutputStream.flush(ProxyOutputStream.java:155)
              at hudson.remoting.RemoteOutputStream.flush(RemoteOutputStream.java:112)
              at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
              at org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream.flush(DelayBufferedOutputStream.java:131)
              at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
              at org.jenkinsci.plugins.workflow.log.GCFlushedOutputStream$FlushRef.lambda$static$0(GCFlushedOutputStream.java:77)
              at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cca5774:******": channel is already closed
              at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
              at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
              at sun.nio.ch.Invoker$2.run(Invoker.java:218)
              at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
              ... 3 more
      
      
      Jun 05, 2020 8:07:28 AM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
      SEVERE: Connection error has occurred
      java.io.IOException: Connection reset by peer
              at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
              at sun.nio.ch.IOUtil.read(IOUtil.java:197)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:388)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
              at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
              at java.lang.Thread.run(Thread.java:748)
      
      WARNING: LinkageError while performing UserRequest:hudson.FilePath$IsDirectory@4d6ab49
      java.lang.NoClassDefFoundError: hudson/util/io/Archiver
              at java.lang.Class.getDeclaredFields0(Native Method)
              at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
              at java.lang.Class.getDeclaredFields(Class.java:1916)
              at java.io.ObjectStreamClass.getDefaultSerialFields(ObjectStreamClass.java:1851)
              at java.io.ObjectStreamClass.getSerialFields(ObjectStreamClass.java:1773)
              at java.io.ObjectStreamClass.access$800(ObjectStreamClass.java:79)
              at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:508)
              at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
              at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
              at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
              at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942)
              at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
              at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
              at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
              at hudson.remoting.UserRequest.deserialize(UserRequest.java:290)
              at hudson.remoting.UserRequest.perform(UserRequest.java:189)
              at hudson.remoting.UserRequest.perform(UserRequest.java:54)
              at hudson.remoting.Request$2.run(Request.java:369)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.ClassNotFoundException: hudson.util.io.Archiver
              at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:173)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
              ... 36 more
      

      Both the master and the agent are running on JDK 8.

      Agent (A VM)

      openjdk version "1.8.0_252"
      OpenJDK Runtime Environment (build 1.8.0_252-b09)
      OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
      

      Master (official Docker container)

      openjdk version "1.8.0_242"
      OpenJDK Runtime Environment (build 1.8.0_242-b08)
      OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
      

      Please let me know if you have any idea what is causing the issue.

      PS: I haven't had time to test with the latest Jenkins version and remoting 4.3, so I don't know whether it changes anything.
      PS2: I am fully aware that WebSocket support is still in beta.

      Thanks!

            Activity

            jonesbusy Valentin Delaye added a comment (edited)

            The git checkout is the first stage in the pipeline. After more testing, the failure seems to happen quite randomly. If I'm lucky, I can get past the git checkout and reach the next stage, but it always fails after a few seconds, for example when executing a script or a Maven phase (see the logs below).

            I tried removing Traefik from the infrastructure (K8s NodePort instead of LoadBalancer) and the result is the same.

            Tomorrow I will try removing the Azure Gateway, but I'm not sure it will resolve the issue, since the same setup works with the previous version.

            Here are more logs, taken 30 seconds after the agent connected, while simply trying to run an "echo hello" on the console. It just kills the connection:

            Jun 08, 2020 7:09:10 PM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
            SEVERE: Connection error has occurred
            java.io.IOException: Connection reset by peer
                    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
                    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
                    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
                    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
                    at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:388)
                    at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)
                    at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
                    at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
                    at java.lang.Thread.run(Thread.java:748)
            
            Jun 08, 2020 7:09:10 PM hudson.remoting.Engine$1AgentEndpoint onClose
            FINE: onClose: CloseReason[1006,Closed abnormally.]
            Jun 08, 2020 7:09:10 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Read side closed
            Jun 08, 2020 7:09:10 PM io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint close
            FINE: Close public void close(CloseReason cr): CloseReason[1000]
            Jun 08, 2020 7:09:10 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Terminated
            Jun 08, 2020 7:09:10 PM hudson.remoting.UserRequest perform
            WARNING: LinkageError while performing UserRequest:hudson.util.RemotingDiagnostics$Script@20db97f8
            java.lang.ExceptionInInitializerError
                    at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66)
                    at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34)
                    at groovy.lang.Binding.<init>(Binding.java:35)
                    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:136)
                    at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
                    at hudson.remoting.UserRequest.perform(UserRequest.java:211)
                    at hudson.remoting.UserRequest.perform(UserRequest.java:54)
                    at hudson.remoting.Request$2.run(Request.java:369)
                    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                    at java.lang.Thread.run(Thread.java:748)
            Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                    at hudson.remoting.Request.abort(Request.java:340)
                    at hudson.remoting.Channel.terminate(Channel.java:1081)
                    at hudson.remoting.Channel$1.terminate(Channel.java:619)
                    at hudson.remoting.AbstractByteBufferCommandTransport.terminate(AbstractByteBufferCommandTransport.java:315)
                    at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
                    at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
                    at sun.nio.ch.Invoker$2.run(Invoker.java:218)
                    at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    ... 1 more
                    Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to **************
                            at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1788)
                            at hudson.remoting.Request.call(Request.java:202)
                            at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:288)
                            at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
                            at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:211)
                            at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
                            at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
                            at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createNormalMetaClass(MetaClassRegistry.java:171)
                            at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createWithCustomLookup(MetaClassRegistry.java:161)
                            at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.create(MetaClassRegistry.java:144)
                            at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.<init>(MetaClassRegistryImpl.java:121)
                            at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.<init>(MetaClassRegistryImpl.java:74)
                            at groovy.lang.GroovySystem.<clinit>(GroovySystem.java:36)
                            at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66)
                            at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34)
                            at groovy.lang.Binding.<init>(Binding.java:35)
                            at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:136)
                            at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
                            at hudson.remoting.UserRequest.perform(UserRequest.java:211)
                            at hudson.remoting.UserRequest.perform(UserRequest.java:54)
                            at hudson.remoting.Request$2.run(Request.java:369)
                            at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
                            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                            at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                            ... 1 more
            Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                    ... 19 more
            
            Jun 08, 2020 7:09:10 PM hudson.slaves.ChannelPinger$2 onClosed
            FINE: Terminating ping thread for **************
            Jun 08, 2020 7:09:10 PM hudson.remoting.PingThread run
            FINE: Ping thread for channel hudson.remoting.Channel@34d32620:************** is interrupted. Terminating
            Jun 08, 2020 7:09:10 PM sun.net.www.protocol.http.HttpURLConnection plainConnect0
            FINEST: ProxySelector Request for **************/login
            Jun 08, 2020 7:09:10 PM sun.net.www.protocol.https.HttpsClient New
            FINEST: Looking for HttpClient for URL **************/login and proxy value of DIRECT
            Jun 08, 2020 7:09:10 PM sun.net.www.protocol.https.HttpsClient <init>
            FINEST: Creating new HttpsClient with url:**************/login and proxy:DIRECT with connect timeout:-1
            Jun 08, 2020 7:09:10 PM sun.net.www.protocol.http.HttpURLConnection plainConnect0
            FINEST: Proxy used: DIRECT
            Jun 08, 2020 7:09:10 PM hudson.remoting.Request$2 run
            WARNING: Failed to send back a reply to the request hudson.remoting.Request$2@4a8c475c
            hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                    at hudson.remoting.Channel.send(Channel.java:760)
                    at hudson.remoting.Request$2.run(Request.java:382)
                    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
                    at java.lang.Thread.run(Thread.java:748)
            Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
                    at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
                    at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
                    at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
                    at sun.nio.ch.Invoker$2.run(Invoker.java:218)
                    at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                    ... 1 more
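
To help read the agent log above: it reports two close codes, `CloseReason[1006,Closed abnormally.]` followed by `CloseReason[1000]`. In RFC 6455, 1006 is the reserved code for a connection that dropped without a Close frame, which is consistent with the "Connection reset by peer" error, and 1000 is a normal closure. Below is a minimal sketch for pulling these codes out of a log when triaging. The names `CLOSE_CODES` and `close_reasons` are illustrative only and are not part of Jenkins or remoting, and the table covers just a subset of the RFC's codes.

```python
import re

# Subset of the WebSocket close codes defined in RFC 6455, section 7.4.1.
CLOSE_CODES = {
    1000: "normal closure",
    1001: "endpoint going away",
    1002: "protocol error",
    1006: "abnormal closure (no close frame received)",
    1009: "message too big",
    1011: "unexpected server condition",
}

# Matches "CloseReason[1006,Closed abnormally.]" as well as "CloseReason[1000]".
CLOSE_REASON = re.compile(r"CloseReason\[(\d+)(?:,[^\]]*)?\]")


def close_reasons(log_text):
    """Return a (code, meaning) pair for each CloseReason[...] in the log text."""
    found = []
    for match in CLOSE_REASON.finditer(log_text):
        code = int(match.group(1))
        meaning = CLOSE_CODES.get(code, "unknown or application-defined")
        found.append((code, meaning))
    return found


# Two lines taken from the agent log in this comment.
sample = """\
FINE: onClose: CloseReason[1006,Closed abnormally.]
FINE: Close public void close(CloseReason cr): CloseReason[1000]
"""

for code, meaning in close_reasons(sample):
    print(code, meaning)
```

A 1006 that appears first, as it does here, suggests the TCP connection was cut by the peer or by something in between rather than closed cleanly by either endpoint. The 1000 that follows is the agent side closing its own endpoint afterwards.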
            

            Regards,

            Show
            jonesbusy Valentin Delaye added a comment - - edited The git checkout is already the first stage in the pipeline. After more and more test it seems to happen quite randomly. If i'm lucky I can pass the git checkout and reach the next stage, but it always fail after few seconds I've tried to remove traefik on the infra and it's the same (K8S NodePort instead of LoadBalancer) For example bellow when executing a script or a maven phase. Tomorrow I will try to remove the Azure Gateway, but I'm not still it will resolve the issue as it's working with previous Here more logs (30 seconds after the agent is connected, just trying to run an "echo hello" on the console. It just kill the connection...) Jun 08, 2020 7:09:10 PM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError SEVERE: Connection error has occurred java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197) at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:388) at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191) at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213) at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293) at java.lang. Thread .run( Thread .java:748) Jun 08, 2020 7:09:10 PM hudson.remoting.Engine$1AgentEndpoint onClose FINE: onClose: CloseReason[1006,Closed abnormally.] 
Jun 08, 2020 7:09:10 PM hudson.remoting.jnlp.Main$CuiListener status INFO: Read side closed Jun 08, 2020 7:09:10 PM io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusRemoteEndpoint close FINE: Close public void close(CloseReason cr): CloseReason[1000] Jun 08, 2020 7:09:10 PM hudson.remoting.jnlp.Main$CuiListener status INFO: Terminated Jun 08, 2020 7:09:10 PM hudson.remoting.UserRequest perform WARNING: LinkageError while performing UserRequest:hudson.util.RemotingDiagnostics$Script@20db97f8 java.lang.ExceptionInInitializerError at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66) at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34) at groovy.lang.Binding.<init>(Binding.java:35) at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:136) at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114) at hudson.remoting.UserRequest.perform(UserRequest.java:211) at hudson.remoting.UserRequest.perform(UserRequest.java:54) at hudson.remoting.Request$2.run(Request.java:369) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117) at java.lang. 
Thread .run( Thread .java:748) Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************" : channel is already closed at hudson.remoting.Request.abort(Request.java:340) at hudson.remoting.Channel.terminate(Channel.java:1081) at hudson.remoting.Channel$1.terminate(Channel.java:619) at hudson.remoting.AbstractByteBufferCommandTransport.terminate(AbstractByteBufferCommandTransport.java:315) at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590) at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251) at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130) at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469) at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260) at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314) at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283) at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128) at sun.nio.ch.Invoker$2.run(Invoker.java:218) at 
sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to **************
        at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1788)
        at hudson.remoting.Request.call(Request.java:202)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:288)
        at com.sun.proxy.$Proxy6.fetch3(Unknown Source)
        at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:211)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createNormalMetaClass(MetaClassRegistry.java:171)
        at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.createWithCustomLookup(MetaClassRegistry.java:161)
        at groovy.lang.MetaClassRegistry$MetaClassCreationHandle.create(MetaClassRegistry.java:144)
        at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.<init>(MetaClassRegistryImpl.java:121)
        at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.<init>(MetaClassRegistryImpl.java:74)
        at groovy.lang.GroovySystem.<clinit>(GroovySystem.java:36)
        at org.codehaus.groovy.runtime.InvokerHelper.<clinit>(InvokerHelper.java:66)
        at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34)
        at groovy.lang.Binding.<init>(Binding.java:35)
        at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:136)
        at hudson.util.RemotingDiagnostics$Script.call(RemotingDiagnostics.java:114)
        at hudson.remoting.UserRequest.perform(UserRequest.java:211)
        at hudson.remoting.UserRequest.perform(UserRequest.java:54)
        at hudson.remoting.Request$2.run(Request.java:369)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
        ... 1 more
Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
        ... 19 more
Jun 08, 2020 7:09:10 PM hudson.slaves.ChannelPinger$2 onClosed
FINE: Terminating ping thread for **************
Jun 08, 2020 7:09:10 PM hudson.remoting.PingThread run
FINE: Ping thread for channel hudson.remoting.Channel@34d32620:************** is interrupted. Terminating
Jun 08, 2020 7:09:10 PM sun.net.www.protocol.http.HttpURLConnection plainConnect0
FINEST: ProxySelector Request for **************/login
Jun 08, 2020 7:09:10 PM sun.net.www.protocol.https.HttpsClient New
FINEST: Looking for HttpClient for URL **************/login and proxy value of DIRECT
Jun 08, 2020 7:09:10 PM sun.net.www.protocol.https.HttpsClient <init>
FINEST: Creating new HttpsClient with url:**************/login and proxy:DIRECT with connect timeout:-1
Jun 08, 2020 7:09:10 PM sun.net.www.protocol.http.HttpURLConnection plainConnect0
FINEST: Proxy used: DIRECT
Jun 08, 2020 7:09:10 PM hudson.remoting.Request$2 run
WARNING: Failed to send back a reply to the request hudson.remoting.Request$2@4a8c475c
hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
        at hudson.remoting.Channel.send(Channel.java:760)
        at hudson.remoting.Request$2.run(Request.java:382)
        at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
        at java.lang.Thread.run(Thread.java:748)
Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@34d32620:**************": channel is already closed
        at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
        at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
        at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
        at sun.nio.ch.Invoker$2.run(Invoker.java:218)
        at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more

Regards,
            jglick Jesse Glick added a comment -

            Unfortunately Remoting disconnections are not generally diagnosable from logs alone, and there could be any number of root causes. It would be very helpful if there were a known way to reproduce the problem from scratch in a clean environment.

            jonesbusy Valentin Delaye added a comment -

            OK, thanks. We will try to reproduce the issue on a clean infrastructure.

            jonesbusy Valentin Delaye added a comment -

            We were able to compare the WebSocket and TCP connections through the same network component (Traefik 2 on Kubernetes, which supports both HTTP and TCP routes).

            Both agents were idle (no job running).

            1) The TCP connection remains completely stable after many days

            INFO: Using Remoting version: 4.3 
            Aug 07, 2020 5:31:51 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir 
            INFO: Using /home/jenkins/agent/workspace2/remoting as a remoting work directory 
            Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Locating server among [***************] 
            Aug 07, 2020 5:31:51 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve 
            INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping] 
            Aug 07, 2020 5:31:51 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve 
            INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check 
            Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Agent discovery successful 
             Agent address: ***********
             Agent port:    **** 
             Identity:      **************
            Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Handshaking 
            Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Connecting to ****************
            Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Trying protocol: JNLP4-connect 
            Aug 07, 2020 5:31:51 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Remote identity confirmed: *************
            Aug 07, 2020 5:31:52 PM hudson.remoting.jnlp.Main$CuiListener status 
            INFO: Connected
            

            2) The WebSocket connection fails after a few minutes (ping timeout)

            Aug 07, 2020 5:31:58 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
            INFO: Using /home/jenkins/agent/workspace1/remoting as a remoting work directory
            Aug 07, 2020 5:31:58 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
            INFO: Both error and output logs will be printed to /home/jenkins/agent/workspace1/remoting
            Aug 07, 2020 5:31:58 PM hudson.remoting.jnlp.Main createEngine
            INFO: Setting up agent: ws-agent-agent-01
            Aug 07, 2020 5:31:58 PM hudson.remoting.jnlp.Main$CuiListener <init>
            INFO: Jenkins agent is running in headless mode.
            Aug 07, 2020 5:31:58 PM hudson.remoting.Engine startEngine
            INFO: Using Remoting version: 4.3
            Aug 07, 2020 5:31:58 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
            INFO: Using /home/jenkins/agent/workspace1/remoting as a remoting work directory
            Aug 07, 2020 5:31:59 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: WebSocket connection open
            Aug 07, 2020 5:31:59 PM hudson.remoting.jnlp.Main$CuiListener status
            INFO: Connected
            Aug 07, 2020 5:51:02 PM hudson.slaves.ChannelPinger$1 onDead
            INFO: Ping failed. Terminating the channel ws-agent-agent-01.
            java.util.concurrent.TimeoutException: Ping started at 1596815222809 hasn't completed by 1596815462809
                    at hudson.remoting.PingThread.ping(PingThread.java:133)
                    at hudson.remoting.PingThread.run(PingThread.java:89)
            

            Sadly, JNLP4-connect doesn't support SNI, which prevents us from using a single port to connect our external agent.
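
For readers decoding the `TimeoutException` in the WebSocket log above: the two numbers are epoch milliseconds. A minimal sketch (the class and method names are hypothetical, chosen only for this illustration) that converts them to UTC instants and computes how long the ping waited:

```java
import java.time.Duration;
import java.time.Instant;

public class PingTimeoutDecoder {
    /** Time between the "started at" and "hasn't completed by" epoch-millisecond values. */
    public static Duration elapsed(long startedAtMillis, long deadlineMillis) {
        return Duration.between(Instant.ofEpochMilli(startedAtMillis),
                                Instant.ofEpochMilli(deadlineMillis));
    }

    public static void main(String[] args) {
        // Values copied from the TimeoutException message in the log.
        long startedAt = 1596815222809L;
        long deadline = 1596815462809L;

        System.out.println("Ping started (UTC): " + Instant.ofEpochMilli(startedAt));
        System.out.println("Gave up (UTC):      " + Instant.ofEpochMilli(deadline));
        System.out.println("Waited (seconds):   " + elapsed(startedAt, deadline).getSeconds());
    }
}
```

The difference is 240000 ms, so the ping waited four minutes before the channel was terminated. The agent had logged "Connected" at 5:31:59 PM and the failure at 5:51:02 PM, so the connection was up for about 19 minutes in total.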

            jglick Jesse Glick added a comment -

            Cannot guess why JENKINS-61409 would have caused issues with Traefik. That fix changed the details of how Remoting commands are encoded in WS—from one command = one WS frame to a more complex chunked framing implementation shared with TCP agents—but nothing essential about how the connection is started, or the outbound WS ping every 30s, etc. If you manage to find the root cause here it would be great.
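
To make the framing difference described above concrete, here is a rough sketch contrasting "one command = one frame" with chunked framing. It is an illustration only, not Remoting's actual wire format: the class name, the 2-byte header layout (high bit meaning "more chunks follow", low 15 bits holding the chunk length), and the chunk size limit are all assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkedFramingSketch {
    // Hypothetical limit: the largest length that fits in the 15-bit field assumed here.
    static final int MAX_CHUNK = 0x7FFF;

    /** Simple framing: the whole serialized command is sent as a single frame. */
    public static List<byte[]> oneFramePerCommand(byte[] command) {
        List<byte[]> frames = new ArrayList<>();
        frames.add(command.clone());
        return frames;
    }

    /** Chunked framing: split the command into chunks, each with a 2-byte header. */
    public static List<byte[]> chunk(byte[] command, int maxChunk) {
        if (maxChunk < 1 || maxChunk > MAX_CHUNK) {
            throw new IllegalArgumentException("maxChunk must be in 1.." + MAX_CHUNK);
        }
        List<byte[]> frames = new ArrayList<>();
        int offset = 0;
        do {
            int len = Math.min(maxChunk, command.length - offset);
            boolean more = offset + len < command.length;
            int header = (more ? 0x8000 : 0) | len;   // assumed layout, see lead-in
            byte[] frame = new byte[2 + len];
            frame[0] = (byte) (header >> 8);
            frame[1] = (byte) header;
            System.arraycopy(command, offset, frame, 2, len);
            frames.add(frame);
            offset += len;
        } while (offset < command.length);
        return frames;
    }

    /** Receiver side: concatenate chunk payloads until a header without the "more" bit. */
    public static byte[] reassemble(List<byte[]> frames) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] frame : frames) {
            int header = ((frame[0] & 0xFF) << 8) | (frame[1] & 0xFF);
            out.write(frame, 2, header & 0x7FFF);
            if ((header & 0x8000) == 0) {
                break;   // last chunk of this command
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] command = new byte[100_000];
        Arrays.fill(command, (byte) 7);
        List<byte[]> frames = chunk(command, MAX_CHUNK);
        System.out.println("single-frame count: " + oneFramePerCommand(command).size());
        System.out.println("chunked frame count: " + frames.size());
        System.out.println("round trip ok: " + Arrays.equals(command, reassemble(frames)));
    }
}
```

With chunking, a large command no longer needs to fit in a single WebSocket frame, which is the kind of large-payload problem the reporter mentions seeing before the upgrade. The trade-off is that the receiver must track chunk boundaries itself.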


              People

              Assignee:
              jthompson Jeff Thompson
              Reporter:
              jonesbusy Valentin Delaye
              Votes:
              0
              Watchers:
              5

                Dates

                Created:
                Updated: