  1. Jenkins
  2. JENKINS-62576

Websockets connection unstable since remoting 4.2.1 (LTS 2.222.4)


Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • remoting
    • None

    Description

      Hi,

      Since we upgraded to Jenkins Core 2.222.4 (to include the fix for JENKINS-61409) and remoting 4.2.1, we have been seeing many more stability issues on the WebSocket connection. This was not the case with remoting 4.2 (the only issue we faced then was with large payloads).

      We now observe disconnections in the middle of builds.

      The connection breaks after a simple git checkout:

      [Pipeline] { (Git Checkout)
      [Pipeline] dir
      10:08:31  Running in /home/jenkins/agent/workspace/workspace/*****
      [Pipeline] {
      [Pipeline] checkout
      [Pipeline] }
      [Pipeline] // dir
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      10:08:45  ********* was marked offline: Connection was broken: java.nio.channels.ClosedChannelException
      10:08:45  	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      10:08:45  	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      10:08:45  	at com.sun.proxy.$Proxy91.onWebSocketClose(Unknown Source)
      10:08:45  	at 
      

      On the agent (multiple exceptions):

      Jun 05, 2020 8:07:10 AM org.jenkinsci.plugins.workflow.log.GCFlushedOutputStream$FlushRef lambda$static$0
      WARNING: null
      hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cca5774:*****": channel is already closed
              at hudson.remoting.Channel.send(Channel.java:760)
              at hudson.remoting.ProxyOutputStream.flush(ProxyOutputStream.java:155)
              at hudson.remoting.RemoteOutputStream.flush(RemoteOutputStream.java:112)
              at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
              at org.jenkinsci.plugins.workflow.log.DelayBufferedOutputStream$FlushControlledOutputStream.flush(DelayBufferedOutputStream.java:131)
              at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
              at org.jenkinsci.plugins.workflow.log.GCFlushedOutputStream$FlushRef.lambda$static$0(GCFlushedOutputStream.java:77)
              at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@6cca5774:******": channel is already closed
              at hudson.remoting.Engine$1AgentEndpoint.onClose(Engine.java:590)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusEndpointWrapper.onClose(TyrusEndpointWrapper.java:1251)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.onClose(TyrusWebSocket.java:130)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.ProtocolHandler.close(ProtocolHandler.java:469)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.core.TyrusWebSocket.close(TyrusWebSocket.java:260)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine$2$1.close(TyrusClientEngine.java:635)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processError(ClientFilter.java:254)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:180)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onError(Filter.java:183)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:314)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.failed(TransportFilter.java:283)
              at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:128)
              at sun.nio.ch.Invoker$2.run(Invoker.java:218)
              at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
              ... 3 more
      
      
      Jun 05, 2020 8:07:28 AM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
      SEVERE: Connection error has occurred
      java.io.IOException: Connection reset by peer
              at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
              at sun.nio.ch.IOUtil.read(IOUtil.java:197)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:388)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:191)
              at sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:213)
              at sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:293)
              at java.lang.Thread.run(Thread.java:748)
      
      WARNING: LinkageError while performing UserRequest:hudson.FilePath$IsDirectory@4d6ab49
      java.lang.NoClassDefFoundError: hudson/util/io/Archiver
              at java.lang.Class.getDeclaredFields0(Native Method)
              at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
              at java.lang.Class.getDeclaredFields(Class.java:1916)
              at java.io.ObjectStreamClass.getDefaultSerialFields(ObjectStreamClass.java:1851)
              at java.io.ObjectStreamClass.getSerialFields(ObjectStreamClass.java:1773)
              at java.io.ObjectStreamClass.access$800(ObjectStreamClass.java:79)
              at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:508)
              at java.io.ObjectStreamClass$3.run(ObjectStreamClass.java:494)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:494)
              at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:391)
              at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:681)
              at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1942)
              at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1808)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2099)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
              at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
              at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
              at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
              at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
              at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
              at hudson.remoting.UserRequest.deserialize(UserRequest.java:290)
              at hudson.remoting.UserRequest.perform(UserRequest.java:189)
              at hudson.remoting.UserRequest.perform(UserRequest.java:54)
              at hudson.remoting.Request$2.run(Request.java:369)
              at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:117)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.ClassNotFoundException: hudson.util.io.Archiver
              at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
              at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:173)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
              at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
              ... 36 more
      

      Both the master and the agent are running on JDK 8.

      Agent (A VM)

      openjdk version "1.8.0_252"
      OpenJDK Runtime Environment (build 1.8.0_252-b09)
      OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
      

      Master (Official docker container)

      openjdk version "1.8.0_242"
      OpenJDK Runtime Environment (build 1.8.0_242-b08)
      OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
      

      Please let us know if you have any idea what is causing the issue.

      PS: I haven't had time to test with the latest Jenkins version and remoting 4.3, so I don't know whether that changes anything.
      PS2: I'm aware that WebSocket support is still in beta.

      Thanks!

          Activity

            jglick Jesse Glick added a comment -

            Cannot guess why JENKINS-61409 would have caused issues with Traefik. That fix changed the details of how Remoting commands are encoded in WS—from one command = one WS frame to a more complex chunked framing implementation shared with TCP agents—but nothing essential about how the connection is started, or the outbound WS ping every 30s, etc. If you manage to find the root cause here it would be great.
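To make the difference between "one command = one WS frame" and chunked framing concrete, here is a minimal Java sketch. It is illustrative only: the 2-byte header layout (high bit for "more chunks follow", low 15 bits for the payload length) and the names `ChunkFramingSketch`, `encode`, and `decode` are assumptions for this example, not Remoting's actual wire format or API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: a hypothetical framing scheme, not Remoting's actual wire format.
final class ChunkFramingSketch {
    // Assumed upper bound on chunk payload size so the length fits in 15 bits.
    static final int MAX_CHUNK = 0x7FFF;

    /** Splits one command into chunks, each prefixed by a 2-byte big-endian header. */
    static byte[] encode(byte[] command, int chunkSize) {
        if (chunkSize < 1 || chunkSize > MAX_CHUNK) {
            throw new IllegalArgumentException("chunkSize must be in 1.." + MAX_CHUNK);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        do {
            int len = Math.min(chunkSize, command.length - offset);
            boolean hasMore = offset + len < command.length;
            // High bit: more chunks follow; low 15 bits: payload length.
            int header = (hasMore ? 0x8000 : 0) | len;
            out.write(header >>> 8);
            out.write(header & 0xFF);
            out.write(command, offset, len);
            offset += len;
        } while (offset < command.length);
        return out.toByteArray();
    }

    /** Reassembles a single command from its framed chunks. */
    static byte[] decode(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed); // big-endian by default
        ByteArrayOutputStream command = new ByteArrayOutputStream();
        boolean hasMore;
        do {
            int header = in.getShort() & 0xFFFF;
            hasMore = (header & 0x8000) != 0;
            int len = header & 0x7FFF;
            command.write(framed, in.position(), len);
            in.position(in.position() + len);
        } while (hasMore);
        return command.toByteArray();
    }

    public static void main(String[] args) {
        byte[] command = new byte[10];
        Arrays.fill(command, (byte) 7);
        byte[] framed = encode(command, 4);
        // Three chunks (4 + 4 + 2 bytes) plus three 2-byte headers.
        System.out.println(framed.length); // prints 16
        System.out.println(Arrays.equals(command, decode(framed))); // prints true
    }
}
```

With framing like this, a single command can span several transport writes, so a proxy that cuts the connection mid-command leaves the receiver with an incomplete chunk sequence rather than a cleanly missing frame.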

            jglick Jesse Glick added a comment -

            Maybe like JENKINS-64598?

            pgodithi Prudhvi Godithi added a comment - - edited

             

            Hey, we are using Jenkins version 2.249.2, Remoting version 4.9, and Kubernetes plugin version 1.27.5.
            We connect agents over the WebSocket protocol and occasionally see the following error. We have nginx running on top of our Jenkins master, and we see this only intermittently. If there is anything we can change to fix this error, please let us know.
            Our agents run Java versions OpenJDK 8, 12, and 15.

             

            INFO: Connected
             WARNING: An illegal reflective access operation has occurred
             WARNING: Illegal reflective access by com.thoughtworks.xstream.core.util.Fields to field java.util.TreeMap.comparator
             WARNING: Please consider reporting this to the maintainers of com.thoughtworks.xstream.core.util.Fields
             WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
             WARNING: All illegal access operations will be denied in a future release
             Oct 04, 2021 7:15:20 AM io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter processError
             SEVERE: Connection error has occurred
             java.io.IOException: Connection reset
                  at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.finishRead(UnixAsynchronousSocketChannelImpl.java:421)
                  at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.finish(UnixAsynchronousSocketChannelImpl.java:193)
                  at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.onEvent(UnixAsynchronousSocketChannelImpl.java:215)
                  at java.base/sun.nio.ch.EPollPort$EventHandlerTask.run(EPollPort.java:306)
                  at java.base/java.lang.Thread.run(Thread.java:832)
            Oct 04, 2021 7:15:20 AM hudson.remoting.jnlp.Main$CuiListener status
             INFO: Read side closed
             Oct 04, 2021 7:15:20 AM hudson.remoting.jnlp.Main$CuiListener status
             INFO: Terminated
             Oct 04, 2021 7:15:21 AM hudson.remoting.Engine lambda$new$1
             SEVERE: Uncaught exception in Engine thread Thread[Thread-0,5,main]
             java.lang.NoClassDefFoundError: jenkins/slaves/restarter/JnlpSlaveRestarterInstaller
                    at jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1.onReconnect(JnlpSlaveRestarterInstaller.java:92)
                    at hudson.remoting.EngineListenerSplitter.onReconnect(EngineListenerSplitter.java:54)
                    at hudson.remoting.Engine.runWebSocket(Engine.java:687)
                    at hudson.remoting.Engine.run(Engine.java:496)
             Caused by: java.lang.ClassNotFoundException: jenkins.slaves.restarter.JnlpSlaveRestarterInstaller
                    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:435)
                    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:215)
                    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
                    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
                    ... 4 more
            

             

             

            dordor dor s added a comment - - edited

            I got the same error:

            02:43:59  jenkins-agent-***** was marked offline: Connection was broken: java.nio.channels.ClosedChannelException
            02:43:59  	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:142)
            02:43:59  	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
            02:43:59  	at com.sun.proxy.$Proxy101.onWebSocketClose(Unknown Source)
            02:43:59  	at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:149)
            02:43:59  	at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:394)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:316)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280)
            02:43:59  	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293)
            02:43:59  	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193)
            02:43:59  	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
            02:43:59  	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:510)
            02:43:59  	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
            02:43:59  	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
            02:43:59  	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
            02:43:59  	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
            02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
            02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
            02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
            02:43:59  	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
            02:43:59  	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
            02:43:59  	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
            02:43:59  	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
            02:43:59  	at java.base/java.lang.Thread.run(Thread.java:829)
            
            Error when executing always post condition:
            java.io.IOException: Unable to create live FilePath for jenkins-agent-*****
            	at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:64)
            	at org.jenkinsci.plugins.workflow.support.steps.FilePathDynamicContext.get(FilePathDynamicContext.java:47)
            	at org.jenkinsci.plugins.workflow.steps.DynamicContext$Typed.get(DynamicContext.java:94)
            	at org.jenkinsci.plugins.workflow.cps.ContextVariableSet.get(ContextVariableSet.java:139)
            	at org.jenkinsci.plugins.workflow.cps.CpsThread.getContextVariable(CpsThread.java:135)
            	at org.jenkinsci.plugins.workflow.cps.CpsStepContext.doGet(CpsStepContext.java:297)
            	at org.jenkinsci.plugins.workflow.support.DefaultStepContext.get(DefaultStepContext.java:75)
            	at org.jenkinsci.plugins.credentialsbinding.impl.BindingStep$Execution2.doStart(BindingStep.java:123)
            	at org.jenkinsci.plugins.workflow.steps.GeneralNonBlockingStepExecution.lambda$run$0(GeneralNonBlockingStepExecution.java:77)
            	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
            	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            	at java.base/java.lang.Thread.run(Thread.java:829)
            

             

            In my case, I have a declarative pipeline that starts on a k8s agent-pod, then jumps to a Win10 agent, and then returns to the k8s agent-pod:

            pipeline {
                agent {
                    label 'agent-pod'
                }
                ...
                stages {
                    stage('Init') {
                        steps {
                            echo "init"
                        }
                    }
                    stage('Tests'){
                        agent { 
                            label 'Win10'
                        }
                        steps {
                            script {
                                echo "run test"
                                stash name: 'test_report', includes: "test_report.zip"
                            }
                        }
                    }
                }
                post {
                    always {
                        script { 
                            unstash "test_report"
                        }
                    }
                }
            }

            The task on the Win10 agent takes at least 4 hours, and when the build returns to the agent-pod it fails with the error above.

            Any idea why?

             

             

            korte Timm Korte added a comment - - edited

            We had the same issue (broken connections to agents), and in the end updating Traefik from v1 to a current v2 seems to have fixed it.

            On that note, there seems to be a deadlock in the agent's current connection handling: it never joins the thread of the broken connection, and because of that it does not automatically reconnect after a broken connection like the one originally described in this ticket.

            In case this is relevant for someone else:

            The websocket connection seems to be created at: https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L547

            At connect, we get the "WebSocket connection open" from https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L593

            At connection failure (apparently initiated by Traefik towards both the Jenkins master and the agent), we get the "onClose" from https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L616

            After that, however, the log just stops, and no further messages are added to the log on the agent side.

            The source should then call "transport.terminate" (as I understand it, guarded by some sort of atomic/lock; I'm not a Java dev by trade) at https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L617, but that doesn't happen or doesn't complete.

            Those are the "inner" callbacks of the WebSocket connection. On the outer level, we have the control loop, which printed the "Connected" in the first place: https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/Engine.java#L662

            After that, the main loop waits for the connection to be terminated via "ch.get().join()" before printing "Terminated", but that never happened either, as the WebSocket itself seems to be locked up at this point.
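The control flow described above can be sketched in isolation. This is a simplified Java model under stated assumptions, not the actual `hudson.remoting.Engine` code: `FakeChannel`, `runOnce`, and the 500 ms bound are hypothetical names and values chosen for illustration. It shows why an outer loop that waits on a join cannot proceed (and so cannot reconnect) if the inner close callback never terminates the channel.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: hypothetical names, not the actual hudson.remoting.Engine code.
final class JoinAfterCloseSketch {
    /** Stand-in for a channel whose join returns only after terminate() has run. */
    static final class FakeChannel {
        private final CountDownLatch terminated = new CountDownLatch(1);

        void terminate() {
            terminated.countDown();
        }

        /** Bounded join: returns true if the channel terminated within the timeout. */
        boolean join(long timeout, TimeUnit unit) throws InterruptedException {
            return terminated.await(timeout, unit);
        }
    }

    /** Simulates one connection cycle and reports what the outer loop observes. */
    static String runOnce(boolean closeCallbackCompletes) {
        FakeChannel ch = new FakeChannel();
        // "Inner" callback thread: plays the role of the WebSocket close handler.
        Thread onClose = new Thread(() -> {
            if (closeCallbackCompletes) {
                ch.terminate();
            }
            // Otherwise the handler never terminates the channel (stalled callback).
        });
        onClose.start();
        try {
            // "Outer" control loop: must see termination before it could reconnect.
            return ch.join(500, TimeUnit.MILLISECONDS) ? "Terminated" : "Still waiting";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "Interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce(true));  // prints Terminated
        System.out.println(runOnce(false)); // prints Still waiting
    }
}
```

In this model the bounded wait only makes the stall visible; with an unbounded join, the second case would block forever, which matches the symptom of the agent log stopping after "onClose".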

             

             


            People

              jthompson Jeff Thompson
              jonesbusy Valentin Delaye