Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-69509

Unstable websocket connection (java.nio.channels.ClosedChannelException)

      xWe have long running tests (1 hour duration) that we manage through Jenkins. Since we upgrade from 2.364 to 2.365 we are getting a unstable websocket connection between the Jenkins agent and controller sometimes a run will be successful but the past days a passed run without this exception is hard to achieve.

      Resulting in the following exception:

      ....FATAL: command execution failed
      java.nio.channels.ClosedChannelException
          at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:153)
          at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:80)
          at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:146)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.notifyOnClose(JettyWebSocketFrameHandler.java:308)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onClosed(JettyWebSocketFrameHandler.java:292)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$0(WebSocketCoreSession.java:272)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1445)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
          at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$1(WebSocketCoreSession.java:272)
          at org.eclipse.jetty.util.Callback$4.completed(Callback.java:184)
          at org.eclipse.jetty.util.Callback$Completing.succeeded(Callback.java:344)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:268)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1463)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
          at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$sendFrame$7(WebSocketCoreSession.java:519)
          at org.eclipse.jetty.util.Callback$3.succeeded(Callback.java:155)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.notifyCallbackSuccess(TransformingFlusher.java:197)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher$Flusher.process(TransformingFlusher.java:154)
          at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232)
          at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.sendFrame(TransformingFlusher.java:77)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.sendFrame(WebSocketCoreSession.java:522)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.close(WebSocketCoreSession.java:239)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.processHandlerError(WebSocketCoreSession.java:371)
          at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onIdleExpired(WebSocketConnection.java:233)
          at org.eclipse.jetty.io.ssl.SslConnection.onIdleExpired(SslConnection.java:360)
          at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
          at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
          at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:108)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused: java.io.IOException: Backing channel 'PerformanceSERVER' is disconnected.
          at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
          at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
          at com.sun.proxy.$Proxy76.isAlive(Unknown Source)
          at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1215)
          at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1207)
          at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
          at hudson.plugins.powershell.PowerShell.perform(PowerShell.java:48)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
          at hudson.model.Build$BuildExecution.build(Build.java:199)
          at hudson.model.Build$BuildExecution.doRun(Build.java:164)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
          at hudson.model.Run.execute(Run.java:1899)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
          at hudson.model.ResourceController.execute(ResourceController.java:107)
          at hudson.model.Executor.run(Executor.java:449)
      FATAL: Unable to delete script file C:\Users\account.adm\AppData\Local\Temp\jenkins9449560261283910130.ps1
      java.nio.channels.ClosedChannelException
          at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:153)
          at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:80)
          at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:146)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.notifyOnClose(JettyWebSocketFrameHandler.java:308)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onClosed(JettyWebSocketFrameHandler.java:292)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$0(WebSocketCoreSession.java:272)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1445)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
          at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$1(WebSocketCoreSession.java:272)
          at org.eclipse.jetty.util.Callback$4.completed(Callback.java:184)
          at org.eclipse.jetty.util.Callback$Completing.succeeded(Callback.java:344)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:268)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1463)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
          at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$sendFrame$7(WebSocketCoreSession.java:519)
          at org.eclipse.jetty.util.Callback$3.succeeded(Callback.java:155)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.notifyCallbackSuccess(TransformingFlusher.java:197)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher$Flusher.process(TransformingFlusher.java:154)
          at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232)
          at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.sendFrame(TransformingFlusher.java:77)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.sendFrame(WebSocketCoreSession.java:522)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.close(WebSocketCoreSession.java:239)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.processHandlerError(WebSocketCoreSession.java:371)
          at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onIdleExpired(WebSocketConnection.java:233)
          at org.eclipse.jetty.io.ssl.SslConnection.onIdleExpired(SslConnection.java:360)
          at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
          at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
          at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:108)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@3c2ca039:PerformanceSERVER": Remote call on PerformanceSERVER failed. The channel is closing down or has closed down
          at hudson.remoting.Channel.call(Channel.java:993)
          at hudson.FilePath.act(FilePath.java:1186)
          at hudson.FilePath.act(FilePath.java:1175)
          at hudson.FilePath.delete(FilePath.java:1722)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:163)
          at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
          at hudson.plugins.powershell.PowerShell.perform(PowerShell.java:48)
          at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
          at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
          at hudson.model.Build$BuildExecution.build(Build.java:199)
          at hudson.model.Build$BuildExecution.doRun(Build.java:164)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
          at hudson.model.Run.execute(Run.java:1899)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
          at hudson.model.ResourceController.execute(ResourceController.java:107)
          at hudson.model.Executor.run(Executor.java:449)
      Build step 'PowerShell' marked build as failure
      FATAL: Channel "hudson.remoting.Channel@3c2ca039:PerformanceSERVER": Remote call on PerformanceSERVER failed. The channel is closing down or has closed down
      java.nio.channels.ClosedChannelException
          at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:153)
          at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:80)
          at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:146)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.notifyOnClose(JettyWebSocketFrameHandler.java:308)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onClosed(JettyWebSocketFrameHandler.java:292)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$0(WebSocketCoreSession.java:272)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1445)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
          at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$1(WebSocketCoreSession.java:272)
          at org.eclipse.jetty.util.Callback$4.completed(Callback.java:184)
          at org.eclipse.jetty.util.Callback$Completing.succeeded(Callback.java:344)
          at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:268)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1463)
          at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482)
          at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$sendFrame$7(WebSocketCoreSession.java:519)
          at org.eclipse.jetty.util.Callback$3.succeeded(Callback.java:155)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.notifyCallbackSuccess(TransformingFlusher.java:197)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher$Flusher.process(TransformingFlusher.java:154)
          at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232)
          at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214)
          at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.sendFrame(TransformingFlusher.java:77)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.sendFrame(WebSocketCoreSession.java:522)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.close(WebSocketCoreSession.java:239)
          at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.processHandlerError(WebSocketCoreSession.java:371)
          at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onIdleExpired(WebSocketConnection.java:233)
          at org.eclipse.jetty.io.ssl.SslConnection.onIdleExpired(SslConnection.java:360)
          at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
          at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166)
          at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:108)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@3c2ca039:PerformanceSERVER": Remote call on PerformanceSERVER failed. The channel is closing down or has closed down
          at hudson.remoting.Channel.call(Channel.java:993)
          at hudson.Launcher$RemoteLauncher.kill(Launcher.java:1150)
          at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:538)
          at hudson.model.Run.execute(Run.java:1899)
          at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
          at hudson.model.ResourceController.execute(ResourceController.java:107)
          at hudson.model.Executor.run(Executor.java:449)
      Finished: FAILURE

      I have included the logs of the agent in question as an attachment to this issue.

      These are the details of the faulty Jenkins node:

       

      Naam  â†“ Waarde
      awt.toolkit sun.awt.windows.WToolkit
      file.encoding Cp1252
      file.separator |
      java.awt.graphicsenv sun.awt.Win32GraphicsEnvironment
      java.awt.printerjob sun.awt.windows.WPrinterJob
      java.class.path c:\Jenkinsnodeservice\slave.jar
      java.class.version 55.0
      java.home C:\Program Files\Amazon Corretto\jdk11.0.15_9
      java.io.tmpdir C:\Users\el80135.adm\AppData\Local\Temp|
      java.library.path C:\Program Files\Amazon Corretto\jdk11.0.15_9\bin;C:\WINDOWS\Sun\Java\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\Program Files\Amazon Corretto\jdk11.0.15_9\bin;D:\oracle\client64\19\client\bin;C:\Program Files\Python310\Scripts\;C:\Program Files\Python310\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Users\username\AppData\Local\Microsoft\WindowsApps;.
      java.runtime.name OpenJDK Runtime Environment
      java.runtime.version 11.0.15+9-LTS
      java.specification.name Java Platform API Specification
      java.specification.vendor Oracle Corporation
      java.specification.version 11
      java.vendor Amazon.com Inc.
      java.vendor.url https://aws.amazon.com/corretto/
      java.vendor.url.bug https://github.com/corretto/corretto-11/issues/
      java.vendor.version Corretto-11.0.15.9.1
      java.version 11.0.15
      java.version.date 2022-04-19
      java.vm.compressedOopsMode 32-bit
      java.vm.info mixed mode
      java.vm.name OpenJDK 64-Bit Server VM
      java.vm.specification.name Java Virtual Machine Specification
      java.vm.specification.vendor Oracle Corporation
      java.vm.specification.version 11
      java.vm.vendor Amazon.com Inc.
      java.vm.version 11.0.15+9-LTS
      jdk.debug release
      jna.loaded true
      jnidispatch.path C:\Users\username\AppData\Local\Temp\jna-813739984\jna10440559483708474225.dll
      line.separator  
      os.arch amd64
      os.name Windows Server 2019
      os.version 10.0
      path.separator ;
      sun.arch.data.model 64
      sun.boot.library.path C:\Program Files\Amazon Corretto\jdk11.0.15_9\bin
      sun.cpu.endian little
      sun.cpu.isalist amd64
      sun.desktop windows
      sun.io.unicode.encoding UnicodeLittle
      sun.java.command c:\Jenkinsnodeservice\slave.jar -jnlpUrl https://jenkins:8080/computer/PerformanceSERVER_NAME/jenkins-agent.jnlp -secret "secret" -workDir C:\jenkins
      sun.java.launcher SUN_STANDARD
      sun.jnu.encoding Cp1252
      sun.management.compiler HotSpot 64-Bit Tiered Compilers
      sun.os.patch.level  
      user.country US
      user.dir c:\Jenkinsnodeservice
      user.home C:\Users\username
      user.language en
      user.name username
      user.script  
      user.timezone Europe/Berlin
      user.variant

      Details regarding the Jenkins controller:

      Naam  â†“ Waarde
      awt.toolkit sun.awt.windows.WToolkit
      executable-war C:\Jenkins\Jenkins.war
      file.encoding Cp1252
      file.separator |
      hudson.lifecycle hudson.lifecycle.WindowsServiceLifecycle
      java.awt.graphicsenv sun.awt.Win32GraphicsEnvironment
      java.awt.headless true
      java.awt.printerjob sun.awt.windows.WPrinterJob
      java.class.path C:\Jenkins\jenkins.war
      java.class.version 55.0
      java.home C:\Program Files\Amazon Corretto\jdk11.0.15_9
      java.io.tmpdir C:\WINDOWS\TEMP|
      java.library.path C:\Program Files\Amazon Corretto\jdk11.0.15_9\bin;C:\WINDOWS\Sun\Java\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\Program Files\Amazon Corretto\jdk11.0.15_9\bin;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\WINDOWS\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps;.
      java.runtime.name OpenJDK Runtime Environment
      java.runtime.version 11.0.15+9-LTS
      java.specification.name Java Platform API Specification
      java.specification.vendor Oracle Corporation
      java.specification.version 11
      java.vendor Amazon.com Inc.
      java.vendor.url https://aws.amazon.com/corretto/
      java.vendor.url.bug https://github.com/corretto/corretto-11/issues/
      java.vendor.version Corretto-11.0.15.9.1
      java.version 11.0.15
      java.version.date 2022-04-19
      java.vm.compressedOopsMode 32-bit
      java.vm.info mixed mode
      java.vm.name OpenJDK 64-Bit Server VM
      java.vm.specification.name Java Virtual Machine Specification
      java.vm.specification.vendor Oracle Corporation
      java.vm.specification.version 11
      java.vm.vendor Amazon.com Inc.
      java.vm.version 11.0.15+9-LTS
      jdk.debug release
      jetty.git.hash
      • (removed it don't know if it is confidential)
      jna.loaded true
      jnidispatch.path C:\WINDOWS\TEMP\jna-756962248\jna18350657977015144084.dll
      line.separator  
      mail.smtp.sendpartial true
      mail.smtps.sendpartial true
      octaneAllowedStorage C:\Jenkins\userContent|
      os.arch amd64
      os.name Windows Server 2019
      os.version 10.0
      path.separator ;
      sun.arch.data.model 64
      sun.awt.enableExtraMouseButtons true
      sun.boot.library.path C:\Program Files\Amazon Corretto\jdk11.0.15_9\bin
      sun.cpu.endian little
      sun.cpu.isalist amd64
      sun.desktop windows
      sun.io.unicode.encoding UnicodeLittle
      sun.java.command C:\Jenkins\jenkins.war {}httpPort=1 httpsPort=8443 httpsKeyStore=C:\Jenkins\secrets\server.keystore -httpsKeyStorePassword=changeit
      sun.java.launcher SUN_STANDARD
      sun.jnu.encoding Cp1252
      sun.management.compiler HotSpot 64-Bit Tiered Compilers
      sun.os.patch.level  
      user.country US
      user.dir C:\Jenkins
      user.home C:\WINDOWS\system32\config\systemprofile
      user.language en
      user.name SystemUser$
      user.script  
      user.timezone Europe/Berlin
      user.variant

      For a bit more context the output to console of the command we are executing:

      Started by user Hendricks, JGH (Joey)
      Running as SYSTEM
      Building remotely on PerformanceSERVER_NAME (Performance) in workspace C:\jenkins\workspace\generieke-acceptatie-performance-testen\1. latest-stable-build-direct
      [1. latest-stable-build-acc90-gpsv-direct] $ powershell.exe -NonInteractive -ExecutionPolicy Bypass -File C:\Users\username\AppData\Local\Temp\jenkins9449560261283910130.ps1
      OpenJDK 64-Bit Server VM warning: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
      WebServices.nlp file has been found in C:\Users\username\AppData\Local\Temp\NeoloadProjects\WebServices.
      Logging to C:\Users\username\AppData\Roaming\Neotys\NeoLoad\v7.10\logs, as user: username
      Found a valid NTS configuration : URL=[http://licenseserver:8080//] and USER=[service account]
      Leasing license ---...
      Leased license ---== with 150 web and 0 SAP VUs 120 minutes(s).
      Loading project: C:\Users\username\AppData\Local\Temp\NeoloadProjects\WebServices\WebServices.nlp
      Project GPS_WebServices loaded
      Found a valid NTS configuration : URL=[http://licenseserver:8080//] and USER=[service account]
      Launching scenario: WebServices_PiekLoad_60min (01h:02m:00s)
      Initializing...
      [LoadGenerator] OpenJDK 64-Bit Server VM warning: Option MaxRAMFraction was deprecated in version 10.0 and will likely be removed in a future release.
      
      Initializing Monitors...
      Pre-Monitoring...
      Running

      Thanks in advanced this issues has been a headache.

        1. controller_log.txt
          418 kB
        2. jenkins.err.log
          1.26 MB
        3. jenkins.err-1.log
          1.26 MB
        4. jenkins.err-2.log
          1.26 MB
        5. jenkins-master.log
          5 kB
        6. jenkins-slave.err.log
          436 kB
        7. jenkins-slave.wrapper.log
          382 kB
        8. remoting.log.0
          604 kB
        9. remoting.log-1.0
          604 kB
        10. remoting.log-2.0
          604 kB
        11. slave.log.1
          7 kB
        12. slave.log-1.1
          7 kB

          [JENKINS-69509] Unstable websocket connection (java.nio.channels.ClosedChannelException)

          Basil Crow added a comment -

          Regarding your actual error, looks like on the controller side you're hitting some sort of idle timeout. I'm not sure if the -httpKeepAliveTimeout or the -httpsKeepAliveTimeout options to the controller can be used to increase this timeout, but that might be one place to start. Of course the real question is why is the WebSocket connection idle for more than the timeout.

          On the agent side it looks like you're using the latest release 3046.v38db_38a_b_7a_86 which upgraded the Tyrus WebSocket client from 1.18 to 2.1.0. Could you try downgrading to the previous release 3044.vb_940a_a_e4f72e and see if the problem persists. It is the agent JAR file in C:\Jenkinsnodeservice on your agent and you can get the older one from https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/remoting/3044.vb_940a_a_e4f72e/remoting-3044.vb_940a_a_e4f72e.jar - be careful, as your WinSW Windows service wrapper might try to download the new version again from the controller. (If it does, comment out the <download> part of your WinSW XML file.) Check that it is at the older version 3044.vb_940a_a_e4f72e with java -jar <filename> -version after relaunching the agent process in the Jenkins UI and ensure that the agent log reads "INFO: Using Remoting version: 3044.vb_940a_a_e4f72e." If that release doesn't exhibit the problem, we'll know that the Tyrus upgrade is the proximate cause of your issue.

          Basil Crow added a comment - Regarding your actual error, looks like on the controller side you're hitting some sort of idle timeout. I'm not sure if the - httpKeepAliveTimeout or the -httpsKeepAliveTimeout options to the controller can be used to increase this timeout, but that might be one place to start. Of course the real question is why is the WebSocket connection idle for more than the timeout. On the agent side it looks like you're using the latest release 3046.v38db_38a_b_7a_86 which upgraded the Tyrus WebSocket client from 1.18 to 2.1.0. Could you try downgrading to the previous release 3044.vb_940a_a_e4f72e and see if the problem persists. It is the agent JAR file in C:\Jenkinsnodeservice on your agent and you can get the older one from https://repo.jenkins-ci.org/releases/org/jenkins-ci/main/remoting/3044.vb_940a_a_e4f72e/remoting-3044.vb_940a_a_e4f72e.jar - be careful, as your WinSW Windows service wrapper might try to download the new version again from the controller. (If it does, comment out the <download> part of your WinSW XML file.) Check that it is at the older version 3044.vb_940a_a_e4f72e with java -jar <filename> -version after relaunching the agent process in the Jenkins UI and ensure that the agent log reads "INFO: Using Remoting version: 3044.vb_940a_a_e4f72e." If that release doesn't exhibit the problem, we'll know that the Tyrus upgrade is the proximate cause of your issue.

          basil Thanks for your reply! After the weekend I will speak with the Jenkins admins and ask for help trying out your suggestions they will probably have a better understanding where one can find which settings as they are the once that have set everything up. I have also included the controller log which I was able to fetch through the GUI. In this file I also see a lot of time outs errors from wide range of servers not just my own machines. We are going to first try like you suggested to increase the timeout value any recommendations to which value I should increase this to as I can type in any arbitrary number? Also would it be worth upgrading Jenkins to version 2.366, is there a possibility that doing that could solve our issue? Once again thanks for you advice I will come back on this after the weekend .

          joey hendricks added a comment - basil Thanks for your reply! After the weekend I will speak with the Jenkins admins and ask for help trying out your suggestions they will probably have a better understanding where one can find which settings as they are the once that have set everything up. I have also included the controller log which I was able to fetch through the GUI. In this file I also see a lot of time outs errors from wide range of servers not just my own machines. We are going to first try like you suggested to increase the timeout value any recommendations to which value I should increase this to as I can type in any arbitrary number? Also would it be worth upgrading Jenkins to version 2.366, is there a possibility that doing that could solve our issue? Once again thanks for you advice I will come back on this after the weekend .

          Basil Crow added a comment -

          I don't even know for sure that the HTTP keep alive timeout options to Winstone affect the WebSocketConnection.onIdleExpired code path in Jetty that you're hitting. You should investigate that with a debugger prior to changing such options. That will also tell you what the current value of the timeout is. But like I said earlier, the default value is probably reasonable and just pointing toward your agent disconnecting.

          Upgrading the controller to version 2.366 shouldn't make any difference, as there have neither been any changes to Jetty on the controller side nor in the version of Remoting bundled on the controller. The most significant code change that is likely to have affected you is the upgrade of the Tyrus WebSocket client in the agent JAR, which is why I suggested trying out a downgrade of the agent JAR to 3044.vb_940a_a_e4f72e in my last post to help narrow down the proximate cause of the problem.

          Basil Crow added a comment - I don't even know for sure that the HTTP keep alive timeout options to Winstone affect the WebSocketConnection.onIdleExpired code path in Jetty that you're hitting. You should investigate that with a debugger prior to changing such options. That will also tell you what the current value of the timeout is. But like I said earlier, the default value is probably reasonable and just pointing toward your agent disconnecting. Upgrading the controller to version 2.366 shouldn't make any difference, as there have neither been any changes to Jetty on the controller side nor in the version of Remoting bundled on the controller. The most significant code change that is likely to have affected you is the upgrade of the Tyrus WebSocket client in the agent JAR, which is why I suggested trying out a downgrade of the agent JAR to 3044.vb_940a_a_e4f72e in my last post to help narrow down the proximate cause of the problem.

          joey hendricks added a comment - - edited

          basil Over the weekend I applied your solution (Also removed the download part from XML file you mentioned.) to downgrade the agent version and at first appeared to work I was able to run a multitude of tests all without fail. However from the 6 runs (each an hour) 1 test failed with the same error I experienced before. Like you mentioned I checked the agent log and I verified that the correct agent version was being used the log reads as following:

           

          Inbound agent connected from ip address
          Remoting version: 3044.vb_940a_a_e4f72e
          Launcher: JNLPLauncher
          Communication Protocol: WebSocket
          This is a Windows agent
          Agent successfully connected and online 

          I am currently running new tests of an hour I am curious to see if it again will produce a error or that this instability will continue. I will update this message accordingly when I have more information. Tomorrow I will check with the Jenkins admins about looking at the time-out values I will let know what I discover there. Like you mentioned I also believe that the default time-out value would be reasonable, however I am going to check what value is defined just to verify if it is within a reasonable range. 

           

          joey hendricks added a comment - - edited basil Over the weekend I applied your solution (Also removed the download part from XML file you mentioned.) to downgrade the agent version and at first appeared to work I was able to run a multitude of tests all without fail. However from the 6 runs (each an hour) 1 test failed with the same error I experienced before. Like you mentioned I checked the agent log and I verified that the correct agent version was being used the log reads as following:   Inbound agent connected from ip address Remoting version: 3044.vb_940a_a_e4f72e Launcher: JNLPLauncher Communication Protocol: WebSocket This is a Windows agent Agent successfully connected and online I am currently running new tests of an hour I am curious to see if it again will produce a error or that this instability will continue. I will update this message accordingly when I have more information. Tomorrow I will check with the Jenkins admins about looking at the time-out values I will let know what I discover there. Like you mentioned I also believe that the default time-out value would be reasonable, however I am going to check what value is defined just to verify if it is within a reasonable range.   

          Basil Crow added a comment -

          If the older version of Tyrus is giving you the same problem then I don't have any ideas. If you can't come up with steps to reproduce the problem from scratch, you'll have to do some more detailed investigation with a Java debugger on both the controller and agent side to develop a clearer picture of what is causing the connection to drop.

          Basil Crow added a comment - If the older version of Tyrus is giving you the same problem then I don't have any ideas. If you can't come up with steps to reproduce the problem from scratch, you'll have to do some more detailed investigation with a Java debugger on both the controller and agent side to develop a clearer picture of what is causing the connection to drop.

          Hi basil I think I have found the reason why this problem is popping up first off our test run for a long amount of time circa 1 hour.

          Because they run this long there is little logging information that is displayed in the console.

          I have read in some other defect to be more specific this one:

          It seems that people have had issues in the past with connection that die when there is little logging happening over large amounts of time during runs.

          We have now implemented a ping that logs a character towards the console since we have done this the problem has not popped up again even on the newer version of the agent.

          Currently we are running more tests to verify if this behavior is consistence if it is not I will let something know through this defect.

          Basil thanks for your help.  

          joey hendricks added a comment - Hi basil I think I have found the reason why this problem is popping up first off our test run for a long amount of time circa 1 hour. Because they run this long there is little logging information that is displayed in the console. I have read in some other defect to be more specific this one: https://issues.jenkins.io/browse/JENKINS-12235?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&showAll=true "We consistently get this error on a Linux build slave consistently on jobs that have little output and take a long time (typically > 2 hours). We are using ssh and I suspect the problem is due to no traffic on the ssh link for this long period. Using Jenkins 1.434." It seems that people have had issues in the past with connection that die when there is little logging happening over large amounts of time during runs. We have now implemented a ping that logs a character towards the console since we have done this the problem has not popped up again even on the newer version of the agent. Currently we are running more tests to verify if this behavior is consistence if it is not I will let something know through this defect. Basil thanks for your help.  

          Mikel Ortega added a comment -

          Hello, I'm having the same problem with Jenkins 2.361.1 LTS running in a Windows 10 machine with the Agent in the same computer. I've tried different versions of the agent and the problem persist. So I downgraded the Jenkins to 2.346.3 LTS and now it seems it's working fine.

          So far I've discovered that the connection closes when there isn't a new output line in the console for some time. I cannot tell how long until it disconnects, but I guess it disconnects after some minutes with no new console output. The problem is I cannot implement a ping to the console for all my projects, as Joey comments.

          Also, I was not able to set -httpsKeepAliveTimeout, I don't know if that parameter is in milliseconds and what would be a good number to set it up.

          Thank you.

          Mikel Ortega added a comment - Hello, I'm having the same problem with Jenkins 2.361.1 LTS running in a Windows 10 machine with the Agent in the same computer. I've tried different versions of the agent and the problem persist. So I downgraded the Jenkins to 2.346.3 LTS and now it seems it's working fine. So far I've discovered that the connection closes when there isn't a new output line in the console for some time. I cannot tell how long until it disconnects, but I guess it disconnects after some minutes with no new console output. The problem is I cannot implement a ping to the console for all my projects, as Joey comments. Also, I was not able to set -httpsKeepAliveTimeout, I don't know if that parameter is in milliseconds and what would be a good number to set it up. Thank you.

          Hi, we also have the same problem after upgrading to Jenkins 2.361.1 LTS. Our setup and use use case is similar to mikelo's.

          Usually our UI-tests run for about 1.5 hours. Currently they aborted by with java.nio.channels.ClosedChannelException after 7 to 21 minutes.

           

          Benjamin Oschatz added a comment - Hi, we also have the same problem after upgrading to Jenkins 2.361.1 LTS. Our setup and use use case is similar to mikelo 's. Usually our UI-tests run for about 1.5 hours. Currently they aborted by with java.nio.channels.ClosedChannelException after 7 to 21 minutes.  

          mikelo & boschatz after doing that trick where I output more "log message" to the console the error has not come back on my end. The current workaround would be to output something to the console so the websocket connection does not not die.

          However this should not be normal behavior as the process is killed because of inactivity while it is still actively working but not displaying log message. I believe that this "time-outs" combats processes that have crashed and wont come back. However there must be something that could be done other then creating your own manual polling mechanism?

          Would this then be a feature request for Jenkins or a defect? 

          joey hendricks added a comment - mikelo & boschatz after doing that trick where I output more "log message" to the console the error has not come back on my end. The current workaround would be to output something to the console so the websocket connection does not not die. However this should not be normal behavior as the process is killed because of inactivity while it is still actively working but not displaying log message. I believe that this "time-outs" combats processes that have crashed and wont come back. However there must be something that could be done other then creating your own manual polling mechanism? Would this then be a feature request for Jenkins or a defect? 

          Hi, we're facing the same issue weeks ago. I downgraded Jenkins from 2.367 to 2.361 but the CloseChannelException is still occurring in our builds running for ~3h. I also increased the httpsKeepAliveTimeout value when starting the controller but did not change anything.

          Is anyone working on that issue?

          hammou hammoudi added a comment - Hi, we're facing the same issue weeks ago. I downgraded Jenkins from 2.367 to 2.361 but the CloseChannelException is still occurring in our builds running for ~3h. I also increased the httpsKeepAliveTimeout value when starting the controller but did not change anything. Is anyone working on that issue?

          Basil Crow added a comment -

          I noticed that the default Jetty HTTP keep-alive timeout is 30 seconds, but Winstone (and therefore Jenkins) uses a default of only 5 seconds. Could affected users try adding --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000 for TLS)? The Java command should look like this:

          java [… other JVM options…] -jar jenkins.war --httpKeepAliveTimeout=30000 [… other Jenkins options…]

          Basil Crow added a comment - I noticed that the default Jetty HTTP keep-alive timeout is 30 seconds, but Winstone (and therefore Jenkins) uses a default of only 5 seconds. Could affected users try adding --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000 for TLS)? The Java command should look like this: java [… other JVM options…] -jar jenkins.war --httpKeepAliveTimeout=30000 [… other Jenkins options…]

          Basil Crow added a comment -

          I noticed that there was an unintentional change in how the ping thread works between 2.362 and 2.363 as described in jenkinsci/jenkins#7195. I produced a speculative fix to restore the old behavior:

          https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/jenkins-war/2.372-rc32931.44da_2d458596/jenkins-war-2.372-rc32931.44da_2d458596.war

          I am looking for data regarding whether the problem reoccurs in each of these three scenarios:

          • With 2.371 and --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000)
          • With jenkins-war-2.372-rc32931.44da_2d458596.war
          • With jenkins-war-2.372-rc32931.44da_2d458596.war AND --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000)

          Basil Crow added a comment - I noticed that there was an unintentional change in how the ping thread works between 2.362 and 2.363 as described in jenkinsci/jenkins#7195 . I produced a speculative fix to restore the old behavior: https://repo.jenkins-ci.org/incrementals/org/jenkins-ci/main/jenkins-war/2.372-rc32931.44da_2d458596/jenkins-war-2.372-rc32931.44da_2d458596.war I am looking for data regarding whether the problem reoccurs in each of these three scenarios: With 2.371 and --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000) With jenkins-war-2.372-rc32931.44da_2d458596.war With jenkins-war-2.372-rc32931.44da_2d458596.war AND --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000)

          Hierony Manurung added a comment - - edited

          This is also happening to me, the difference is I'm using Linux for the agent and use Jenkins 2.371. The strange things is, it's happened like intermittent (I believe this is not server specs issue, since I monitor the metrics and all good).  Here's the error log when it's happen:

          Running Gradle task 'assembleRelease'... FATAL: command execution failed
          java.nio.channels.ClosedChannelException
          ...
          Caused: java.io.IOException: Backing channel 'jenkins-agent-scale-34eb6c9a' is disconnected.
          ...
          FATAL: Unable to delete script file /tmp/jenkins13945840368427000990.sh
          java.nio.channels.ClosedChannelException
          ...
          Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a": Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down
          ...
          Build step 'Execute shell' marked build as failure
          ERROR: Unable to tear down: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a": Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down
          java.nio.channels.ClosedChannelException
          ...
          Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a": Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down
          ...
          FATAL: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a": Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down
          java.nio.channels.ClosedChannelException
          ...
          Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a": Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down 

           

          Here's the log from the agent connection:
          Inbound agent connected from x.x.x.x
          Remoting version: 3046.v38db_38a_b_7a_86
          Launcher: SwarmLauncher
          Communication Protocol: WebSocket
          This is a Unix agent
          Agent successfully connected and online

          Hierony Manurung added a comment - - edited This is also happening to me, the difference is I'm using Linux for the agent and use Jenkins 2.371 . The strange things is, it's happened like intermittent (I believe this is not server specs issue, since I monitor the metrics and all good).  Here's the error log when it's happen: Running Gradle task 'assembleRelease' ... FATAL: command execution failed java.nio.channels.ClosedChannelException ... Caused: java.io.IOException: Backing channel 'jenkins-agent-scale-34eb6c9a' is disconnected. ... FATAL: Unable to delete script file /tmp/jenkins13945840368427000990.sh java.nio.channels.ClosedChannelException ... Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a" : Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down ... Build step 'Execute shell' marked build as failure ERROR: Unable to tear down: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a" : Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down java.nio.channels.ClosedChannelException ... Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a" : Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down ... FATAL: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a" : Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down java.nio.channels.ClosedChannelException ... Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@7800b416:jenkins-agent-scale-34eb6c9a" : Remote call on jenkins-agent-scale-34eb6c9a failed. The channel is closing down or has closed down   Here's the log from the agent connection: Inbound agent connected from x.x.x.x Remoting version: 3046.v38db_38a_b_7a_86 Launcher: SwarmLauncher Communication Protocol: WebSocket This is a Unix agent Agent successfully connected and online

          Nik Reiman added a comment -

          Normally I'm not the type to post "me too" comments on bug reports like this, but given the last few comments, I thought it might be useful.

          We are running a Jenkins server with around 100 nodes, all of which connect to Jenkins via the Swarm Client plugin. We build on several platforms, but most of our nodes are running either:

          • Ubuntu Linux 20.04
          • macOS 10.15 (Catalina)
          • macOS 12.2 (Monterey)
          • macOS 12.6 (Monterey)
          • Windows 10

          For the one week that we ran Jenkins 2.361.1 on our production, we observed a huge number of node disconnection alerts on all of the above platforms. We have many jobs that run over one hour as well.

          I haven't dug into the guts of the potential solution to this problem, but I just wanted to point out that the problem is definitely not OS-specific.

          Nik Reiman added a comment - Normally I'm not the type to post "me too" comments on bug reports like this, but given the last few comments, I thought it might be useful. We are running a Jenkins server with around 100 nodes, all of which connect to Jenkins via the Swarm Client plugin. We build on several platforms, but most of our nodes are running either: Ubuntu Linux 20.04 macOS 10.15 (Catalina) macOS 12.2 (Monterey) macOS 12.6 (Monterey) Windows 10 For the one week that we ran Jenkins 2.361.1 on our production, we observed a huge number of node disconnection alerts on all of the above platforms. We have many jobs that run over one hour as well. I haven't dug into the guts of the potential solution to this problem, but I just wanted to point out that the problem is definitely not OS-specific.

          Basil Crow added a comment -

          Basil Crow added a comment -   hierony Please read JENKINS-69509 (comment) .

          Thanks basil ,
          one question tho, how to set  --httpKeepAliveTimeout in docker jenkins setup. Since I know it was the java options.

          Here's my docker-compose.

           

          version: '3.8'
          services:
            jenkins:
               image: jenkins/jenkins:2.371-jdk17
               #build: .
               command: ["/usr/bin/tini", "--", "/usr/local/bin/jenkins.sh"]
               privileged: true
               user: root
               ports:
                - 8080:8080
                - 8090:8090
                - 50000:50000
               container_name: jenkins
               volumes:
                 - /var/lib/jenkins:/var/jenkins_home
                 - /var/run/docker.sock:/var/run/docker.sock
               environment:
                 - ANDROID_SDK_ROOT=/var/lib/jenkins/android-sdk
               restart: always 

          Hierony Manurung added a comment - Thanks basil , one question tho, how to set  --httpKeepAliveTimeout in docker jenkins setup. Since I know it was the java options. Here's my docker-compose.   version: '3.8' services:   jenkins:      image: jenkins/jenkins:2.371-jdk17      #build: .      command: [ "/usr/bin/tini" , "--" , "/usr/local/bin/jenkins.sh" ]      privileged: true      user: root      ports:       - 8080:8080       - 8090:8090       - 50000:50000      container_name: jenkins      volumes:        - / var /lib/jenkins:/ var /jenkins_home        - / var /run/docker.sock:/ var /run/docker.sock      environment:        - ANDROID_SDK_ROOT=/ var /lib/jenkins/android-sdk      restart: always

          Basil Crow added a comment -

          hierony To test with a custom --httpKeepAliveTimeout=30000 in the official Docker image, set JENKINS_OPTS in your Dockerfile:

          ENV JENKINS_OPTS --httpKeepAliveTimeout=30000
          

          Note that to test the second and third scenarios I described previously, both of which involve an incremental build of Jenkins, you will need replace /usr/share/jenkins/jenkins.war inside the official Docker image with the incremental build provided above (or build your own Docker image).

          Basil Crow added a comment - hierony To test with a custom --httpKeepAliveTimeout=30000 in the official Docker image, set JENKINS_OPTS in your Dockerfile : ENV JENKINS_OPTS --httpKeepAliveTimeout=30000 Note that to test the second and third scenarios I described previously, both of which involve an incremental build of Jenkins, you will need replace /usr/share/jenkins/jenkins.war inside the official Docker image with the incremental build provided above (or build your own Docker image).

          Bryce Lovell added a comment -

          For what it's worth, I've downloaded the latest war (version 2.373) that was built on October 11th, which included pull 7217 to see if this was better.

          I've updated my instance with the latest war and the remote agent jar (remoting-3063.v26e24490f041.jar).

          In jenkins.xml I've set the timeout to --httpKeepAliveTimeout=120000 in an attempt to resolve the timeout issue.

          With the latest pull, latest files, and a two minute timeout, the first time I ran the build it quickly errored out, the second time it worked. Need to do more testing, but prior to updating to 2.373 it failed every time, this time I got through a build. So maybe a little better, but still concerning that I'm seeing the error after updating.

          Jenkins 2.373 using remoting-3063.v26e24490f041.jar

          sun.java.command C:\Program Files\Jenkins\jenkins.war --httpKeepAliveTimeout=120000 --httpPort=8080 --webroot=C:\ProgramData\Jenkins\war

           

          FATAL: command execution failed
          java.nio.channels.ClosedChannelException
              at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:153)
              at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:80)
              at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:149) 

           

          Bryce Lovell added a comment - For what it's worth, I've downloaded the latest war (version 2.373) that was built on October 11th, which included pull 7217 to see if this was better. I've updated my instance with the latest war and the remote agent jar (remoting-3063.v26e24490f041.jar). In jenkins.xml I've set the timeout to --httpKeepAliveTimeout=120000 in an attempt to resolve the timeout issue. With the latest pull, latest files, and a two minute timeout, the first time I ran the build it quickly errored out, the second time it worked. Need to do more testing, but prior to updating to 2.373 it failed every time, this time I got through a build. So maybe a little better, but still concerning that I'm seeing the error after updating. Jenkins 2.373 using remoting-3063.v26e24490f041.jar sun.java.command C:\Program Files\Jenkins\jenkins.war --httpKeepAliveTimeout=120000 --httpPort=8080 --webroot=C:\ProgramData\Jenkins\war   FATAL: command execution failed java.nio.channels.ClosedChannelException     at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:153)     at jenkins.websocket.WebSockets$1.onWebSocketClose(WebSockets.java:80)     at jenkins.websocket.Jetty10Provider$2.onWebSocketClose(Jetty10Provider.java:149)  

          Basil Crow added a comment -

          Thanks for testing brycelovell and glad to hear that the new weekly is more stable for you. Please also test without any custom value for --httpKeepAliveTimeout, since I am trying to determine whether that setting is applicable here (I currently do not think it matters one way or another). If anyone does see any problems after upgrading to 2.373 please include full logs from java -jar jenkins.war side for a full 2-3 minutes before the build fails: there is a long chain of events that leads up to this particular failure mode.

          Basil Crow added a comment - Thanks for testing brycelovell and glad to hear that the new weekly is more stable for you. Please also test without any custom value for --httpKeepAliveTimeout , since I am trying to determine whether that setting is applicable here (I currently do not think it matters one way or another). If anyone does see any problems after upgrading to 2.373 please include full logs from java -jar jenkins.war side for a full 2-3 minutes before the build fails: there is a long chain of events that leads up to this particular failure mode.

          Hierony Manurung added a comment - - edited

          basil 
          Just want to get back, I already update my jenkins master to 2.373. I do still experiencing this error:
          Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@135ef64d:ip-xxx-xx-x-xxx.ap-southeast-1.compute.internal-1600541b": Remote call on ip-xxx-xx-x-xxx.ap-southeast-1.compute.internal-1600541b failed. The channel is closing down or has closed down

          I download the agent jar through the master endpoint:

          wget https://jenkins.xxx.id/swarm/swarm-client.jar

          Here's my agent version

          Inbound agent connected from xx.xx.xx.xxx
          Remoting version: 3046.v38db_38a_b_7a_86
          Launcher: SwarmLauncher
          Communication Protocol: WebSocket
          This is a Unix agent
          Agent successfully connected and online


          Basil Crow added a comment -

          hierony Please read JENKINS-69509 (comment).

          Hierony Manurung added a comment - - edited

          This is the log that I could get from my master (I run the Jenkins master in a Docker container). I'm afraid it doesn't show the chain of events leading up to the failure mode, though.

          jenkins-master.log


          Bryce Lovell added a comment -

          remoting.log.0 jenkins.err.log slave.log.1

          basil, hey. I'm trying to give you the error logs you asked for; I hope this helps.

          In jenkins.err.log, I removed the timeout setting and restarted the service around line 13054.

          I then kicked off a new build; you can see that the timeout error happened a few minutes later, at the bottom of the log. It happens randomly and in different spots when building.

          Not sure if it's related, but in the log, while Jenkins was idle, there are a ton of WebSocket exceptions being thrown with idle timeouts.

          I've also included the remoting log from the agent and the slave log from the controller. Not sure if there is any useful information there. The slave log has a little more in the stack trace than what is shown in the Jenkins error log.


          Basil Crow added a comment - - edited

          Thanks, brycelovell. Based on those logs I can see that the ping thread isn't dying, so the fix in 2.373 at least does not seem to hurt, and is likely helping. The WebSocketTimeoutException you're seeing, in the absence of another exception from jenkins.websocket.Jetty10Provider#sendPing, seems like a legitimate failure to receive an acknowledgement of the ping request within the allotted timeout period. Could you try running with --httpKeepAliveTimeout=30000 (or --httpsKeepAliveTimeout=30000), which is the Jetty (but not Winstone) default? If things are more stable for you with that value, I might consider changing the Winstone default to match the upstream Jetty default.
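
          (Concretely, that means launching the controller with the lower keep-alive value, e.g.:)

          java -jar jenkins.war --httpPort=8080 --httpKeepAliveTimeout=30000

          (or, for a Windows service install, editing the same value in the <arguments> element of jenkins.xml as sketched earlier.)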


          Bryce Lovell added a comment -

          basil, I updated httpKeepAliveTimeout (http in my instance) to 30000, restarted, and have built six times consecutively. I haven't seen the error yet; so far so good.


          Basil Crow added a comment -

          Jenkins 2.375 has been released with both jenkinsci/jenkins#7195 and jenkinsci/winstone#296, thereby fixing all known WebSocket regressions related to the upgrade of Jetty from version 9 to version 10. I am closing this ticket; if you continue to encounter issues, please open a separate ticket with log files from both the controller side (both general and agent-specific controller-side logs) and the agent side (ideally a full support-core bundle), covering from several minutes before the disconnection event all the way through to the disconnection event.


          George Yu added a comment -

          I tried 2.375 and got the problem in less than two hours: org.eclipse.jetty.websocket.core.exception.WebSocketTimeoutException: Connection Idle Timeout. We have jobs that run over 15 hours. I reverted all the way back to Jenkins 2.346.3 LTS as a workaround.


          Basil Crow added a comment -

          gyu Please read JENKINS-69509 (comment).

          George Yu added a comment -

          Created a separate ticket, JENKINS-69955


          joey hendricks added a comment -

          basil (I know this ticket is resolved; I do, however, have one quick and on-topic question.) Some colleagues of mine are also still reporting this error, and they want to increase the httpKeepAliveTimeout value. They cannot implement my workaround, so they are stuck. I have been Googling to see how to do this, but I cannot find it directly; could you elaborate on this, please, or provide some reference material? As an update to all of you: my initial fix of communicating a "." to the terminal every second still holds strong, and no more errors have been encountered since then. If you have the ability to manipulate the logging in the terminal, this could serve as a temporary fix. Basil, thank you in advance for your help; it has been priceless!
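
          (The --httpKeepAliveTimeout value asked about here is a Winstone launch argument passed to jenkins.war, as in the examples above: either on the command line as --httpKeepAliveTimeout=<milliseconds> or in the <arguments> element of jenkins.xml on a Windows service install. As for the "." workaround, a minimal sketch, assuming the long-running test is driven from a shell build step; run_long_test.sh is a placeholder for the actual test command:)

          # Print a dot every second so the build log, and therefore the WebSocket channel, never sits idle.
          ( while true; do printf '.'; sleep 1; done ) &
          KEEPALIVE_PID=$!
          ./run_long_test.sh        # placeholder for the actual long-running test command
          kill "$KEEPALIVE_PID"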


          Basil Crow added a comment -

          I will not answer any further questions in this ticket.


            Assignee: Basil Crow (basil)
            Reporter: joey hendricks (ninjaman1159)
            Votes: 6
            Watchers: 14
