JENKINS-64598

Jenkins agent disconnects on k8s with SIGHUP / ClosedChannelException

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Major
    • jenkins instance:
      jenkins core 2.263.1
      CentOS Linux 7 (Core)
      kubernetes plugin 1.28.4

      jenkins agent remoting VERSION=4.6
      -websocket flag passed to jenkins agent

      I get intermittent agent disconnects while a build is running. I'll try to provide as much info as I can; let me know what else I can check.

       

      • Jenkins master runs Java 11 (java-11-openjdk-11.0.5.10) and is started with hudson.slaves.ChannelPinger.pingIntervalSeconds set to 30 in order to avoid disconnects
      • An Nginx reverse proxy is in use with an SSL timeout of 5 minutes, which was too close to the default hudson.slaves.ChannelPinger.pingIntervalSeconds, so the ping interval was reduced to 30 seconds. This gave good results and reduced the number of disconnects per day (the stack trace for those disconnects was different and did not show a SIGHUP)
      • jenkins masters are on premise
      • jenkins agents are in GKE GCP kubernetes version 1.16.5
      • jenkins agent container image has default java -version
        openjdk version "1.8.0_232"
        OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-1~deb9u1-b09)
        OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
      • remoting VERSION=4.6
      • The -websocket flag is passed to the jenkins agent via the k8s plugin extra cli command. I noticed afterwards that there is a checkbox for websocket in the kubernetes plugin config, but couldn't find docs to go with it. Should I switch to using that?
      • In terms of sizing, we peak at about 400 jenkins-agents / pods connected at a time; the limit is set to 500 in the jenkins kubernetes plugin configuration
      • The issue happens even when load is low
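
      For illustration only, the ping-interval reasoning in the list above can be sketched as a small check. The property name and the 5-minute proxy timeout come from the report. The 300-second fallback and the "at least two pings per timeout window" margin are assumptions made for this sketch, and the class and method names are hypothetical, not part of Jenkins.

```java
/**
 * Sketch: compare the remoting ping interval with a reverse proxy idle timeout.
 * The "ping at most half the timeout" margin is an assumption for illustration,
 * not a rule taken from Jenkins or Nginx documentation.
 */
final class PingIntervalCheck {

    // System property named in the report.
    static final String PING_PROPERTY = "hudson.slaves.ChannelPinger.pingIntervalSeconds";

    // Assumed fallback of 300 s, in line with the report's remark that the
    // default was too close to the 5-minute proxy timeout.
    static final int ASSUMED_DEFAULT_PING_SECONDS = 300;

    /** Returns true when at least two pings fit inside one idle-timeout window. */
    static boolean leavesMargin(long pingSeconds, long idleTimeoutSeconds) {
        if (pingSeconds <= 0 || idleTimeoutSeconds <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        return pingSeconds * 2 <= idleTimeoutSeconds;
    }

    public static void main(String[] args) {
        // Picks up -Dhudson.slaves.ChannelPinger.pingIntervalSeconds=<n> if set on this JVM.
        int pingSeconds = Integer.getInteger(PING_PROPERTY, ASSUMED_DEFAULT_PING_SECONDS);
        long proxyTimeoutSeconds = 5 * 60; // the 5-minute Nginx timeout from the report
        System.out.printf("ping=%ds proxyTimeout=%ds leavesMargin=%b%n",
                pingSeconds, proxyTimeoutSeconds,
                leavesMargin(pingSeconds, proxyTimeoutSeconds));
    }
}
```

      With the 30-second interval used here the check passes, while a 300-second interval against a 300-second timeout does not. This only models the idle-timeout disconnects that the shorter interval reduced; it says nothing about the SIGHUP case below.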

      The connection is established fine, but intermittently gets disconnected. Let me know what else I can look at.

       

      Stack trace:

       

      SignalException: SIGHUP
      FATAL: command execution failed
      java.nio.channels.ClosedChannelException
      	at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141)
      	at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91)
      	at com.sun.proxy.$Proxy105.onWebSocketClose(Unknown Source)
      	at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:149)
      	at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:394)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:316)
      	at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280)
      	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293)
      	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264)
      	at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193)
      	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
      	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:510)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
      	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
      	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
      	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
      	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
      	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
      	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
      	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
      	at java.base/java.lang.Thread.run(Thread.java:834)

          [JENKINS-64598] Jenkins agent disconnects on k8s with SIGHUP / ClosedChannelException

          Samuel Beaulieu created issue -
          at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581)
          at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181)
          at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:510)
          at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440)
          at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
          at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
          at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
          at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
          at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
          at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
          at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
          at java.base/java.lang.Thread.run(Thread.java:834){code}
          Oleg Nenashev made changes -
          Labels New: websocket

          Tim Jacomb added a comment -

          cc jglick


          Samuel Beaulieu added a comment -

          My next step is to set the default Java version for the Jenkins agent container to Java 11. I've also noticed higher CPU load and CPU spikes since we moved more traffic to k8s, and I'm trying to track this down. Is the ping controlled by pingIntervalSeconds a heavy operation for the Jenkins server in general? Would k8s agents increase CPU usage?

          Jesse Glick added a comment -

          As far as I know there is no straightforward way to track down why the connection gets broken. Possibly related to your nginx configuration.


          Samuel Beaulieu added a comment - - edited

          I have moved the agents to using java 11, but it did not help with the issue.

          Would the nginx logs show something?

           

          Some metrics: it happens about 30-60 times a day across a few hundred builds. It can happen at an undetermined time, for example after 5 minutes, 60 minutes, or 90+ minutes of running successfully. No other pattern has been noticed.

           

          Now, from the nginx docs on websockets:

          Alternatively, the proxied server can be configured to periodically send WebSocket ping frames to reset the timeout and check if the connection is still alive.

          I am assuming that's what hudson.slaves.ChannelPinger.pingIntervalSeconds is for? All the timeouts I have set are above the pingIntervalSeconds of 30 seconds. I'll try setting http://nginx.org/en/docs/http/ngx_http_core_module.html#lingering_close to 'always', but I'm not convinced this is the root cause.
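
          For reference, a minimal sketch of the kind of nginx WebSocket proxy block being discussed. This is not the configuration from this report: the upstream address and the timeout values are illustrative assumptions. The point is only that the proxy's idle timeouts need to stay comfortably above the 30-second ping interval, so that ping traffic resets them before nginx closes the connection.

```nginx
# Hypothetical location block; upstream address and timeout values are assumptions.
location / {
    proxy_pass http://127.0.0.1:8080;        # assumed address of the Jenkins controller
    proxy_http_version 1.1;                  # the WebSocket upgrade requires HTTP/1.1
    proxy_set_header Upgrade $http_upgrade;  # forward the client's Upgrade header
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 300s;                 # max idle time between reads from Jenkins
    proxy_send_timeout 300s;                 # max idle time between writes to Jenkins
}
```

          With values like these, a ping every 30 seconds should keep an otherwise idle agent connection from hitting either timeout. It would not explain disconnects that occur while the build is actively sending output.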

           

          We also get a similar disconnect (although with a different stack trace, as expected) without websockets, when connecting to the defined JNLP port. Moving to websocket was an attempt to get rid of those disconnects; we hoped that a standard web protocol would be more resilient than the Jenkins agent JNLP connection. Any preference at this point from CloudBees engineers?


            jthompson Jeff Thompson
            sbeaulie Samuel Beaulieu