WebSocketTimeoutException: Connection Idle Timeout

    • 2.395

      We first saw websocket connections being closed unexpectedly in Jenkins 2.361.1 LTS. The problem was reported in JENKINS-69509, and Jenkins 2.375 was subsequently released to address it. We tried Jenkins 2.375 and found that the websocket problem was still there: the websocket was closed less than 2 hours after the build started. All the necessary logs are attached.

      Reverting to Jenkins 2.346.3 LTS is a workaround that works for us.

      How to Reproduce

      • Start Jenkins 2.361.x or later with -Djenkins.websocket.pingInterval=120
      • Connect a Websocket agent
        --> Notice that the websocket agent disconnects and reconnects at every ping

      An interval of 120 reproduces the error consistently, though it should happen with any value > 30. It may also happen with the default of 30, but with lower likelihood.

          [JENKINS-69955] WebSocketTimeoutException: Connection Idle Timeout

          Nik Reiman added a comment -

          I can now confirm that the `webSocket: true` option in the Swarm Client plugin seems to have been the culprit! We just ran a test cluster for 4 days with no node disconnections. 🎉

          Allan BURDAJEWICZ added a comment - - edited

          Websocket agents seem to be intermittently disconnecting. This problem is reproducible in current weekly 2.391, even just locally:

          • Spin up a new Jenkins controller
          • Create an inbound Websocket agent
          • Start the websocket agent

          Wait until you see the agent disconnecting:

          Feb. 21, 2023 3:39:31 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Connected
          Feb. 21, 2023 3:46:16 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Read side closed
          Feb. 21, 2023 3:46:16 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Terminated
          Feb. 21, 2023 3:46:26 PM hudson.remoting.jnlp.Main$CuiListener status
          INFO: Performing onReconnect operation.
          

          The controller shows the timeout exception:

          Feb. 21, 2023 3:46:16 PM jenkins.agents.WebSocketAgents$Session error
          WARNING: null
          org.eclipse.jetty.websocket.api.exceptions.WebSocketTimeoutException: Connection Idle Timeout
          	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.convertCause(JettyWebSocketFrameHandler.java:524)
          	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onError(JettyWebSocketFrameHandler.java:258)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$closeConnection$2(WebSocketCoreSession.java:284)
          	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1468)
          	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1487)
          	at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.closeConnection(WebSocketCoreSession.java:284)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.lambda$sendFrame$7(WebSocketCoreSession.java:519)
          	at org.eclipse.jetty.util.Callback$3.succeeded(Callback.java:155)
          	at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.notifyCallbackSuccess(TransformingFlusher.java:197)
          	at org.eclipse.jetty.websocket.core.internal.TransformingFlusher$Flusher.process(TransformingFlusher.java:154)
          	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232)
          	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214)
          	at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.sendFrame(TransformingFlusher.java:77)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.sendFrame(WebSocketCoreSession.java:522)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.close(WebSocketCoreSession.java:239)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.processHandlerError(WebSocketCoreSession.java:371)
          	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onIdleExpired(WebSocketConnection.java:233)
          	at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
          	at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
          	at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
          	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
          	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          	at java.base/java.lang.Thread.run(Thread.java:829)
          Caused by: org.eclipse.jetty.websocket.core.exception.WebSocketTimeoutException: Connection Idle Timeout
          	... 10 more
          

          ****

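          I am not 100% sure this is remoting. It looks like people have been hitting this since the move to Jetty 10 (jenkins.websocket.Jetty10Provider). I collected debug Jetty logs from the controller; hopefully they help. Agent name: JENKINS-69955. Disconnection detected at 3:46:16 PM. Attached: hudson.remoting-0.log.0, jenkins.agents.WebSocketAgents-0.log.0, org.jetty-0.log.0

          I can only confirm that the default websocket connection timeout is 30s and that, per the Jetty log, we exceed it: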

          Feb. 21, 2023 3:46:16 PM org.eclipse.jetty.io.IdleTimeout checkIdleTimeout
          FINE: SocketChannelEndPoint@45b2acfd[{l=/127.0.0.1:8081,r=/127.0.0.1:63856,OPEN,fill=FI,flush=W,to=30003/30000}{io=1/1,kio=1,kro=1}]->[WebSocketConnection@47fa53ab[SERVER,p=Parser@d1a2f85[s=START,c=0,o=0x0,m=-,l=-1],f=Flusher@7e9adc28[PROCESSING][queueSize=0,aggregate=null],g=org.eclipse.jetty.websocket.core.internal.Generator@6ef93c39]] idle timeout check, elapsed: 30003 ms, remaining: -3 ms
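
          The arithmetic in that log line can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes that the `to=30003/30000` field reads as "idle for / idle timeout" in milliseconds, which matches the `elapsed: 30003 ms, remaining: -3 ms` text at the end of the same line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jenkins or Jetty.
final class IdleTimeoutLogLine {
    // Assumed meaning of the field: to=<idle-for-ms>/<idle-timeout-ms>.
    private static final Pattern TO_FIELD = Pattern.compile("to=(\\d+)/(\\d+)");

    /** Milliseconds the connection was idle beyond its timeout (negative if still within it). */
    static long overshootMillis(String logLine) {
        Matcher m = TO_FIELD.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no to=<idle-for>/<timeout> field found");
        }
        long idleFor = Long.parseLong(m.group(1));
        long timeout = Long.parseLong(m.group(2));
        return idleFor - timeout;
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the FINE log line quoted in this comment.
        String excerpt = "OPEN,fill=FI,flush=W,to=30003/30000} idle timeout check, elapsed: 30003 ms, remaining: -3 ms";
        System.out.println("overshoot ms: " + overshootMillis(excerpt)); // prints "overshoot ms: 3"
    }
}
```

          A positive result means the idle check fired after the timeout had already elapsed, which is when the connection gets closed.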
          

          Dan Wang added a comment -

          Hi nre_ableton,

          Could you share some tips on where to add the "webSocket: true" option?

          Nik Reiman added a comment -

          sbc8112 it's an argument to the Swarm Client, in this case in a YAML configuration file. See https://github.com/jenkinsci/swarm-plugin#available-options. If you aren't using Swarm Client, then you should check whatever protocol your agents use to connect. Also note that the solution (for me, anyway) was to not specify this option. We were using websockets before, and now we are not.

          Olivier Lamy added a comment -
          idle timeout check, elapsed: 30003 ms, remaining: -3 ms 

          This is a very plausible cause. Jetty has a default idle timeout of 30s.

          The websocket sends a ping every 30s by default (see https://github.com/jenkinsci/jenkins/blob/a3f31145e621ab0072bb872ecac93a2c6cbcbaae/core/src/main/java/jenkins/websocket/WebSocketSession.java#L58).

          So yes, whether the ping arrives in time can come down to a few milliseconds (3 ms in these logs); it depends on the network and on whether you are lucky or not.

          Possible workaround: start the Jenkins master with

           -Djenkins.websocket.pingInterval=15

          The ping interval will then be shorter than the Jetty idle timeout.
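
          The relationship between the two values can be sketched as below. This is a deliberately simplified model, not Jenkins or Jetty code: the class, enum and method names are hypothetical, and it assumes the ping is the only traffic on the connection and ignores network latency.

```java
import java.time.Duration;

// Hypothetical model of the ping interval vs. idle timeout relationship discussed in this issue.
final class PingVsIdleTimeout {
    enum Outcome {
        SAFE,      // ping arrives before the idle timeout can expire
        RACY,      // ping and idle check are due at the same time; a few ms decide the result
        TIMES_OUT  // idle timeout expires before the next ping is sent
    }

    static Outcome classify(Duration pingInterval, Duration idleTimeout) {
        int cmp = pingInterval.compareTo(idleTimeout);
        if (cmp < 0) {
            return Outcome.SAFE;
        }
        return cmp == 0 ? Outcome.RACY : Outcome.TIMES_OUT;
    }

    public static void main(String[] args) {
        Duration idleTimeout = Duration.ofSeconds(30); // Jetty default reported in this thread
        for (long pingSeconds : new long[] {15, 30, 120}) {
            System.out.println(pingSeconds + "s -> " + classify(Duration.ofSeconds(pingSeconds), idleTimeout));
        }
        // prints:
        // 15s -> SAFE
        // 30s -> RACY
        // 120s -> TIMES_OUT
    }
}
```

          Under these assumptions the default of 30 lands in the racy case, 120 reproduces the disconnect consistently, and 15 stays below the timeout.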

          Alternatively, change the configuration of the Jetty websocket container to use a larger default idle timeout.

          This can be done around here: https://github.com/jenkinsci/jenkins/blob/a3f31145e621ab0072bb872ecac93a2c6cbcbaae/websocket/jetty10/src/main/java/jenkins/websocket/Jetty10Provider.java#L55

          with something such as

          JettyWebSocketServerContainer.getContainer(req.getServletContext()).setIdleTimeout(some duration); 

           

          Javadoc from here https://github.com/eclipse/jetty.project/blob/b7075161d015ddce23fbf3db873d5f6b539f6a6b/jetty-io/src/main/java/org/eclipse/jetty/io/IdleTimeout.java#L29

          a check is then made to see when the last operation took place. 

          So if nothing happens for 30s on the established websocket connection...

          Allan BURDAJEWICZ added a comment - - edited

          I can definitely reproduce with 2.361.1 by adjusting the websocket ping interval. And I can't reproduce with 2.346.4.
          Updated the description with a reproduction scenario.

          IIUC the previous websocket timeout was 5 minutes, set by the WebSocketPolicy at https://github.com/eclipse/jetty.project/blob/jetty-9.4.48.v20220622/jetty-websocket/websocket-api/src/main/java/org/eclipse/jetty/websocket/api/WebSocketPolicy.java#L81-L86

          Hung added a comment -

          allan_burdajewicz Hi Allan, do you have any updates or a workaround for this issue?

          Currently I'm using Jenkins 2.375.1, and for some reason I cannot roll back to Jenkins 2.346.3 LTS as suggested above.

          Olivier Lamy added a comment -

          leminhhung0110 there is a PR ready. For now you can use the workaround
          -Djenkins.websocket.pingInterval=15
          or even less

          Hung added a comment -

          olamy do you mean I should use this command when starting the Jenkins master: "jenkins restart -Djenkins.websocket.pingInterval=15"?

          Olivier Lamy added a comment -

          leminhhung0110 I have no idea what your script called jenkins is doing, but Jenkins needs to be started with the system property.
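
          As a rough illustration of what "started with the system property" means, the sketch below builds a JVM command line. It assumes Jenkins is launched directly from a `jenkins.war` file with `java -jar`; the class and method names are hypothetical, and packaged or containerised installations usually pass JVM options through their own service or environment configuration instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that only assembles the command; it does not start Jenkins.
final class JenkinsLaunchCommand {
    static List<String> build(int pingIntervalSeconds) {
        List<String> command = new ArrayList<>();
        command.add("java");
        // JVM system properties must come before -jar; anything after the
        // jar/war file name is passed to the application as a plain argument.
        command.add("-Djenkins.websocket.pingInterval=" + pingIntervalSeconds);
        command.add("-jar");
        command.add("jenkins.war"); // assumed file name and location
        return command;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build(15)));
        // prints: java -Djenkins.websocket.pingInterval=15 -jar jenkins.war
    }
}
```

          The point of the ordering is that the property has to be a JVM option of the controller process itself, not an argument to a wrapper script.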

            allan_burdajewicz Allan BURDAJEWICZ
            gyu George Yu