Jenkins / JENKINS-66172

Unexplained websocket idle timeout disconnects from Windows 10 agents and Jenkins controllers in AWS ECS


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: remoting
    • Environment: Jenkins 2.263.3, on-prem Windows 10 agents connecting via websockets to a Jenkins controller in AWS ECS.

      We have ~20 on-prem Windows 10 agents using websockets to connect to Jenkins controllers running on AWS ECS. Unfortunately, these agents have to run on-prem because embedded development boards are connected to them for running regression test suites, which run for 1-2 hours.

      We can trace some of the disconnects to networking blips, which are expected on a connection from on-prem into the AWS cloud.

      But we also have a small set of disconnects that occur only while a job is running on the node. To check this, I set up another Windows 10 agent in our dev environment that is just connected, with no jobs running. It stays connected for multiple weeks, while the agent running the builds seems to disconnect 1-2 times per week.

      I configured some websocket system logs, and the log shows the connection being closed due to "Idle timeout expired". It looks like a 1-second timeout on something, which seems pretty short.

      Jul 06, 2021 9:07:17 AM WARNING jenkins.agents.WebSocketAgents$Session error null
      java.util.concurrent.TimeoutException: Idle timeout expired: 2463/1000 ms
      Caused: org.eclipse.jetty.websocket.api.CloseException
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onReadTimeout(AbstractWebSocketConnection.java:564)
      	at org.eclipse.jetty.io.AbstractConnection.onFillInterestedFailed(AbstractConnection.java:172)
      	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillInterestedFailed(AbstractWebSocketConnection.java:539)
      	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.failed(AbstractConnection.java:317)
      	at org.eclipse.jetty.io.FillInterest.onFail(FillInterest.java:140)
      	at org.eclipse.jetty.io.AbstractEndPoint.onIdleExpired(AbstractEndPoint.java:407)
      	at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171)
      	at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113)
      	at org.eclipse.jetty.io.IdleTimeout.activate(IdleTimeout.java:136)
      	at org.eclipse.jetty.io.IdleTimeout.setIdleTimeout(IdleTimeout.java:100)
      	at org.eclipse.jetty.server.LowResourceMonitor.setLowResources(LowResourceMonitor.java:412)
      	at org.eclipse.jetty.server.LowResourceMonitor.monitor(LowResourceMonitor.java:352)
      	at org.eclipse.jetty.server.LowResourceMonitor$1.run(LowResourceMonitor.java:84)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
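
For comparing these events across agents, the two numbers in the exception message can be pulled out of the log. The following is a minimal Python sketch, not part of Jenkins or Jetty: the names `parse_idle_timeout` and `mentions_low_resource_monitor` are hypothetical, and it assumes the message has the form `elapsed/limit ms` as in the trace above (2463 ms idle against a 1000 ms limit). The second helper only checks whether a trace contains the `LowResourceMonitor.setLowResources` frame seen above, which might help separate these disconnects from ordinary network blips.

```python
import re
from typing import Iterable, NamedTuple, Optional


class IdleTimeout(NamedTuple):
    elapsed_ms: int  # how long the connection had been idle
    limit_ms: int    # idle timeout in effect when it expired


# Assumed message shape, taken from the log line in this report:
#   "Idle timeout expired: 2463/1000 ms"
_IDLE_RE = re.compile(r"Idle timeout expired: (\d+)/(\d+) ms")


def parse_idle_timeout(line: str) -> Optional[IdleTimeout]:
    """Return the elapsed/limit pair from a log line, or None if absent."""
    match = _IDLE_RE.search(line)
    if match is None:
        return None
    return IdleTimeout(int(match.group(1)), int(match.group(2)))


def mentions_low_resource_monitor(stack_lines: Iterable[str]) -> bool:
    """True if any frame goes through LowResourceMonitor.setLowResources."""
    return any("LowResourceMonitor.setLowResources" in line
               for line in stack_lines)
```

Run over the controller log, this would let the disconnects with a 1000 ms limit be counted separately from any that expired against a longer limit.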

      Any ideas? I am going to start running a test job on my dev agent to see whether it remains stable while it is running a job.

            Assignee: Unassigned
            Reporter: John Lengeling (johnlengeling)
            Votes: 0
            Watchers: 5