[JENKINS-70812] Jenkins agent fails to reconnect to Jenkins controller (JDK 17) when redeployed in AWS ECS

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: amazon-ecs-plugin

      Context:

      1. Two agents and a Jenkins controller are deployed in AWS ECS as services. The controller and agents use *-jdk17 container images.
      2. While a jenkins-agent is connected to the controller over WebSocket, redeploying the jenkins-agent service from the AWS ECS console starts new tasks (new jenkins-agent containers) that fail to connect to the controller and stop after 10-12 seconds. The controller and agents are in the same AWS ECS cluster.
      3. If I first stop the running jenkins-agent container, which disconnects the agent, the new task (new jenkins-agent container) connects to the controller successfully.
      4. Logs from the stopped jenkins-agent container:
      Mar 13, 2023 8:39:11 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: Handshake error.
      io.jenkins.remoting.shaded.jakarta.websocket.DeploymentException: Handshake error.
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:658)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
          at java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
          at hudson.remoting.Engine.runWebSocket(Engine.java:678)
          at hudson.remoting.Engine.run(Engine.java:499)
      Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 500.
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:301)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:167)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:402)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:365)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:295)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:279)
          at java.base/sun.nio.ch.Invoker.invokeUnchecked(Unknown Source)
          at java.base/sun.nio.ch.Invoker$2.run(Unknown Source)
          at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.base/java.lang.Thread.run(Unknown Source) 
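The key detail in the trace above is the `Caused by` line: the controller answered the WebSocket upgrade request with HTTP 500 instead of 101 (Switching Protocols). As a minimal sketch for triaging agent logs, the Python snippet below pulls that status code out of log text. The function name `handshake_status` and the sample string are illustrative assumptions, not part of Jenkins or Remoting; only the message format is taken from the log in this report.

```python
import re
from typing import Optional

# Message format copied from the HandshakeException line in the log above.
_HANDSHAKE_RE = re.compile(r"Response code was not 101: (\d{3})")


def handshake_status(log_text: str) -> Optional[int]:
    """Return the HTTP status of the first failed WebSocket handshake, or None."""
    match = _HANDSHAKE_RE.search(log_text)
    return int(match.group(1)) if match else None


# Hypothetical usage with the "Caused by" line from this report.
sample = (
    "Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core."
    "HandshakeException: Response code was not 101: 500."
)
print(handshake_status(sample))
```

A 5xx status here points at the controller side of the handshake rather than at the network path, so the controller log from the same timestamp is likely the next place to look.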

      I used the following environment variables in the jenkins-agent task definition:

      -e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> -e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true 
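To make it easier to compare task definitions between a working and a failing deployment, here is a small sketch that parses a `-e KEY=VALUE` flag string like the one above into a dictionary and reports which of the four variables named in this report are missing or empty. The helper names `parse_env_flags` and `missing_settings` are hypothetical, and treating all four variables as required is an assumption based only on this report.

```python
import shlex
from typing import Dict, List

# The four variable names used in this report; treating them all as
# required is an assumption made for illustration.
REQUIRED = (
    "JENKINS_AGENT_NAME",
    "JENKINS_SECRET",
    "JENKINS_URL",
    "JENKINS_WEB_SOCKET",
)


def parse_env_flags(flags: str) -> Dict[str, str]:
    """Parse a string of `-e KEY=VALUE` flags into a dict."""
    tokens = shlex.split(flags)
    env: Dict[str, str] = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == "-e" and i + 1 < len(tokens):
            key, _, value = tokens[i + 1].partition("=")
            env[key] = value
            i += 2
        else:
            i += 1
    return env


def missing_settings(env: Dict[str, str]) -> List[str]:
    """Return the names from REQUIRED that are absent or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


flags = (
    "-e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> "
    "-e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true"
)
env = parse_env_flags(flags)
print(missing_settings(env))
```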

       

      Expected results:
      a. New agent containers should reconnect to the controller over WebSocket when the jenkins-agent service is redeployed from the AWS ECS cluster/console.

      b. A new deployment of the jenkins-agent service should complete without first stopping the old task (the running jenkins-agent container).

       


          Vishal added a comment - - edited

          Still experiencing "websocket.DeploymentException: Handshake error" errors in Jenkins version 2.387.3. This is impacting our Jenkins production server.
          Stopping the old jenkins-agent container while a new jenkins-agent deployment is in progress causes downtime in Jenkins.


          Vishal added a comment -

          Hi markewaite, is it possible to prioritise this bug?
          It is affecting production workflows.

          Attaching the latest jenkins-agent logs:

          Jul 19, 2023 9:21:56 AM hudson.remoting.jnlp.Main createEngine
          INFO: Setting up agent: jenkins-agent-1
          Jul 19, 2023 9:21:56 AM hudson.remoting.Engine startEngine
          INFO: Using Remoting version: 3107.v665000b_51092
          Jul 19, 2023 9:21:56 AM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
          INFO: Using /home/jenkins/agent/remoting as a remoting work directory
          Jul 19, 2023 9:21:56 AM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
          INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
          Jul 19, 2023 9:21:56 AM hudson.remoting.jnlp.Main$CuiListener error
          SEVERE: Handshake error.
          io.jenkins.remoting.shaded.jakarta.websocket.DeploymentException: Handshake error.
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:658)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
              at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
              at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
              at java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
              at hudson.remoting.Engine.runWebSocket(Engine.java:678)
              at hudson.remoting.Engine.run(Engine.java:499)
          Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 500.
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:301)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:167)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:402)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:365)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:295)
              at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:279)
              at java.base/sun.nio.ch.Invoker.invokeUnchecked(Unknown Source)
              at java.base/sun.nio.ch.Invoker$2.run(Unknown Source)
              at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
              at java.base/java.lang.Thread.run(Unknown Source)

          Thanks !


          Mark Waite added a comment -

          vishalci the best way for you to raise the priority of this bug is for you to investigate the bug and resolve it.

          This is an opportunity for your company to meet its own business needs by contributing to open source. Persuade your manager or others in your organization that someone from your company should be assigned to investigate and resolve this issue.

          Since it is affecting production workloads in your company, that is a great reason for your company to find a solution. Identifying and solving the issue will benefit your company and will benefit the Jenkins community.


            Assignee: Unassigned
            Reporter: Vishal (vishalci)