JENKINS-70812

Jenkins-agent fails to reconnect to jenkins controller (jdk 17) when redeployed in AWS ECS

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component/s: amazon-ecs-plugin

      Context:

      1. Two agents and the Jenkins controller are deployed in AWS ECS as services. The controller and the agents use *-jdk17 container images.
      2. When the jenkins-agent(s) are connected to the controller over WebSocket and I redeploy jenkins-agent from the AWS ECS console, the new tasks (new jenkins-agent containers) fail to connect to the controller, and the new containers stop after 10-12 seconds. The controller and the agents are in the same AWS ECS cluster.
      3. If I stop the running jenkins-agent container, which disconnects the agent, the new task (new jenkins-agent container) connects to the controller successfully.
      4. Logs from the stopped jenkins-agent container:
      Mar 13, 2023 8:39:11 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: Handshake error.
      io.jenkins.remoting.shaded.jakarta.websocket.DeploymentException: Handshake error.
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:658)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
          at java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
          at hudson.remoting.Engine.runWebSocket(Engine.java:678)
          at hudson.remoting.Engine.run(Engine.java:499)
      Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 500.
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:301)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:167)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:402)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:365)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:295)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:279)
          at java.base/sun.nio.ch.Invoker.invokeUnchecked(Unknown Source)
          at java.base/sun.nio.ch.Invoker$2.run(Unknown Source)
          at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.base/java.lang.Thread.run(Unknown Source) 
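
      The decisive line in this trace is the `Caused by:` entry: the controller answered the WebSocket upgrade request with HTTP 500 instead of 101 (Switching Protocols). As a triage aid, a minimal Python sketch for pulling that status code out of an agent log could look like the following. The function name `handshake_status` and the sample string are illustrative only and are not part of Jenkins remoting.

```python
import re
from typing import Optional

# Matches the Tyrus client message seen in the log above, e.g.
# "HandshakeException: Response code was not 101: 500."
HANDSHAKE_RE = re.compile(r"Response code was not 101: (\d{3})")


def handshake_status(log_text: str) -> Optional[int]:
    """Return the HTTP status the controller sent instead of 101, or None."""
    match = HANDSHAKE_RE.search(log_text)
    return int(match.group(1)) if match else None


# Sample line copied from the stack trace in this report.
sample = (
    "Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core."
    "HandshakeException: Response code was not 101: 500."
)
print(handshake_status(sample))
```

      A 5xx status points at the controller side of the handshake, so the controller log for the same timestamp is probably the next place to look.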

      I used the following environment variables in the jenkins-agent task definition:

      -e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> -e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true 
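
      For readers who want to reproduce the setup, the same variables can be expressed as the `environment` list of an ECS container definition, which takes `name`/`value` pairs. The Python sketch below only reshapes the values from the flags above; the helper name `to_ecs_environment` is hypothetical, and the placeholders `<secret>` and `<jenkins-url>` are kept exactly as given in the report.

```python
import json

# Same variables as the `-e` flags above; placeholder values are left as-is.
agent_env = {
    "JENKINS_AGENT_NAME": "jenkins-agent",
    "JENKINS_SECRET": "<secret>",
    "JENKINS_URL": "<jenkins-url>",
    "JENKINS_WEB_SOCKET": "true",
}


def to_ecs_environment(env: dict) -> list:
    """Convert a mapping into the name/value list used by ECS container definitions."""
    return [{"name": name, "value": value} for name, value in env.items()]


# Fragment only: a real container definition also needs image, name, etc.
container_definition_fragment = {"environment": to_ecs_environment(agent_env)}
print(json.dumps(container_definition_fragment, indent=2))
```

      Note that every task started from such a definition presents the same fixed agent name, `jenkins-agent`. In practice the secret would more likely be supplied through the container definition's `secrets` field than as a plain value.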

       

      Expected results:
      a. New agent containers should connect to the controller over WebSocket when the jenkins-agent service is redeployed from the AWS ECS cluster/console.

      b. A new deployment of the jenkins-agent service should complete without my having to stop the old task (the running jenkins-agent container).
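
      One way to reason about the behaviour above is through the ECS rolling-update settings `minimumHealthyPercent` and `maximumPercent`, which bound how many tasks of a service may run during a deployment (ECS rounds the lower bound up and the upper bound down). The sketch below assumes a rolling update, one task per agent service, and the percentage values shown; none of these are stated in the report, and the helper name `rolling_bounds` is hypothetical.

```python
import math


def rolling_bounds(desired_count: int, minimum_healthy_percent: int, maximum_percent: int):
    """Return (min_running, max_running) task counts during a rolling update.

    The lower bound is rounded up and the upper bound is rounded down.
    """
    min_running = math.ceil(desired_count * minimum_healthy_percent / 100)
    max_running = math.floor(desired_count * maximum_percent / 100)
    return min_running, max_running


# Assumed: desired count 1 with 100/200 percentages.
# The old task keeps running while the new one starts.
print(rolling_bounds(1, 100, 200))  # (1, 2)

# Assumed alternative: 0/100 percentages.
# The old task is stopped before the new one starts.
print(rolling_bounds(1, 0, 100))  # (0, 1)
```

      With the first setting, old and new containers overlap, which matches the scenario in item 2. Whether that overlap is what triggers the HTTP 500 is not confirmed here. The second setting mirrors the manual workaround in item 3, but it does not satisfy expected result (b).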

       

          Vishal created issue -
          Vishal made changes -
          Component/s New: remoting [ 15489 ]
          Vishal made changes -
          Priority Original: Major [ 3 ] New: Blocker [ 1 ]
          Vishal made changes -
          Priority Original: Blocker [ 1 ] New: Critical [ 2 ]
          Mark Waite made changes -
          Component/s New: amazon-ecs-plugin [ 20840 ]
          Component/s Original: core [ 15593 ]
          Component/s Original: remoting [ 15489 ]

            Assignee: Unassigned
            Reporter: Vishal (vishalci)
            Votes: 0
            Watchers: 2