Jenkins / JENKINS-70812

Jenkins agent fails to reconnect to the Jenkins controller (JDK 17) when redeployed in AWS ECS


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: amazon-ecs-plugin

      Context:

      1. Two agents and a Jenkins controller are deployed in AWS ECS as services. The controller and agents use *-jdk17 container images.
      2. When the jenkins-agent(s) are connected to the controller over WebSocket and I redeploy jenkins-agent from the AWS ECS console, the new tasks (new jenkins-agent containers) fail to connect to the controller and stop after 10-12 seconds. The controller and agents are in the same AWS ECS cluster.
      3. If I stop the running jenkins-agent container, which disconnects the agent, the new task (new jenkins-agent container) connects to the controller successfully.
      4. Logs from the stopped jenkins-agent container:
      Mar 13, 2023 8:39:11 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: Handshake error.
      io.jenkins.remoting.shaded.jakarta.websocket.DeploymentException: Handshake error.
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:658)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
          at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
          at java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
          at hudson.remoting.Engine.runWebSocket(Engine.java:678)
          at hudson.remoting.Engine.run(Engine.java:499)
      Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 500.
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:301)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:167)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:402)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:365)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:295)
          at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:279)
          at java.base/sun.nio.ch.Invoker.invokeUnchecked(Unknown Source)
          at java.base/sun.nio.ch.Invoker$2.run(Unknown Source)
          at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          at java.base/java.lang.Thread.run(Unknown Source) 
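The root cause in the trace is `Response code was not 101: 500`: the controller answered the agent's WebSocket upgrade request with HTTP 500 instead of `101 Switching Protocols`. To reproduce that handshake outside the agent, a minimal sketch follows. It assumes the agent endpoint is `<JENKINS_URL>/wsagents/` and that the agent identifies itself with `Node-Name` and `Secret-Key` request headers; both are my reading of remoting's `Engine.runWebSocket`, not something confirmed in this report. `build_upgrade_request`, `status_code`, `expected_accept` and `probe` are hypothetical helpers.

```python
import base64
import hashlib
import os
import socket
import ssl
from urllib.parse import urlsplit

# GUID fixed by RFC 6455 for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_key() -> str:
    # Sec-WebSocket-Key: 16 random bytes, base64-encoded (RFC 6455).
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key: str) -> str:
    # Value a compliant server returns in Sec-WebSocket-Accept for this key.
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_request(jenkins_url: str, agent_name: str, secret: str, key: str) -> bytes:
    # ASSUMPTION: the 'wsagents/' path and the Node-Name / Secret-Key headers
    # mirror what the remoting agent sends; verify against your remoting version.
    parts = urlsplit(jenkins_url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    lines = [
        f"GET {path}wsagents/ HTTP/1.1",
        f"Host: {parts.netloc}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        f"Node-Name: {agent_name}",
        f"Secret-Key: {secret}",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_code(response: bytes) -> int:
    # Parse the status line, e.g. "HTTP/1.1 101 Switching Protocols" -> 101.
    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    return int(status_line.split(" ", 2)[1])


def probe(jenkins_url: str, agent_name: str, secret: str, timeout: float = 10.0) -> int:
    # Send one upgrade request and return the HTTP status (101 means accepted).
    parts = urlsplit(jenkins_url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    sock = socket.create_connection((parts.hostname, port), timeout=timeout)
    try:
        if parts.scheme == "https":
            sock = ssl.create_default_context().wrap_socket(sock, server_hostname=parts.hostname)
        sock.sendall(build_upgrade_request(jenkins_url, agent_name, secret, new_key()))
        # Sketch only: assumes the status line arrives in the first read.
        return status_code(sock.recv(4096))
    finally:
        sock.close()
```

Running `probe("<jenkins-url>", "jenkins-agent", "<secret>")` once while the old task is still connected and once after it has stopped would show whether the 500 depends on an existing connection for the same agent name.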

      I used the following environment variables in the jenkins-agent task definition:

      -e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> -e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true 
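In an ECS task definition these flags correspond to the `environment` list of the container definition, whose entries have the form `{"name": ..., "value": ...}`. A small sketch that converts the `-e` flags into that shape and lists any of these four variables that are missing or empty; `env_flags_to_ecs`, `missing_vars` and `EXPECTED` are hypothetical names, and the placeholder values are kept as-is.

```python
import shlex

# The four variables set in this report (not necessarily the image's full list).
EXPECTED = ("JENKINS_AGENT_NAME", "JENKINS_SECRET", "JENKINS_URL", "JENKINS_WEB_SOCKET")


def env_flags_to_ecs(flags: str) -> list[dict[str, str]]:
    # Turn "-e KEY=VALUE ..." docker flags into ECS {"name", "value"} entries.
    tokens = shlex.split(flags)
    env = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-e" and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition("=")
            env.append({"name": name, "value": value})
            i += 2
        else:
            i += 1
    return env


def missing_vars(env: list[dict[str, str]]) -> list[str]:
    # Names from EXPECTED that are absent or set to an empty value.
    present = {item["name"] for item in env if item["value"]}
    return [name for name in EXPECTED if name not in present]
```

`json.dumps(env_flags_to_ecs(flags), indent=2)` then gives a fragment that can be compared with the `environment` section of the registered task definition.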

       

      Expected results:
      a. New agent containers should reconnect to the controller over WebSocket when the jenkins-agent service is redeployed from the AWS ECS cluster/console.

      b. A new deployment of the jenkins-agent service should complete without having to stop the old task (the running jenkins-agent container) first.
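One possible explanation, which this report does not confirm: with the default rolling-update settings for a replica service (`minimumHealthyPercent` 100, `maximumPercent` 200), ECS starts the new task while the old one is still connected under the same `JENKINS_AGENT_NAME`, whereas in step 3 of the context the old agent is already gone. If that is the cause, letting ECS stop the old task before starting its replacement reproduces the order from step 3 without a manual stop, at the cost of a short window with no agent. A sketch of the `UpdateService` arguments under that assumption; the function name and the cluster/service names are hypothetical.

```python
def replace_before_start_kwargs(cluster: str, service: str) -> dict:
    # ASSUMPTION: the agent service runs with desiredCount=1, so these limits mean
    # at most one task (one connection per agent name) exists at any moment.
    return {
        "cluster": cluster,
        "service": service,
        "deploymentConfiguration": {
            "minimumHealthyPercent": 0,  # allow ECS to stop the old task first
            "maximumPercent": 100,       # never run old and new tasks side by side
        },
        "forceNewDeployment": True,      # redeploy even if the task definition is unchanged
    }
```

With boto3 this would be applied as `boto3.client("ecs").update_service(**replace_before_start_kwargs("my-cluster", "jenkins-agent"))`. It meets expectation a, but not expectation b if the intent is for the old task to keep running during the rollout.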

       

            Assignee: Unassigned
            Reporter: Vishal (vishalci)
            Votes: 0
            Watchers: 2