Bug
Resolution: Unresolved
Critical
Context:
- Two agents and a Jenkins controller are deployed as services in AWS ECS. The controller and the agents use *-jdk17 container images.
- While a jenkins-agent is connected to the controller over WebSocket, redeploying the jenkins-agent service from the AWS ECS console starts new tasks (new jenkins-agent containers) that fail to connect to the controller and stop after 10-12 seconds. The controller and the agents are in the same AWS ECS cluster.
- If I instead stop the running jenkins-agent container, which disconnects the agent, the new task (new jenkins-agent container) connects to the controller successfully.
- Logs from the stopped jenkins-agent container:
Mar 13, 2023 8:39:11 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Handshake error.
io.jenkins.remoting.shaded.jakarta.websocket.DeploymentException: Handshake error.
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:658)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
at java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
at hudson.remoting.Engine.runWebSocket(Engine.java:678)
at hudson.remoting.Engine.run(Engine.java:499)
Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 500.
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:301)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:167)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:402)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:365)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:295)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:279)
at java.base/sun.nio.ch.Invoker.invokeUnchecked(Unknown Source)
at java.base/sun.nio.ch.Invoker$2.run(Unknown Source)
at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
I used the following environment variables in the jenkins-agent task definition:
-e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> -e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true
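For readers more used to the ECS task-definition JSON than to `-e` flags, the same four variables can be written in the `environment` list form of a container definition. The sketch below is only an illustration: the helper names `to_ecs_environment` and `to_docker_flags` are hypothetical, and the values are the same placeholders as above (`<secret>` and `<jenkins-url>` are not filled in).

```python
import json

# Values copied from the report; <secret> and <jenkins-url> stay as placeholders.
AGENT_ENV = {
    "JENKINS_AGENT_NAME": "jenkins-agent",
    "JENKINS_SECRET": "<secret>",
    "JENKINS_URL": "<jenkins-url>",
    "JENKINS_WEB_SOCKET": "true",
}


def to_ecs_environment(env):
    """Turn a name -> value mapping into the list of {"name", "value"} pairs
    used by the `environment` field of an ECS container definition."""
    return [{"name": name, "value": value} for name, value in env.items()]


def to_docker_flags(env):
    """Render the same mapping as docker-style `-e NAME=value` flags."""
    return " ".join(f"-e {name}={value}" for name, value in env.items())


if __name__ == "__main__":
    # Print the JSON fragment that would sit under containerDefinitions[].environment.
    print(json.dumps(to_ecs_environment(AGENT_ENV), indent=2))
    # Print the equivalent flag form, matching the line quoted above.
    print(to_docker_flags(AGENT_ENV))
```

Both forms carry the same configuration, so the handshake failure should not depend on which one is used to define the task.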
Expected Results:
a. New agent containers should connect to the controller over WebSocket when the jenkins-agent service is redeployed from the AWS ECS cluster/console.
b. A new deployment of the jenkins-agent service should complete without the old task (the running jenkins-agent container) having to be stopped first.