Type: Bug
Resolution: Unresolved
Priority: Critical
Context:
- Two agents and a Jenkins controller are deployed in AWS ECS as services. The controller and the agents use *-jdk17 container images.
- While a jenkins-agent is connected to the controller over WebSocket, if I redeploy jenkins-agent from the AWS ECS console, the new tasks (new jenkins-agent containers) fail to connect to the controller and stop after 10-12 seconds. The controller and the agents are in the same AWS ECS cluster.
- If I stop the running jenkins-agent container, which disconnects the agent, the new task (new jenkins-agent container) connects to the controller successfully.
- Logs from the stopped jenkins-agent container:
Mar 13, 2023 8:39:11 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Handshake error.
io.jenkins.remoting.shaded.jakarta.websocket.DeploymentException: Handshake error.
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3$1.run(ClientManager.java:658)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$3.run(ClientManager.java:696)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager$SameThreadExecutorService.execute(ClientManager.java:849)
at java.base/java.util.concurrent.AbstractExecutorService.submit(Unknown Source)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:493)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.ClientManager.connectToServer(ClientManager.java:337)
at hudson.remoting.Engine.runWebSocket(Engine.java:678)
at hudson.remoting.Engine.run(Engine.java:499)
Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core.HandshakeException: Response code was not 101: 500.
at io.jenkins.remoting.shaded.org.glassfish.tyrus.client.TyrusClientEngine.processResponse(TyrusClientEngine.java:301)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.ClientFilter.processRead(ClientFilter.java:167)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.handleRead(SslFilter.java:402)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.SslFilter.processRead(SslFilter.java:365)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:111)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.Filter.onRead(Filter.java:113)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:295)
at io.jenkins.remoting.shaded.org.glassfish.tyrus.container.jdk.client.TransportFilter$4.completed(TransportFilter.java:279)
at java.base/sun.nio.ch.Invoker.invokeUnchecked(Unknown Source)
at java.base/sun.nio.ch.Invoker$2.run(Unknown Source)
at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
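The key line in the trace above is "Response code was not 101: 500." A WebSocket upgrade succeeds only when the server answers HTTP 101 (Switching Protocols), so the agent is reporting that the controller (or something in front of it) returned a 500 instead. A minimal sketch for pulling that status out of agent logs when triaging several stopped containers; the helper names are hypothetical and not part of Jenkins or Tyrus:

```python
import re

# Matches the message text seen in the log above, e.g.
# "Response code was not 101: 500."
_HANDSHAKE_RE = re.compile(r"Response code was not 101: (\d{3})")


def handshake_status(log_text):
    """Return the HTTP status sent instead of 101, or None if not found."""
    match = _HANDSHAKE_RE.search(log_text)
    return int(match.group(1)) if match else None


def is_server_side_failure(status):
    """True when the status is in the 5xx range (error raised by the server)."""
    return status is not None and 500 <= status <= 599


if __name__ == "__main__":
    line = ("Caused by: io.jenkins.remoting.shaded.org.glassfish.tyrus.core."
            "HandshakeException: Response code was not 101: 500.")
    status = handshake_status(line)
    print(status, is_server_side_failure(status))
```

A 5xx status suggests the controller-side log for the same timestamp is the next place to look, since the agent log alone does not say why the request was rejected.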
I used the following environment variables in the jenkins-agent task definition:
-e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> -e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true
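The variables above are written in `docker run -e` form, while an ECS task definition stores them as a list of `name`/`value` objects under the container definition's `environment` key. A small sketch of that conversion, assuming the flags are given as one string; the function name is hypothetical and the placeholders are kept exactly as in the report:

```python
import json
import shlex


def docker_env_flags_to_ecs(flags):
    """Convert '-e NAME=value ...' flags into ECS 'environment' entries."""
    tokens = shlex.split(flags)
    env = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-e" and i + 1 < len(tokens):
            # Split on the first '=' only, so values may contain '='.
            name, _, value = tokens[i + 1].partition("=")
            env.append({"name": name, "value": value})
            i += 2
        else:
            i += 1
    return env


if __name__ == "__main__":
    flags = ("-e JENKINS_AGENT_NAME=jenkins-agent -e JENKINS_SECRET=<secret> "
             "-e JENKINS_URL=<jenkins-url> -e JENKINS_WEB_SOCKET=true")
    print(json.dumps(docker_env_flags_to_ecs(flags), indent=2))
```

For a real deployment the agent secret would more likely go under the container definition's `secrets` key (with a `valueFrom` reference) than in plain `environment`; the report does not say which was used.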
Expected Results
a. New agent containers should reconnect to the controller over WebSocket when jenkins-agent is redeployed from the AWS ECS cluster/console.
b. A new deployment of the jenkins-agent service should complete without stopping the old task (the running jenkins-agent container).
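Expected result b implies that the old and new containers run side by side for part of the deployment. Whether ECS allows that overlap depends on the service's `minimumHealthyPercent` and `maximumPercent` deployment settings. The sketch below works through the arithmetic for a rolling update. The report does not state these settings, so the desired count of 1 and the percentages used here are assumptions, and the function name is hypothetical:

```python
def task_count_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (lower, upper) bounds on running tasks during a rolling update.

    The lower bound is rounded up and the upper bound is rounded down,
    which is how the ECS documentation describes the two limits.
    """
    # Integer ceiling division for the lower bound.
    lower = -(-desired_count * minimum_healthy_percent // 100)
    # Integer floor division for the upper bound.
    upper = desired_count * maximum_percent // 100
    return lower, upper


if __name__ == "__main__":
    # Assumed settings: one agent task per service, 100% / 200%.
    print(task_count_bounds(1, 100, 200))
    # Assumed alternative: stop the old task before starting the new one.
    print(task_count_bounds(1, 0, 100))
```

With 100/200 and a desired count of 1, the bounds are (1, 2): the new task starts while the old one is still connected, which is the situation in which the handshake error was observed. With 0/100 the bounds are (0, 1): the old task is stopped first, which matches the manual workaround described above but leaves the agent briefly unavailable.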
[JENKINS-70812] Jenkins-agent fails to reconnect to jenkins controller (jdk 17) when redeployed in AWS ECS
Component/s | New: remoting [ 15489 ] |
Priority | Original: Major [ 3 ] | New: Blocker [ 1 ] |
Priority | Original: Blocker [ 1 ] | New: Critical [ 2 ] |
Component/s | New: amazon-ecs-plugin [ 20840 ] | |
Component/s | Original: core [ 15593 ] | |
Component/s | Original: remoting [ 15489 ] |
We are still seeing "websocket.DeploymentException: Handshake error" errors in Jenkins version 2.387.3. This is impacting our Jenkins production server.
Stopping the old jenkins-agent container while a new jenkins-agent deployment is in progress causes downtime in Jenkins.