-
Bug
-
Resolution: Not A Defect
-
Major
-
jenkins instance:
jenkins core 2.263.1
CentOS Linux 7 (Core)
kubernetes plugin 1.28.4
jenkins agent remoting VERSION=4.6
-websocket flag passed to jenkins agent
I get intermittent agent disconnects while build is running. I'll try to provide as much info, let me know what else I can check.
- Jenkins master java version 11 (java-11-openjdk-11.0.5.10) started with hudson.slaves.ChannelPinger.pingIntervalSeconds 30 in order to avoid disconnects
- Nginx reverse proxy in use and ssl timeout is 5 minutes, which was too close to the default hudson.slaves.ChannelPinger.pingIntervalSeconds, so was reduced to 30 seconds with good results, and reduced the number of disconnects per day (stack trace was different and did not show a SIGHUP)
- jenkins masters are on premise
- jenkins agents are in GKE GCP kubernetes version 1.16.5
- jenkins agent container image has default java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-1~deb9u1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode) - remoting VERSION=4.6
- -websocket flag passed to jenkins agent via the k8s plugin extra cli command, I noticed afterwards there is a checkbox for websocket (in the kubernetes plugin config), but couldn't find docs to go with it, should I switch to using that?
- In terms of sizing, we peak to about 400 jenkins-agents / pods connected at a time, the limit is set to 500 in the jenkins kubernetes plugin configuration
- The issue happens even when load is low
The connection is established fine, but intermittently gets disconnected. Let me know what else I can look at.
Stack trace:
SignalException: SIGHUP FATAL: command execution failed java.nio.channels.ClosedChannelException at jenkins.agents.WebSocketAgents$Session.closed(WebSocketAgents.java:141) at jenkins.websocket.WebSocketSession.onWebSocketSomething(WebSocketSession.java:91) at com.sun.proxy.$Proxy105.onWebSocketClose(Unknown Source) at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:149) at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:394) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.disconnect(AbstractWebSocketConnection.java:316) at org.eclipse.jetty.websocket.common.io.DisconnectCallback.succeeded(DisconnectCallback.java:42) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection$CallbackBridge.writeSuccess(AbstractWebSocketConnection.java:86) at org.eclipse.jetty.websocket.common.io.FrameFlusher.notifyCallbackSuccess(FrameFlusher.java:359) at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeedEntries(FrameFlusher.java:288) at org.eclipse.jetty.websocket.common.io.FrameFlusher.succeeded(FrameFlusher.java:280) at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:293) at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381) at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264) at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193) at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:181) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:510) at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.base/java.lang.Thread.run(Thread.java:834)
- relates to
-
JENKINS-62576 Websockets connection unstable since remoting 4.2.1 (LTS 2.222.4)
-
- Open
-
Probably.
Potentially yes—either because the actual network throughput cannot handle the traffic, or the controller JVM is not able to keep up with all the work, or the controller is generally OK but for some reason is not processing this node monitor in a timely fashion, or the fix of
JENKINS-18671was incorrect, etc.Then try disabling this monitor in https://jenkins/computer/configure and see if the problem clears up. This and other monitors should arguably just be suppressed on Cloud agents: the whole system of node monitors only makes sense in the context of a modest-sized pool of static agents that an admin is directly managing.