Bug
Resolution: Unresolved
Critical
None
Linux (Controller), Windows (Agents)
Controller Version: 2.263.1
Swarm Version: 3.24
Remoting Version: 4.5
We do daily maintenance on our Windows Agents, which includes a reboot. This works fine most of the time. The machines reboot and the Swarm Agent (which runs as a Windows service) just reconnects to the controller and is ready to run builds again.
However, after some time (maybe days or a couple of weeks), agents can't connect anymore until the controller is restarted.
In the agent log I see messages like the following:
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Nov 08, 2021 12:56:29 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver isPortVisible
WARNING: Connection refused: connect
Nov 08, 2021 12:56:29 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: https://<jenkins-controller>/ provided port:35725 is not reachable
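The "Connection refused" warning suggests the agent cannot open a TCP connection to the advertised port. A minimal sketch for checking this by hand from the agent machine follows. The function name and the host in the usage comment are placeholders, not part of Jenkins or the Remoting library; the port is the one from the log above.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (hypothetical host name, port taken from the log above):
# is_port_reachable("jenkins-controller.example", 35725)
```

If this returns False while the controller's web UI is still reachable, that would be consistent with the controller no longer listening on the agent port it advertises.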
On the controller, on the other hand, the agent shows up as offline, and subsequent connection attempts result in:
SEVERE: An error occurred
hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 409
Agent "myAgent" already exists.
Removing the agent from the controller does not help: the same errors come back on the next connection attempt. The only way to resolve the situation is to restart the controller.
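To tell this stuck state apart from an ordinary reconnect delay after the reboot, the two signatures quoted above can be matched in a log file. This is only a sketch: it assumes both messages end up in a text log you can read, and the function and pattern names are made up for illustration.

```python
import re

# Patterns modeled on the two log excerpts quoted in this report.
PORT_UNREACHABLE = re.compile(r"SEVERE: \S+ provided port:(\d+) is not reachable")
ALREADY_EXISTS = re.compile(r'Agent "([^"]+)" already exists\.')


def find_stuck_signatures(log_text: str) -> dict:
    """Collect unreachable ports and 409 'already exists' agent names."""
    ports = [int(m.group(1)) for m in PORT_UNREACHABLE.finditer(log_text)]
    agents = [m.group(1) for m in ALREADY_EXISTS.finditer(log_text)]
    return {"unreachable_ports": ports, "conflicting_agents": agents}
```

A non-empty result for both keys after the maintenance reboot would match the situation described here, in which only a controller restart has helped so far.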
I wonder whether this might be related to JENKINS-57831.