JENKINS-67082

(Swarm) Agents fail to reconnect to controller after reboot

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component/s: remoting, swarm-plugin
    • Labels: None
    • Environment: Linux (Controller), Windows (Agents)
      Controller Version: 2.263.1
      Swarm Version: 3.24
      Remoting Version: 4.5

      We do daily maintenance on our Windows agents, which includes a reboot. Most of the time this works fine: the machines reboot, and the Swarm agent (which runs as a Windows service) reconnects to the controller and is ready to run builds again.

      However, after some time (days, or perhaps a couple of weeks), agents can no longer connect until the controller is restarted.

      In the agent log I see messages like the following:

      INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
      Nov 08, 2021 12:56:29 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver isPortVisible
      WARNING: Connection refused: connect
      Nov 08, 2021 12:56:29 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: https://<jenkins-controller>/ provided port:35725 is not reachable
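
The "not reachable" message above follows a refused TCP connection to the port the controller advertised. To rule out a firewall or routing problem, the same kind of check can be run by hand from an affected agent. The following is a minimal sketch, not Remoting's actual code; the function name `is_port_reachable` and the default timeout are illustrative assumptions.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `is_port_reachable("<jenkins-controller>", 35725)` with the real controller host name substituted would show whether the advertised port accepts connections at all. A `False` result while the controller's web UI is still reachable would suggest the listener on that port is gone, rather than the network being down.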

      On the controller, on the other hand, the agent shows up as offline, and subsequent connection attempts fail with:

      SEVERE: An error occurred
      hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 409
      Agent "myAgent" already exists.
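
The 409 response indicates that the controller still holds a node record named "myAgent" even though that agent is offline. One way to inspect the record is the controller's JSON API at `/computer/<name>/api/json`. The sketch below only builds that URL and summarizes a response body; the helper names are hypothetical, and the field names (`displayName`, `offline`, `offlineCauseReason`) are what I would expect in the response, so they should be verified against the actual output of your controller.

```python
import json
from urllib.parse import quote


def computer_api_url(controller_url: str, agent_name: str) -> str:
    """Build the JSON API URL for one agent's node record."""
    base = controller_url.rstrip("/")
    return f"{base}/computer/{quote(agent_name, safe='')}/api/json"


def summarize_node(raw_json: str) -> str:
    """Summarize the fields relevant to a stale, offline node record."""
    node = json.loads(raw_json)
    name = node.get("displayName", "<unknown>")
    if not node.get("offline", False):
        return f"{name}: online"
    # Assumed field: a human-readable reason, possibly empty or missing.
    reason = node.get("offlineCauseReason") or "no reason recorded"
    return f"{name}: offline ({reason})"
```

Fetching the URL returned by `computer_api_url` with an authenticated HTTP client and passing the body to `summarize_node` would show whether the controller still lists the agent as offline at the moment the Swarm client receives the 409.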

      If the agent is removed from the controller, the same happens again. The only way to resolve the situation is to restart the controller.

      I wonder whether this might be related to JENKINS-57831.

            Assignee: Unassigned
            Reporter: Dirk Heinrichs (dhs)
            Votes: 1
            Watchers: 4
