JENKINS-67082

(Swarm) Agents fail to reconnect to controller after reboot

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: remoting, swarm-plugin
    • Labels:
      None
    • Environment:
      Linux (Controller), Windows (Agents)
      Controller Version: 2.263.1
      Swarm Version: 3.24
      Remoting Version: 4.5
      Description

      We do daily maintenance on our Windows Agents, which includes a reboot. This works fine most of the time. The machines reboot and the Swarm Agent (which runs as a Windows service) just reconnects to the controller and is ready to run builds again.

      However, after some time (maybe days or a couple of weeks), agents can't connect anymore until the controller is restarted.

      In the agent log I see messages like the following:

      INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
      Nov 08, 2021 12:56:29 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver isPortVisible
      WARNING: Connection refused: connect
      Nov 08, 2021 12:56:29 AM hudson.remoting.jnlp.Main$CuiListener error
      SEVERE: https://<jenkins-controller>/ provided port:35725 is not reachable
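To tell a plain network problem apart from a controller-side one, the "not reachable" message above can be cross-checked from the agent machine. The following is a minimal Python sketch of such a check, not the code remoting itself uses; the function name `is_port_reachable` is hypothetical, and the host name must be replaced with the real controller host (35725 is the port from the log above).

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for manual diagnosis; it only attempts a TCP
    connect and does not speak the JNLP4-connect protocol.
    """
    try:
        # create_connection resolves the host name and performs the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures.
        return False


# Example call (host name is a placeholder, port taken from the agent log):
#   is_port_reachable("jenkins-controller.example.com", 35725)
```

If this returns False while the controller's web UI is still reachable, that would suggest the agent listener port itself is no longer accepting connections, which matches the "Connection refused" warning.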

      On the controller, on the other hand, the agent shows up as offline, and subsequent connection attempts result in:

      SEVERE: An error occurred
      hudson.plugins.swarm.RetryException: Failed to create a Swarm agent on Jenkins. Response code: 409
      Agent "myAgent" already exists.

      If the agent is removed from the controller, the same happens again. The only way to resolve the situation is to restart the controller.

      I wonder whether this might be related to JENKINS-57831.

          Activity

          dhs Dirk Heinrichs added a comment -

          Any news on this? It just happened again, only a few hours after a controller reboot.

          ethorsa ethorsa added a comment - - edited

          Do you pass -deleteExistingClients to the swarm CLI?

          dhs Dirk Heinrichs added a comment -

          No, I don't.

          ethorsa ethorsa added a comment - - edited

          With deleteExistingClients, any existing agent with the same name is removed before the reconnecting one is created. Another approach involves disableClientsUniqueId: when that option is not set, each connecting agent is assigned a unique ID.

          Docs: https://github.com/jenkinsci/swarm-plugin#available-options 

          dhs Dirk Heinrichs added a comment -

          Yep, I use the latter.


            People

            Assignee:
            jthompson Jeff Thompson
            Reporter:
            dhs Dirk Heinrichs