
Swarm slaves should allow some level of control interaction

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor
    • Component: swarm-plugin

      This is a feature idea that I'm willing to implement but I'd like to hear maintainers' thoughts on this first.

      Case:

      I manage slaves with Puppet. Bringing them up is easy: configure, run java. Shutting down (say, for a reboot or VM teardown) is not so easy, because I'm very likely to kill a running job. So I have a bash loop that counts the java processes other than the swarm process itself. If the count is 0, I can shut down. But that's crude and unreliable: not all subprocesses will be java, and a new job can start in the time it takes to kill the swarm instance.

      The proper way is of course to interact with the master - mark offline, wait, reboot. But this requires the swarm nodes to have extensive knowledge of the master, which seems to contradict the purpose of swarm (it's managed from the slave side without any master interaction, and thus should be able to dynamically come and go).

      Things I'm considering:

      SOME COMMAND may be "go offline", "shutdown", "block till idle", etc., but may also be something that returns status, i.e. "is idle?", "is offline?", etc. (obviously not for the Signals approach).

      There are definitely some problems with these solutions, so I'm curious what others think. It's also possible that I'm overlooking a simpler way.

          [JENKINS-31084] Swarm slaves should allow some level of control interaction

          Oleg Nenashev added a comment -

          KK does not maintain this plugin anymore. Moving to unassigned to set the expectation


          Basil Crow added a comment -

          This is a feature idea that I'm willing to implement but I'd like to hear maintainers' thoughts on this first.

          This sounds like a reasonable feature to me. In many ways, it's similar to Postgres' Smart Shutdown mode:

          After receiving SIGTERM, the server disallows new connections, but lets existing sessions end their work normally. It shuts down only after all of the sessions terminate.

          Implementing this in Swarm via a signal also seems reasonable. This also would work well with, e.g. systemd, which could send either SIGTERM or SIGINT to the Swarm client as appropriate (just like in the Postgres systemd unit file).
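A minimal sketch of the client-side trigger, assuming the JVM's standard behaviour of running registered shutdown hooks when it receives SIGTERM or SIGINT. `GracefulStop` and its `drainAction` are hypothetical names used for illustration only; they are not part of the Swarm client. The drain action stands in for whatever the client would do to mark the node offline and wait for running work to finish.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: runs a drain action at most once, typically from a shutdown hook. */
final class GracefulStop {
    private final Runnable drainAction;
    private final AtomicBoolean started = new AtomicBoolean(false);

    GracefulStop(Runnable drainAction) {
        this.drainAction = drainAction;
    }

    /** Runs the drain action on the first call only; returns true if this call ran it. */
    boolean trigger() {
        if (!started.compareAndSet(false, true)) {
            return false; // a drain is already in progress or finished
        }
        drainAction.run();
        return true;
    }

    /** Registers trigger() as a JVM shutdown hook and returns the hook thread. */
    Thread install() {
        Thread hook = new Thread(this::trigger, "swarm-graceful-stop");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

With this shape, the client would call `new GracefulStop(action).install()` once at startup. The JVM does not exit until the hook returns, so a blocking drain action delays termination until the work is done. A supervisor that enforces a stop timeout would need that timeout to be longer than the longest expected job.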

          On receiving the signal, the client would need to communicate with the server to do the graceful shutdown. A new backend endpoint would need to be created. When this endpoint is called, it would need to invoke the API equivalent of the "Mark this node temporarily offline" feature in the UI (which waits for the current task to complete, then takes the node offline). The endpoint would be in plugin/src/main/java/hudson/plugins/swarm/PluginImpl.java and would look something like this:

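          Node node = getNodeByName(name, rsp);
          Computer computer = node.toComputer(); // null if the node has no Computer (e.g. no executors)
          if (computer != null) computer.setTemporarilyOffline(true);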
          

          Once the graceful shutdown has been initiated, the client would need to wait for the node to be unused. This could be done with another backend endpoint:

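          Node node = getNodeByName(name, rsp);
          Computer computer = node.toComputer(); // null if the node has no Computer (e.g. no executors)
          boolean isOffline = computer == null || computer.isOffline();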
          

          The client would then have to wait in a loop, polling this endpoint for the node to be offline. Once the node is offline, the client could terminate.
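The polling loop described above could be sketched as follows. `NodePoller`, `awaitTrue`, and the timeout parameter are assumptions made for illustration; in a real client the supplier would wrap an HTTP call to the proposed status endpoint. It may also be worth verifying which predicate the endpoint should expose: a node marked temporarily offline may report itself as offline while builds are still running, in which case an idleness check (for example `Computer.isIdle()`) might be the safer condition.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition until it holds or a deadline passes. */
final class NodePoller {
    private NodePoller() {}

    /**
     * Checks the condition immediately, then every pollMillis milliseconds.
     * Returns true once the condition holds, or false if the timeout elapses
     * or the thread is interrupted first.
     */
    static boolean awaitTrue(BooleanSupplier done, long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A timeout is included so that a stuck job or an unreachable master cannot block shutdown forever; what the client should do when the wait fails (force exit or keep running) is left open here.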

          I welcome any PRs to implement this and would be happy to review them.


            Assignee: Unassigned
            Reporter: Alexander Komarov (akom)