
Disconnected slaves come back online within a few minutes

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • core
    • None
    • Linux Jenkins Master and build slave

      I have a Jenkins installation with 20 build slaves. A couple of minutes after I click on "Disconnect slave", the slave is back online.

      The log contains:

      Mar 19, 2012 5:22:19 PM hudson.slaves.SlaveComputer tryReconnect
      INFO: Attempting to reconnect musxdodo77

      Somehow Jenkins is ignoring the "offline" tag when set manually.

          [JENKINS-13140] Disconnected slaves come back online within a few minutes

          Daniel Beck added a comment -
          • To prevent builds, Mark this node temporarily offline.
          • To prevent reconnect after manual disconnect, don't configure Jenkins to Keep this slave online as much as possible.

          Therefore this is a new feature. Reducing to minor since the original user commented this was just a misunderstanding of user options for preventing builds.
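The two options in the comment above act on different pieces of state. The following is a minimal toy model of that distinction, not Jenkins code: the `Agent` class, its fields, and `keep_online_check` are hypothetical names, and the periodic check is only an assumed simplification of a "keep online as much as possible" policy.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy stand-in for a build agent; not a Jenkins class."""
    name: str
    connected: bool = True             # is the agent process/channel up?
    temporarily_offline: bool = False  # set by "Mark this node temporarily offline"

    def accepts_builds(self) -> bool:
        # Builds need a live connection and no temporary-offline mark.
        return self.connected and not self.temporarily_offline


def keep_online_check(agent: Agent) -> None:
    """Assumed periodic check of a 'keep online as much as possible' policy."""
    if not agent.connected:
        agent.connected = True  # reconnects regardless of why the agent went down


# Manual disconnect: the next periodic check brings the agent back.
a = Agent("musxdodo77")
a.connected = False
keep_online_check(a)
disconnect_result = a.accepts_builds()

# Temporarily offline: the agent stays connected but takes no builds.
b = Agent("musxdodo77")
b.temporarily_offline = True
keep_online_check(b)
offline_result = b.accepts_builds()

print(disconnect_result, offline_result)
```

In this model a manual disconnect is undone by the next check, while the temporary-offline mark survives it, which matches the behavior described in the comment.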


          Daniel Beck added a comment -

          As an alternative to Keep this slave online as much as possible, here's a plugin adding a retention strategy that does not reconnect slaves if they're temporarily marked offline by the user:

          https://github.com/daniel-beck/keep-slave-disconnected-plugin

          I'd consider this issue now (or with official release of this plugin at the latest) resolved. Any objections?
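The kind of retention strategy described in the comment above can be sketched as a small toy model. This is not the plugin's source: `Agent` and `keep_online_unless_marked_offline` are hypothetical names, and the rule is only an assumed reading of "does not reconnect slaves if they're temporarily marked offline by the user".

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Toy stand-in for a build agent; not a Jenkins class."""
    connected: bool = True
    temporarily_offline: bool = False  # set manually by the user


def keep_online_unless_marked_offline(agent: Agent) -> bool:
    """Assumed periodic check; returns True if a reconnect was attempted."""
    if agent.connected:
        return False  # nothing to do
    if agent.temporarily_offline:
        return False  # respect the user's mark and leave the agent down
    agent.connected = True
    return True


# Disconnected and marked offline: the check leaves the agent alone.
marked = Agent(connected=False, temporarily_offline=True)
marked_reconnected = keep_online_unless_marked_offline(marked)

# Disconnected without the mark: the check reconnects it.
unmarked = Agent(connected=False)
unmarked_reconnected = keep_online_unless_marked_offline(unmarked)

print(marked_reconnected, unmarked_reconnected)
```

The only difference from a plain "keep online" check is the extra guard on the temporary-offline mark.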


          Matthew Webber added a comment -

          I ended up changing my slaves to "take this slave on-line when in demand and off-line when idle", and as suggested set

          in demand delay = 0
          idle delay = 1000000

          That does not actually fix the problem, but it's good enough for me:

          • When Jenkins (re)starts, the slave process does not.
          • At the time a slave is required by a job, the slave process is started.
          • If I disconnect a slave, it stays disconnected until a job requires it, at which point it is restarted. At least it stays disconnected until required, rather than being reconnected almost immediately.

          I did not try the keep-slave-disconnected-plugin (thanks for contributing though). To my mind this is solving the problem in the wrong place - the muddle and poor handling need to be fixed in core, rather than overridden.
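The on-demand workaround described above can be illustrated with a toy model. It is a sketch under stated assumptions, not Jenkins code: `Agent` and `on_demand_check` are hypothetical names, and both delays are assumed to be counted in minutes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    """Toy stand-in for a build agent; not a Jenkins class."""
    connected: bool = False
    idle_minutes: int = 0


def on_demand_check(agent: Agent, demand_minutes: Optional[int],
                    in_demand_delay: int = 0, idle_delay: int = 1_000_000) -> None:
    """Assumed periodic check of an on-demand policy.

    demand_minutes is how long a job has waited for this agent,
    or None if no job is waiting.
    """
    if not agent.connected:
        if demand_minutes is not None and demand_minutes >= in_demand_delay:
            agent.connected = True   # start the agent only when a job needs it
            agent.idle_minutes = 0
    elif agent.idle_minutes >= idle_delay:
        agent.connected = False      # effectively never reached with a huge idle delay


agent = Agent()
on_demand_check(agent, None)         # after a (re)start: no demand, stays down
after_restart = agent.connected

on_demand_check(agent, 0)            # a job needs the agent: it is started
after_demand = agent.connected

agent.idle_minutes = 10_000          # long idle, still far below the idle delay
on_demand_check(agent, None)
after_idle = agent.connected

agent.connected = False              # manual disconnect
on_demand_check(agent, None)         # no job waiting: stays disconnected
after_disconnect = agent.connected

on_demand_check(agent, 0)            # next job brings it back
after_next_job = agent.connected

print(after_restart, after_demand, after_idle, after_disconnect, after_next_job)
```

With these settings the model reproduces the three bullet points: nothing starts on restart, a waiting job starts the agent, and a manual disconnect holds only until the next job arrives.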

          Daniel Beck added a comment -

          I disagree: Core ships with a handful of strategies that behave exactly as they should based on their descriptions: "Keep this slave online as much as possible" will try to reconnect ASAP if disconnected.

          That core doesn't ship with the strategy fitting your requirements is the only issue here, and that's why there are plugins.

          I'm considering this resolved:

          • It's not a bug. Don't select "Keep online as much as possible" if that's not what you want
          • Feature of a different retention strategy (this is exactly what the extension point is for!) has been implemented in keep-slave-disconnected-plugin


          James Howe added a comment - - edited

          I've just tried the "Keep Offline Slaves Disconnected Retention Strategy Plugin" and it doesn't work (Jenkins ver. 2.138.2).
          Even with the node set to "Keep this slave on-line as much as possible, but don’t reconnect if temporarily marked offline by the user", it still comes back online within 30 seconds of marking it offline.

          Having no way to actually take a node offline temporarily (e.g. to safely work on it without any jobs starting) is a big deal.


          Sean Grider added a comment -

          Going to have to agree with James Howe here. What workflow is Core targeting by even offering "Take this node offline" if it just comes back 30 seconds later?

          If the node has an issue causing all builds executed on it to fail, then I need to take it offline, so apparently this means I have to update its availability option and then take it offline? I could save a step by just updating the availability to use a schedule which is never available, which then prompts the question, why is there even a button to take offline in the first place if it doesn't really take it offline?


          Yves Schumann added a comment -

          Running into the same issue. I need to set dedicated slaves offline for certain reasons, and they should not come back online until an admin performs the corresponding action.

          brian hewson added a comment -

          Can we re-open this issue? I'm seeing the same problem, and it's a really bad one. When I'm trying to block agents from getting jobs because of JENKINS-53810, I try to take the agents offline, and they try to reconnect soon after. That leaves me draining my queue by sending jobs to VMs that aren't actually connected, which results in hundreds of job failures.

          Mike Baranczak added a comment -

          +1. This is a bug, not a feature. If you click a button to disable a node, then a reasonable person would expect the node to stay disabled.

          Artalus S. added a comment - - edited

          This is a bug, not a feature. I want to keep my agent online as much as possible, so jobs don't have to wait even 1 minute before Jenkins considers that it should be connected. I also want to perform maintenance on the agent that requires no connection at all. For example, to change the jenkins user ID, which requires no process running under said user, including the SSH connection from the Jenkins master.

          As much as I hate the word "obvious", because what is obvious for one is not necessarily the same for the other... Obviously if I disconnect something, I expect it to stay disconnected.

            kohsuke Kohsuke Kawaguchi
            norman Norman Baumann
            Votes: 2
            Watchers: 12
