Type: New Feature
Resolution: Fixed
Priority: Minor
None
Environment: Linux Jenkins Master and build slave
I have a Jenkins installation with 20 build slaves. A couple of minutes after I click on "Disconnect slave", the slave is back online.
The log contains:
Mar 19, 2012 5:22:19 PM hudson.slaves.SlaveComputer tryReconnect
INFO: Attempting to reconnect musxdodo77
Somehow Jenkins is ignoring the "offline" tag when set manually.
[JENKINS-13140] Disconnected slaves come back online within a few minutes
Hey Marc,
I don't know what a "Swarm client" is. Can you clarify this for me?
I have created the slave as a "dumb" slave. Setting its status to offline has the same effect as disconnecting.
I am also having this issue. The master node connects to slave nodes via a service. After I disconnect a slave (and give a reason), it comes back online within a few minutes. This can cause builds to fail, because the slave then executes a build it was not ready for.
I think I found the issue.
It's actually not a bug; it was me being dull.
Jenkins has an option to disconnect a slave, which only cuts the connection to this slave. This triggers Jenkins to re-initiate the connection.
An alternative is to "temporarily remove" a slave. This does what I had in mind: Jenkins ignores this slave and does not bring it back online.
Problem solved.
This is still an issue, at least as of Jenkins 1.522.
I note that there are two actions you can take for a node:
- "mark temporarily offline" which keeps the slave.jar running, but doesn’t send any new jobs to the slave
- "disconnect" which means kill the slave.jar process (however, the slave process will be restarted in a few minutes)
There is no option to actually kill the slave process, and stop it restarting.
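To make the distinction in this comment concrete, here is a hypothetical Java model of the two node actions. The enum, its fields, and the helper class are illustrative names only, not Jenkins API; the values simply restate what the comment reports for a node set to "Keep this slave online as much as possible", and the helper checks that neither action both stops the agent process and keeps it stopped.

```java
// Hypothetical model of the two node actions described above; not Jenkins API.
enum NodeAction {
    // slave.jar keeps running, but no new builds are sent to the node.
    MARK_TEMPORARILY_OFFLINE(true, false, false),
    // slave.jar is killed; with "Keep this slave online as much as possible"
    // the process is relaunched within a few minutes.
    DISCONNECT(false, false, true);

    final boolean processKeepsRunning;
    final boolean acceptsNewBuilds;
    final boolean relaunchedAutomatically;

    NodeAction(boolean processKeepsRunning, boolean acceptsNewBuilds,
               boolean relaunchedAutomatically) {
        this.processKeepsRunning = processKeepsRunning;
        this.acceptsNewBuilds = acceptsNewBuilds;
        this.relaunchedAutomatically = relaunchedAutomatically;
    }
}

final class NodeActions {
    private NodeActions() {}

    // True if some action stops the agent process AND keeps it stopped,
    // i.e. the option this comment reports as missing.
    static boolean canStopProcessPermanently() {
        for (NodeAction a : NodeAction.values()) {
            if (!a.processKeepsRunning && !a.relaunchedAutomatically) {
                return true;
            }
        }
        return false;
    }
}
```

Under this model `canStopProcessPermanently()` returns false, which is exactly the gap the comment describes.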
Also, there is confusion in the Jenkins code about what these options mean. If I "mark temporarily offline", the node status page says "disconnected by ...", which mixes up two things that are supposed to be distinct.
A rethink of what the options are is needed.
The original poster said:
An alternative option is the "temporarily remove" a slave. This does what I had in mind. Jenkins ignores this slave and does not bring it back online
There's no such option as "temporarily remove", at least in recent Jenkins releases. Perhaps he meant "mark temporarily offline". Although this does mean Jenkins ignores the slave, it still leaves the slave process running.
Can the 'Availability' option in the slave config be used to not reconnect automatically?
Err ... how? Did you look at options for 'Availability' and come up with an idea, or did you just see an option called 'Availability' and idly speculate as to what it might do?
mwebber: Instead of only disconnecting, configure the slave to also have an in demand connection delay of 1000000. Works like a charm.
It'd be better if slave processes weren't started while they're marked offline. Then you could configure all your slaves with in demand delay 0 / idle delay 1000000 and effectively have regular 'always on' slaves that don't reconnect automatically. Maybe this can be improved in core.
Since the user who reported this issue originally said the goal was accomplished through "mark offline", it would help if anyone interested in this could provide a use case. After all, starting new builds can already be prevented.
Something approaching a fix should be a trivial change in RetentionStrategy:
If the node is disconnected, isn't launching, and can be launched, launch it unless it was marked offline by the user.
Would that work?
Never reconnecting could be annoying with disconnects due to error, so I doubt that's an option.
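The proposed condition can be written as a small Java decision function. This is a sketch under assumptions, not the actual RetentionStrategy code: `AgentStatus`, `SimpleStatus`, and `ReconnectPolicy` are hypothetical names, and the four checks simply mirror the wording of the proposal.

```java
// Hypothetical stand-in for a node's state; not Jenkins API.
interface AgentStatus {
    boolean isOffline();             // not currently connected
    boolean isConnecting();          // a launch is already in progress
    boolean isLaunchSupported();     // the master is able to (re)launch it
    boolean isMarkedOfflineByUser(); // a user chose "mark temporarily offline"
}

// Simple immutable implementation, used here only for illustration.
record SimpleStatus(boolean offline, boolean connecting,
                    boolean launchSupported, boolean markedOfflineByUser)
        implements AgentStatus {
    public boolean isOffline() { return offline; }
    public boolean isConnecting() { return connecting; }
    public boolean isLaunchSupported() { return launchSupported; }
    public boolean isMarkedOfflineByUser() { return markedOfflineByUser; }
}

final class ReconnectPolicy {
    private ReconnectPolicy() {}

    // Proposed rule: relaunch a disconnected, launchable node unless a user
    // marked it offline. Disconnects caused by errors are still retried.
    static boolean shouldReconnect(AgentStatus s) {
        return s.isOffline()
            && !s.isConnecting()
            && s.isLaunchSupported()
            && !s.isMarkedOfflineByUser();
    }
}
```

Keeping "marked offline by the user" as a separate flag is what lets disconnects caused by errors still be retried, which addresses the concern about never reconnecting.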
Instead of only disconnecting, configure the slave to also have an in demand connection delay of 1000000. Works like a charm.
I don't see a "demand connection delay" setting anywhere in the web UI. Is this supplied by a particular plugin that I don't have? Note that I'm using regular dumb slaves, not swarm.
mwebber: Right, sorry about that. I assumed you had looked at the option I mentioned earlier before posting a response. Just change it to "Take this slave online when in demand and offline when idle" (present for SSH slaves in 1.532.x) and my comment will make sense.
Obviously you do not want to 'Keep this slave online as much as possible', so there's no point in selecting that value.
- To prevent builds, Mark this node temporarily offline.
- To prevent reconnect after manual disconnect, don't configure Jenkins to Keep this slave online as much as possible.
Therefore this is a new feature. Reducing to minor since the original user commented this was just a misunderstanding of user options for preventing builds.
As an alternative to Keep this slave online as much as possible, here's a plugin adding a retention strategy that does not reconnect slaves if they're temporarily marked offline by the user:
https://github.com/daniel-beck/keep-slave-disconnected-plugin
I'd consider this issue resolved now, or at the latest once this plugin is officially released. Any objections?
I ended up changing my slaves to "take this slave on-line when in demand and off-line when idle", and as suggested set:
in demand delay = 0
idle delay = 1000000
That does not actually fix the problem, but it's good enough for me:
- When Jenkins (re)starts, the slave process does not.
- At the time a slave is required by a job, the slave process is started.
- If I disconnect a slave, it stays disconnected until a job requires it, at which point it is restarted. At least it stays disconnected until required, rather than being reconnected almost immediately.
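The behaviour listed above can be approximated with a simplified Java model of the "in demand / idle" availability option. The class and method names are hypothetical, both delays are assumed to be in minutes, and "demand" is reduced to a single waiting-build flag; the real strategy in Jenkins is more involved.

```java
// Simplified, hypothetical model of "Take this slave on-line when in demand
// and off-line when idle". Not the real Jenkins implementation.
final class DemandPolicy {
    private final long inDemandDelayMinutes;
    private final long idleDelayMinutes;

    DemandPolicy(long inDemandDelayMinutes, long idleDelayMinutes) {
        this.inDemandDelayMinutes = inDemandDelayMinutes;
        this.idleDelayMinutes = idleDelayMinutes;
    }

    // Launch only if a build is waiting for this node and has waited
    // at least inDemandDelay minutes.
    boolean shouldLaunch(boolean online, boolean buildWaiting, long waitMinutes) {
        return !online && buildWaiting && waitMinutes >= inDemandDelayMinutes;
    }

    // Take the node offline only after idleDelay minutes without work.
    boolean shouldTakeOffline(boolean online, long idleMinutes) {
        return online && idleMinutes >= idleDelayMinutes;
    }
}
```

With an in demand delay of 0 and an idle delay of 1000000 (about 694 days, if the unit is minutes), a disconnected node stays down until a build needs it and then effectively stays up, which matches the three points above.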
I did not try the keep-slave-disconnected-plugin (thanks for contributing though). To my mind this is solving the problem in the wrong place - the muddle and poor handling needs to be fixed in core, rather than overridden.
I disagree: Core ships with a handful of strategies that behave exactly as they should based on their descriptions: "Keep this slave online as much as possible" will try to reconnect ASAP if disconnected.
That core doesn't ship with the strategy fitting your requirements is the only issue here, and that's why there are plugins.
I'm considering this resolved:
- It's not a bug. Don't select "Keep online as much as possible" if that's not what you want
- Feature of a different retention strategy (this is exactly what the extension point is for!) has been implemented in keep-slave-disconnected-plugin
I've just tried the "Keep Offline Slaves Disconnected Retention Strategy Plugin" and it doesn't work (Jenkins ver. 2.138.2).
Even with the node set to "Keep this slave on-line as much as possible, but don’t reconnect if temporarily marked offline by the user", it still comes back online within 30 seconds of marking it offline.
Having no way to actually take a node offline temporarily (e.g. to safely work on it without any jobs starting) is a big deal.
Going to have to agree with James Howe here. What workflow is Core targeting by even offering "Take this node offline" if it just comes back 30 seconds later?
If the node has an issue causing all builds executed on it to fail, then I need to take it offline. So apparently I have to update its availability option and then take it offline? I could save a step by just setting the availability to a schedule that is never available, which raises the question: why is there even a button to take a node offline if it doesn't really take it offline?
Running into the same issue. I need to set dedicated slaves offline for certain reasons, and they should not come back online until an admin performs the corresponding action.
Can we re-open this issue? I'm seeing the same problem, and it's a really bad one. When I try to block agents from getting jobs because of JENKINS-53810, I take the agents offline, but they try to reconnect soon after. That leaves me draining my queue by sending jobs to VMs that aren't actually connected, which results in hundreds of job failures.
+1. This is a bug, not a feature. If you click a button to disable a node, then a reasonable person would expect the node to stay disabled.
This is a bug, not a feature. I want to keep my agent online as much as possible, so jobs don't have to wait even 1 minute before Jenkins considers that it should be connected. I also want to perform maintenance on the agent that requires no connection at all. For example, to change jenkins user ID, which requires no process running under said user, including SSH connection from Jenkins master.
As much as I hate the word "obvious", because what is obvious to one person is not necessarily obvious to another... Obviously, if I disconnect something, I expect it to stay disconnected.
Are these swarm clients or normal dumb ssh clients? Swarm clients reconnect automatically after a disconnect, but you can "mark" them "offline".