
"Prepare for shutdown" should continue executing already running pipelines to completion

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • workflow-cps-plugin
    • None

      Based on dnusbaum's comment from JENKINS-34256:

      A fix for this issue was just released in Pipeline: Groovy Plugin version 2.78. I think there is/was some confusion as to the expected behavior (myself included!), so let me try to clarify: When Jenkins prepares for shutdown, all running Pipelines are paused, and this is the intended behavior. The unintended behavior was that if you canceled shutdown, Pipelines remained paused. This has been fixed in 2.78; Pipelines will now resume execution if shutdown is canceled. Before 2.78, you had to manually pause and unpause each Pipeline to get it to resume execution, or restart Jenkins. Additionally, preparing Jenkins for shutdown and canceling shutdown now each cause a message to be printed to Pipeline build logs indicating that the Pipeline is being paused or resumed due to shutdown so that it is easier to understand what is happening.

      Based on comments here and elsewhere, I think some users would prefer a variant of "Prepare for shutdown" in which Pipelines continue executing to completion, the same as other types of jobs like Freestyle. If that is something you want, please open a new ticket, describing your use case and the desired behavior.

      [...]

      If there is some other aspect of this issue that you would like to see addressed, or a different behavior you would prefer, please open a new ticket describing your particular use case.

      My use case is to make it easier to restart the Jenkins master when upgrading Jenkins core or updating plugins, because currently I need to do the following:

      1. wait until no pipelines are running anymore
        • which can be difficult in bigger Jenkins environments during normal working hours (steady commits keep triggering pipelines), and also when long-running test suites are triggered around the clock
      2. click "prepare for shutdown"
      3. ... (continue normal work like upgrading/updating)

          [JENKINS-60434] "Prepare for shutdown" should continue executing already running pipelines to completion

          Reinhold Füreder created issue -
          Reinhold Füreder made changes -
          Link New: This issue is related to JENKINS-34256 [ JENKINS-34256 ]

          Ulrich Köhler added a comment - Another use case: the ThinBackup plugin puts Jenkins into shutdown mode and waits for all jobs to finish. But Pipeline jobs never finish: deadlock.

          In case it can be useful to anyone, here is the "planned upgrade" process we have for Jenkins in my company.
          It relies on a custom quiet-mode that we've implemented in an internal plugin, which basically allows already running builds to terminate (including Pipelines), but forbids starting execution of new builds (except if they are necessary for termination of the already running builds).

          The overall process is automated (we have many Jenkins instances), and it goes like this:

          • activate the custom quiet-down mode (forbid starting new builds)
          • poll Jenkins until it's idle, for up to X minutes, and then do the upgrade (including an actual restart)
          • on time-out of this polling, cancel the planned upgrade (cancel the custom quiet-mode), and retry it all later (sometimes we have to make arrangements with users, so that they don't launch their freaking 18-hour test suite on the day we are planning to do an upgrade)
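The three steps above can be sketched as a small client-side loop. This is only an illustration under stated assumptions, not the commenter's actual automation: `QuietDownClient`, `PlannedUpgrade`, and the `upgrade` callback are hypothetical names, and the real process would call the HTTP endpoints of the internal plugin.

```java
import java.time.Duration;

/** Hypothetical abstraction over the operations the process needs (not a Jenkins API). */
interface QuietDownClient {
    void quietDown(String cause);   // forbid starting new builds
    void cancelQuietDown();         // allow new builds again
    boolean isIdle();               // true when no build is running anymore
}

final class PlannedUpgrade {
    enum Outcome { UPGRADED, CANCELLED }

    /** Activates quiet-down, polls until idle or until the timeout expires, then upgrades or cancels. */
    static Outcome run(QuietDownClient jenkins, Runnable upgrade,
                       Duration timeout, Duration pollInterval) {
        jenkins.quietDown("planned upgrade");
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (jenkins.isIdle()) {
                upgrade.run();                 // includes the actual restart
                return Outcome.UPGRADED;
            }
            if (System.nanoTime() - deadline >= 0) {
                jenkins.cancelQuietDown();     // give up for now, retry later
                return Outcome.CANCELLED;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // treat an interrupt like a time-out: restore the flag and cancel
                Thread.currentThread().interrupt();
                jenkins.cancelQuietDown();
                return Outcome.CANCELLED;
            }
        }
    }
}
```

Checking for idleness before checking the deadline means an instance that becomes idle exactly at the time-out is still upgraded rather than cancelled.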

          We don't have plans/time to publish and maintain this as a community plugin, but if someone wants to do something similar, I will dump the code below, feel free to reuse what you want.

          Note that we would probably never have written this code if we had not been bitten many times by JENKINS-34256. A few years ago, we were simply using the standard Jenkins quiet-mode, but then stuck Pipelines (when the upgrade was cancelled) really became an issue...
          Now that JENKINS-34256 is fixed, I don't know, we might consider going back to this standard solution. But I think our users prefer having their Pipelines finished before the upgrade, rather than paused/resumed (mainly because the "resume" part is not always smooth: some plugin upgrades might break compatibility of the serialized data, etc.).

          Anyway, this is the "interesting" part of the code, the QuietDownQueueTaskDispatcher, which filters which new Queue.Item can actually be started when in (custom) quiet-mode.

          @Extension
          public class QuietDownQueueTaskDispatcher extends QueueTaskDispatcher {
          
          	@Inject
          	QuietDownStateManager quietDownStateManager;
          
          	// key: upstreamProject+upstreamBuild from an UpstreamCause
          	// value: true if children builds should be allowed to run
          	private ConcurrentHashMap<String, Boolean> knownUpstreamCauses = new ConcurrentHashMap<>();
          
          	// used to decide when cache should be flushed
          	private AtomicLong quietDownTimestamp = new AtomicLong(0L);
          
          	@Override
          	public @CheckForNull CauseOfBlockage canRun(Queue.Item item) {
          		QuietDownState currentState = quietDownStateManager.getState();
          		if (!currentState.isDown()) {
          			return null;
          		}
          
          		// flush cache if quietDown state has changed
          		if (quietDownTimestamp.getAndSet(currentState.since()) != currentState.since()) {
          			knownUpstreamCauses.clear();
          		}
          
          		Queue.Task task = item.task;
          		// always allow some kind of tasks
          		if (task instanceof NonBlockingTask || task instanceof ContinuedTask) {
          			return null;
          		}
          		// allow build task because of its upstream cause
          		if (hasAllowingCause(item.getCauses())) {
          			return null;
          		}
          		// not allowed, let's explain why
          		return new QuietDownBlockageCause(currentState);
          	}
          
          	private boolean hasAllowingCause(@Nonnull List<Cause> causes) {
          		boolean result = false;
          		for (Cause parentCause: causes) {
          			if (!(parentCause instanceof UpstreamCause)) {
          				continue;
          			}
          			result = result || isAllowingUpstreamCause((UpstreamCause) parentCause);
          		}
          		return result;
          	}
          
          	private boolean isAllowingUpstreamCause(@Nonnull UpstreamCause cause) {
          		String runKey = cause.getUpstreamProject() + ':' + cause.getUpstreamBuild();
          		Boolean decisionFromCache = knownUpstreamCauses.get(runKey);
          		if (decisionFromCache != null) {
          			return decisionFromCache;
          		}
          		boolean newDecision = hasAllowingCause(cause.getUpstreamCauses())
          				|| isRunAllowingDownstreamBuilds(cause.getUpstreamRun());
          		knownUpstreamCauses.put(runKey, newDecision);
          		return newDecision;
          	}
          
          	private boolean isRunAllowingDownstreamBuilds(@CheckForNull Run<?, ?> run) {
          		if (run == null || !run.isBuilding()) {
          			return false;
          		}
          		// a running WorkflowRun or MatrixBuild may wait for its children to complete
          		// Note: assume there exists no MatrixBuild subclass, it saves an optional plugin dependency
          		return (run instanceof WorkflowRun || "hudson.matrix.MatrixBuild".equals(run.getClass().getName()));
          	}
          
          	public static class QuietDownBlockageCause extends CauseOfBlockage {
          
          		private final @Nonnull QuietDownState quietDownState;
          
          		private QuietDownBlockageCause(QuietDownState quietDownState) {
          			this.quietDownState = quietDownState;
          		}
          
          		public static @CheckForNull QuietDownBlockageCause from(QuietDownState quietDownState) {
          			if (!quietDownState.isDown()) {
          				return null;
          			}
          			return new QuietDownBlockageCause(quietDownState);
          		}
          
          		@Override
          		public String getShortDescription() {
          			return quietDownState.toShortDescriptionString();
          		}
          
          	}
          }
          

          The currently implemented policy is to only allow tasks which are:

          • NonBlockingTask, or Pipeline ContinuedTask (I can't remember the specific details, I wrote that a long time ago)
          • children of an already running Pipeline or Matrix build (that's necessary to let these builds terminate, because they can wait for their children's termination, but it could be refined: for instance we don't really need to allow builds launched by a Pipeline build step with the wait=false parameter)

          Other than these, new builds will be declined, and stay in the queue.

          To avoid spending too much time walking the UpstreamCause of the candidate tasks, we keep a cache of already made decisions (whether a specific build is a legitimate cause for allowing children builds, or not).

          A QuietDownState has a State (AVAILABLE or QUIET_DOWN enumeration), a starting timestamp, and a cause message.

          public class QuietDownState {
          
          	private final String cause;
          	private final State state;
          	private final long timestamp;
          
          	private QuietDownState(@Nonnull State state) {
          		this(state, null);
          	}
          
          	private QuietDownState(@Nonnull State state, String cause) {
          		this.cause = cause;
          		this.state = state;
          		this.timestamp = System.currentTimeMillis();
          	}
          
          	public static @Nonnull QuietDownState available() {
          		return new QuietDownState(State.AVAILABLE);
          	}
          
          	public static @Nonnull QuietDownState quietDown(@Nonnull String cause) {
          		return new QuietDownState(State.QUIET_DOWN, cause);
          	}
          
          	public boolean is(State state) {
          		return this.state == state;
          	}
          
          	public boolean isDown() {
          		return state.down;
          	}
          
          	public @CheckForNull String why() {
          		return cause;
          	}
          
          	public long since() {
          		return timestamp;
          	}
          
          	public @Nonnull String toApiString() {
          		StringBuilder sb = new StringBuilder();
          		sb.append(state);
          		sb.append(" since ");
          		sb.append(Util.XS_DATETIME_FORMATTER.format(timestamp));
          		if (StringUtils.isNotEmpty(cause)) {
          			sb.append(" - ").append(cause);
          		}
          		return sb.toString();
          	}
          
          	// FIXME: better message/formatting
          	public @Nonnull String toUserString() {
          		StringBuilder sb = new StringBuilder();
          		sb.append("Jenkins has been ");
          		sb.append(state.label);
          		sb.append(" for ");
          		sb.append(Util.getTimeSpanString(System.currentTimeMillis() - timestamp));
          		if (StringUtils.isNotEmpty(cause)) {
          			sb.append(" - ").append(cause);
          		}
          		return sb.toString();
          	}
          
          	// FIXME: make it shorter?
          	public @Nonnull String toShortDescriptionString() {
          		return toUserString();
          	}
          
          	public @Nonnull String toString() {
          		return toApiString();
          	}
          
          	@Override
          	public int hashCode() {
          		// <snip>
          	}
          
          	@Override
          	public boolean equals(Object obj) {
          		// <snip>
          	}
          
          	public enum State {
          		AVAILABLE(false, "available"), QUIET_DOWN(true, "sleeping");
          		private boolean down;
          		private String label;
          
          		private State(boolean down, String label) {
          			this.down = down;
          			this.label = label;
          		}
          	}
          }
          

          The (global) current state can be changed via a QuietDownStateManager, which is a Guice singleton:

          public class QuietDownStateManager {
          
          	private AtomicReference<QuietDownState> currentState = new AtomicReference<>(QuietDownState.available());
          
          	public QuietDownState getState() {
          		return currentState.get();
          	}
          
          	public QuietDownState quietDown(String cause) {
          		final QuietDownState newState = QuietDownState.quietDown(cause);
          		return currentState.updateAndGet(
          				state -> state.is(QUIET_DOWN) ? state : newState);
          		// TODO: updating the cause (when already down) could be nice (while still preserving the initial timestamp)
          	}
          
          	public QuietDownState cancelQuietDown() {
          		final QuietDownState newState = QuietDownState.available();
          		return currentState.updateAndGet(
          				state -> state.is(AVAILABLE) ? state : newState);
          	}
          
          }
          
          @Extension
          public class GuiceBindings extends AbstractModule {
          
          	@Override
          	protected void configure() {
          		//...
          		bind(QuietDownStateManager.class).in(Singleton.class);
          	}
          
          }
          

          We control the QuietDownStateManager through a few simple HTTP methods:

          • doQuietDown(): enable quiet-down mode (with a cause message)
          • doCancelQuietDown(): disable quiet-down mode
          • doGetQuietDownStatus(): get current quiet-down status

          We also have a method (doActivity() below) which we can poll to know whether Jenkins is BUSY or IDLE (that's what we use to wait for it being idle before triggering an actual restart - this too could be refined, for instance we could consider that Jenkins is idle when the only running Pipelines which are left are actually blocked on input steps).

          @Extension
          public class SomethingRemoteAPI extends AbstractModelObject implements UnprotectedRootAction {
          	@Inject
          	QuietDownStateManager quietDownStateManager;
          
          	public String getDisplayName() {
          		return "SomethingAPI";
          	}
          
          	public String getSearchUrl() {
          		return getUrlName();
          	}
          
          	public String getIconFileName() {
          		return null;
          	}
          
          	public String getUrlName() {
          		return "somethingAPI";
          	}
          
          	// <snip> other unrelated methods
          
          	@RequirePOST
          	public HttpResponse doQuietDown() {
          		Jenkins.get().checkPermission(Jenkins.ADMINISTER);
          		return (req, rsp, node) -> {
          			final QuietDownState state = quietDownStateManager.quietDown(defaultString(req.getParameter("cause")));
          			rsp.setStatus(HttpServletResponse.SC_OK);
          			rsp.setContentType("text/plain");
          			PrintWriter w = rsp.getWriter();
          			w.println(state.toApiString());
          		};
          	}
          
          	@RequirePOST
          	public HttpResponse doCancelQuietDown() {
          		Jenkins.get().checkPermission(Jenkins.ADMINISTER);
          		return (req, rsp, node) -> {
          			final QuietDownState state = quietDownStateManager.cancelQuietDown();
          			rsp.setStatus(HttpServletResponse.SC_OK);
          			rsp.setContentType("text/plain");
          			PrintWriter w = rsp.getWriter();
          			w.println(state.toApiString());
          		};
          	}
          
          	public HttpResponse doGetQuietDownStatus() {
          		return (req, rsp, node) -> {
          			final QuietDownState state = quietDownStateManager.getState();
          			rsp.setStatus(HttpServletResponse.SC_OK);
          			rsp.setContentType("text/plain");
          			PrintWriter w = rsp.getWriter();
          			w.println(state.toApiString());
          		};
          	}
          
          	public HttpResponse doActivity() {
          		// compute into plain locals first: a final local cannot be assigned in both try and catch
          		int status;
          		String text;
          		try {
          			text = countBusyExecutors() > 0 ? "BUSY" : "IDLE";
          			status = HttpServletResponse.SC_OK;
          		} catch (RuntimeException e) {
          			LOGGER.log(Level.WARNING, "failed to count busy executors: " + e.getMessage(), e);
          			text = "UNKNOWN";
          			status = HttpServletResponse.SC_INTERNAL_SERVER_ERROR;
          		}
          		// effectively final copies captured by the response lambda
          		final int httpStatus = status;
          		final String body = text;
          		return (req, rsp, node) -> {
          			rsp.setStatus(httpStatus);
          			rsp.setContentType("text/plain");
          			PrintWriter w = rsp.getWriter();
          			w.println(body);
          		};
          	}
          
          	private int countBusyExecutors() {
          		// see hudson.model.ComputerSet.getBusyExecutors()
          		int r = 0;
          		for (Computer c : Jenkins.get().getComputers()) {
          			if (c.isOnline()) {
          				r += c.countBusy();
          			}
          		}
          		return r;
          	}
          }
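
For anyone scripting against endpoints like these, here is a minimal sketch of building the requests with the JDK's `java.net.http` API. It rests on assumptions not stated in the comment: that Stapler exposes `doQuietDown()` under `somethingAPI/quietDown` (its usual `doXxx` mapping), that the `cause` parameter is sent as a form field, and that the caller authenticates with a user name and API token over HTTP Basic. `QuietDownRequests` and the example host are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that only builds the requests; sending them is left to the caller. */
final class QuietDownRequests {
    private final URI base;
    private final String authHeader;

    QuietDownRequests(String baseUrl, String user, String apiToken) {
        // a trailing slash is needed so that resolve() appends to the base path
        this.base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
        String credentials = user + ":" + apiToken;
        this.authHeader = "Basic "
                + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** POST somethingAPI/quietDown with the cause message as a form parameter. */
    HttpRequest quietDown(String cause) {
        String form = "cause=" + URLEncoder.encode(cause, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(base.resolve("somethingAPI/quietDown"))
                .header("Authorization", authHeader)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    /** POST somethingAPI/cancelQuietDown with an empty body. */
    HttpRequest cancelQuietDown() {
        return HttpRequest.newBuilder(base.resolve("somethingAPI/cancelQuietDown"))
                .header("Authorization", authHeader)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** GET somethingAPI/activity, whose body is expected to be BUSY or IDLE. */
    HttpRequest activity() {
        return HttpRequest.newBuilder(base.resolve("somethingAPI/activity"))
                .header("Authorization", authHeader)
                .GET()
                .build();
    }
}
```

Each request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Depending on the Jenkins version and security configuration, POST requests may additionally need a CSRF crumb.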
          

          Finally, we also have some bits of code to display a message in the Jenkins GUI when our quiet-mode is enabled (that's part of a more general-purpose system we have for pushing notification messages to our Jenkins users, but that could of course be implemented differently in the context of a dedicated plugin).

          Thomas de Grenier de Latour added the comment above.

          Reinhold Füreder added a comment - tom_gl Thanks for the insight! And wow, that is impressive and I am not sure you got that right in the first attempt

          Jason Antman added a comment -

          We could really use this as well; our use case is similar to the above, largely around upgrades to Jenkins or the infrastructure that it runs on. To put it simply:

          1. There's a button on the Manage Jenkins page that says, "Prepare for Shutdown: Stops executing new builds, so that the system can be eventually shut down safely." I'd say that this is no longer correct, since it actually does more than that, it now also pauses running builds.
          2. In our case at least, pausing a pipeline is almost never the right thing to do. This has negative impacts for both cost (if we spin up a bunch of billed-by-the-minute EC2 instances for a test environment, we don't want to pause after doing that and before tearing it down) and user experience (when a pipeline kicks off, the people who are watching it expect it to run to completion, not get paused). We also occasionally have issues around timeouts, due to pausing between time-dependent stages.
          3. There's no clear visual indication of this state. If you look at "Build Executor Status" on the main page, it looks like the builds are running. There doesn't appear to be anything clearly indicating, "HEY, THIS BUILD IS PAUSED!"
          4. This is, in my opinion, a really major and unintuitive change from previous behavior. I've been using "Prepare for Shutdown" to upgrade Jenkins for years. The first time I found out that it's now pausing jobs, I spent an hour waiting for the currently-running jobs to complete (with no indication they were paused, see above) before I finally looked at the console output of one and found out that it was paused.

          Tim Black added a comment -

          I agree completely with all 4 of @Jason Antman's points, and share the same negative experience with this misbehavior. This is a major problem for companies using pipelines and performing upgrades.


          Tim Brown added a comment - - edited

          Have you tried using <jenkins_url>/safeRestart?
          It seems to restart once Pipelines are paused, and to resume the Pipelines after the restart. That said, it seems to claim that it waits until they are finished (which I am hoping just needs updating).


          Jonathan Delizy added a comment -

          I tried the Lenient shutdown plugin, but it pauses running pipelines and doesn't prevent new pipelines from starting.
          This plugin is pretty old, so it is probably not compatible with pipelines "by design".

          Brett Alex added a comment -

          @Tim Brown, I think this whole ticket is based on the fact that <jenkins_url>/safeRestart doesn't actually work.  I believe it actually does work in some cases but not in others.

          For example, I hit this bug at least once a week when applying patches. I think our cause may be that we have some freestyle jobs that trigger pipeline jobs (the horror!). Since the pipeline jobs pause indefinitely, the freestyle jobs never complete and the restart will never happen.

          I think the more common use case is that /safeRestart should just let all running jobs finish, as it has done since the beginning. The only case for pausing running jobs would be if you had one of those 18-hour jobs running and had to restart immediately.

          I agree completely with all 4 of @Jason Antman's points as well.


            Unassigned Unassigned
            reinholdfuereder Reinhold Füreder
            Votes: 40
            Watchers: 48

              Created:
              Updated: