Status: Resolved
Tests often fail with:
java.util.concurrent.RejectedExecutionException: null
	at hudson.remoting.SingleLaneExecutorService.execute(SingleLaneExecutorService.java:99)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
	at com.google.common.util.concurrent.ForwardingExecutorService.submit(ForwardingExecutorService.java:105)
	at jenkins.util.InterceptingExecutorService.submit(InterceptingExecutorService.java:39)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.scheduleRun(CpsThreadGroup.java:172)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.waitForSuspension(CpsFlowExecution.java:524)
	at org.jenkinsci.plugins.workflow.SingleJobTestBase.waitForWorkflowToSuspend(SingleJobTestBase.java:108)
	at org.jenkinsci.plugins.workflow.SingleJobTestBase.waitForWorkflowToComplete(SingleJobTestBase.java:95)
	at org.jenkinsci.plugins.workflow.steps.input.InputStepRestartTest$2.evaluate(InputStepRestartTest.java:63)
The usual reason for the executor being "shut down", namely that Jenkins itself is shutting down, does not seem to apply here, since we are in the middle of a story.
I suspect that CpsThreadGroup.scheduleRun is to blame: it sometimes calls runner.shutdown(). It does catch RejectedExecutionException (since cc4eca7), but only around the nested call to submit, which was observed to throw this exception earlier, not around the outer call that throws it here.
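To make the suspected mechanism concrete, here is a minimal sketch using only java.util.concurrent. MiniThreadGroup and its scheduleRun are hypothetical stand-ins for CpsThreadGroup, not the plugin's real code, and a plain single-thread executor plays the role of SingleLaneExecutorService. Once a run shuts the runner down because the program is "done", any later unguarded submit is rejected, which is the shape of the failure in the trace above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

public class ShutdownRaceDemo {

    /** Hypothetical stand-in for CpsThreadGroup: one runner thread executes the program. */
    static final class MiniThreadGroup {
        private final ExecutorService runner = Executors.newSingleThreadExecutor();

        /** Schedules one run; like the suspected code path, it shuts the runner down when done. */
        Future<?> scheduleRun() {
            // No guard here: once the runner is shut down, this submit throws RejectedExecutionException.
            return runner.submit(() -> {
                // Simulate the flow completing during this run, then
                // "shutdown the executor pool when CpsThreadGroup is done".
                runner.shutdown();
            });
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        MiniThreadGroup g = new MiniThreadGroup();
        g.scheduleRun().get(); // first wait: the flow completes and the runner is shut down
        try {
            g.scheduleRun().get(); // second wait on an already finished flow
            System.out.println("no exception");
        } catch (RejectedExecutionException e) {
            System.out.println("RejectedExecutionException");
        }
    }
}
```

Running main prints "RejectedExecutionException": the second wait fails even though nothing went wrong with the flow itself.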
The call to shutdown dates back to commit 42dbfb1 on the single-thread branch, whose message says only:
shutdown the executor pool when CpsThreadGroup is done
Is this still necessary and appropriate after 289c9b0 (collection of CpsThread)?
It is also possible that the flaw lies in CpsFlowExecution.waitForSuspension, as a TODO comment there suggests. Could there be a race condition in which programPromise is non-null when it is checked, but the flow ends immediately afterwards? For that matter, is the comment
the execution has already finished
even correct, given that no code appears to set programPromise back to null? If I am right, then this comment is misleading, silently returning when programPromise is null may be wrong, and g.scheduleRun().get() ought to return without throwing an exception if the flow has already shut down.
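The behavior proposed here can be sketched as follows. This is a hypothetical illustration, not the actual fix: MiniThreadGroup and scheduleRun are invented names, and it assumes the runner is only ever shut down because the flow has completed (a real implementation would need to distinguish that from Jenkins itself shutting down). A RejectedExecutionException from the outer submit is treated as "nothing left to run", and an already completed future (JDK CompletableFuture.completedFuture) is returned so that scheduleRun().get() returns normally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

public class GuardedScheduleRunDemo {

    /** Hypothetical stand-in for CpsThreadGroup with a guarded scheduleRun. */
    static final class MiniThreadGroup {
        private final ExecutorService runner = Executors.newSingleThreadExecutor();

        Future<?> scheduleRun() {
            try {
                return runner.submit(() -> {
                    // Simulate the flow completing during this run, then release the thread.
                    runner.shutdown();
                });
            } catch (RejectedExecutionException e) {
                // Assumption: the runner is only shut down once the flow is complete,
                // so report success instead of propagating the rejection.
                return CompletableFuture.completedFuture(null);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        MiniThreadGroup g = new MiniThreadGroup();
        g.scheduleRun().get(); // first wait: the flow completes and the runner is shut down
        g.scheduleRun().get(); // second wait: returns normally instead of throwing
        System.out.println("both waits returned");
    }
}
```

Catching the exception around submit, rather than checking isShutdown() first, avoids a check-then-act race where the runner shuts down between the check and the submit.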
Code changed in jenkins
User: Jesse Glick
[FIXED JENKINS-25921] scheduleRun should not throw RejectedExecutionException simply because the flow is complete.
In fact, adding an extra call to waitForWorkflowToComplete triggers the failure, confirming my hypothesis.