Bug
Resolution: Fixed
Minor
None
workflow-cps 2.82
Very similar to JENKINS-53709 and JENKINS-41791. In some cases a reference to the CpsBodyExecution of a block-scoped step can outlive the actual execution of that step, causing a CpsThread that is no longer part of the CpsThreadGroup to stick around in the program.
The easiest way to run into this issue is to create a closure in a node step, allow the closure to outlive the step (for example by assigning it to a variable), and then restart Jenkins after the node step has completed but while the closure is still reachable in the Pipeline. The PlaceholderTask for the node step will be rehydrated when the Pipeline resumes, even though the node step has already completed. If you are using dynamically provisioned agents, the Pipeline will either hang forever waiting for that agent or fail if Jenkins can tell that the node will never be available.
The specific issue is that CpsBodyExecution.onSuccess and CpsBodyExecution.onFailure are injected as continuations into the program. Those objects are instances of non-static inner classes in CpsBodyExecution because they need to access the fields and methods of the CpsBodyExecution, but this means that the program's serialized state includes the CpsBodyExecution and all of its state, most notably its CpsThread field.
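The capture described above can be reproduced outside Jenkins with plain Java serialization. The following is only a sketch: Execution, InnerCallback, DetachedCallback and serializedSize are hypothetical stand-ins invented for illustration, not classes or methods from workflow-cps, and the plugin's real persistence layer may differ in detail. It compares the serialized size of a non-static inner class instance with that of a static nested class instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class InnerClassCapture {

    /** Hypothetical stand-in for CpsBodyExecution; not the plugin's real class. */
    static class Execution implements Serializable {
        // Stand-in for the CpsThread field: 64 KiB of state we would rather not persist.
        final byte[] thread = new byte[64 * 1024];

        /** Non-static inner class: carries an implicit reference to its Execution. */
        class InnerCallback implements Serializable {
            int threadSize() {
                return thread.length; // reads a field of the enclosing instance
            }
        }

        /** Static nested class: no implicit reference to any Execution. */
        static class DetachedCallback implements Serializable {
        }
    }

    /** Serializes the object with plain Java serialization and returns the byte count. */
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Execution execution = new Execution();
        System.out.println("inner callback:    "
                + serializedSize(execution.new InnerCallback()) + " bytes");
        System.out.println("detached callback: "
                + serializedSize(new Execution.DetachedCallback()) + " bytes");
    }
}
```

Serializing the inner callback also writes out the enclosing Execution and its 64 KiB array, while the detached callback serializes to a small fraction of that. This is the same mechanism by which a callback left in the program can keep a completed execution and its thread in the saved state.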
Originally noticed because RestartingLoadStepTest.updatedBindingsOnRestart was flaky; see jenkinsci/workflow-cps-plugin#366.
Minimal reproducer:
def closure = null
node {
  closure = { -> "This closure captures CpsBodyExecution" }
}
// node step is complete, but its execution is reachable via the closure's captured variables.
// Agent is rehydrated if you restart Jenkins here.
echo(closure())
- relates to
  - JENKINS-41791 Build cannot be resumed if parallel was used with Kubernetes plugin (Resolved)
  - JENKINS-53709 Parallel blocks in node blocks cause executors to be persisted outside of the node block (Resolved)
- links to