Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-26148

Incorrect implementation of RetryStepExecution.stop

      https://github.com/jenkinsci/workflow-plugin/commit/6be3d810597079905b6d19911bd40139ee4ec0eb purported to let stop kill its body. Yet this implementation can only work for the first round. For the second and subsequent rounds, the BodyExecution created by Callback.onFailure is thrown away, and body will still be the first one, which will presumably cause CpsBodyExecution.cancel to be a no-op since isDone().

      So somehow the Callback needs to get a handle back to its RetryStepExecution. But how? You could use StepExecution.applyAll to find all the candidates, but how would you select the right one if there are several? ExecutorStepExecution.Callback overloads cookie for this purpose, but that is used also for ProcessKiller; it seems like a hack here. ParallelStep uses ResultHandler, apparently assuming that identical objects are linked in the reference graph during serialization.

          [JENKINS-26148] Incorrect implementation of RetryStepExecution.stop

          Jesse Glick created issue -

          Jesse Glick added a comment -

          waitForCond would have the same issue.

          Jesse Glick added a comment - waitForCond would have the same issue.
          Jesse Glick made changes -
          Link New: This issue is blocking JENKINS-25570 [ JENKINS-25570 ]

          Jesse Glick added a comment -

          Just realized that CpsFlowExecution.interrupt only calls stop on getCurrentExecutions, which by its Javadoc only includes innermost executions. And this makes sense to me: it is only necessary to interrupt the stuff that is actually running, and if that throws up a FlowInterruptedException, then BodyExecutionCallback.onFailure should be called and the outer step should fail too.

          If that is right, then the current stop method is totally wrong: it should not call cancel on anything, and in fact it should be a no-op because it should never be called. (Or if it is called, it would be as a race condition between retries, meaning all it needs to do is immediately getContext().onFailure(cause).) But then Callback.onFailure needs to check for t instanceof InterruptedException and throw that up without retrying. And other stop methods on block-scoped steps are probably incorrect too.

          The correct handling of interruption by steps remains very unclear to me. There needs to be a definitive document explaining how interruption works and what steps are supposed to do in order to participate properly. And we need tests that actually trying interrupting flows at various points.

          Jesse Glick added a comment - Just realized that CpsFlowExecution.interrupt only calls stop on getCurrentExecutions , which by its Javadoc only includes innermost executions. And this makes sense to me: it is only necessary to interrupt the stuff that is actually running, and if that throws up a FlowInterruptedException , then BodyExecutionCallback.onFailure should be called and the outer step should fail too. If that is right, then the current stop method is totally wrong: it should not call cancel on anything, and in fact it should be a no-op because it should never be called. (Or if it is called, it would be as a race condition between retries, meaning all it needs to do is immediately getContext().onFailure(cause) .) But then Callback.onFailure needs to check for t instanceof InterruptedException and throw that up without retrying. And other stop methods on block-scoped steps are probably incorrect too. The correct handling of interruption by steps remains very unclear to me. There needs to be a definitive document explaining how interruption works and what steps are supposed to do in order to participate properly. And we need tests that actually trying interrupting flows at various points.
          Jesse Glick made changes -
          Link New: This issue is related to JENKINS-26131 [ JENKINS-26131 ]
          Jesse Glick made changes -
          Link New: This issue is related to JENKINS-26163 [ JENKINS-26163 ]

          Jesse Glick added a comment -

          WaitForConditionStep seems to at least not be buggy, though the code style may be overly complicated. body is updated whenever the block is rerun. stop behaves properly when there is no active body; when there is an active body, it tries to interrupt it, even though stop would not be called in this case. Probably the body field could simply be removed without loss of functionality.

          It also does some tricks with a random id and StepExecution.applyAll which are probably unnecessary; Callback could simply keep a reference to the Execution that owns it. This would not be a compatible change for running build records, though.

          Jesse Glick added a comment - WaitForConditionStep seems to at least not be buggy, though the code style may be overly complicated. body is updated whenever the block is rerun. stop behaves properly when there is no active body; when there is an active body, it tries to interrupt it, even though stop would not be called in this case. Probably the body field could simply be removed without loss of functionality. It also does some tricks with a random id and StepExecution.applyAll which are probably unnecessary; Callback could simply keep a reference to the Execution that owns it. This would not be a compatible change for running build records, though.
          Jesse Glick made changes -
          Link New: This issue is related to JENKINS-25550 [ JENKINS-25550 ]
          Jesse Glick made changes -
          Assignee Original: Kohsuke Kawaguchi [ kohsuke ]
          R. Tyler Croy made changes -
          Workflow Original: JNJira [ 160138 ] New: JNJira + In-Review [ 180269 ]

            jglick Jesse Glick
            jglick Jesse Glick
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: