Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-51454

Pipeline retry operation doesn't retry when there is a timeout inside of it

      When a timeout call is fired inside of a retry, retry is not being triggered and job execution is aborted, the only way to make it work is by surrounding the timeout operation with a try/catch.

      without try/catch

      Log output

      Cancelling nested steps due to timeout
      

      Execution result

      Timeout has been exceeded
      Finished: ABORTED
      

      with try/catch

      Log output

      Timeout set to expire after 2 sec without activity
      Sleeping for 2 sec
      Cancelling nested steps due to timeout
      ERROR: catched timeout! org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
      Retrying
      

      Execution result

      Finished: SUCCESS
      

       

      Examples to reproduce the issue

      Failing example

      node {
          def timeoutSeconds = 3
          stage('Preparation') { // for display purposes
              retry(3){
                  timeout(activity: true, time: 2, unit: 'SECONDS') {
                      sleep(timeoutSeconds)
                  }
                  timeoutSeconds--
              }
         }
      }
      

      Working example

      node {
          def timeoutSeconds = 3
          stage('Preparation') { // for display purposes
              retry(3){
                  try{
                      timeout(activity: true, time: 2, unit: 'SECONDS') {
                          sleep(timeoutSeconds)
                      }
                  }catch(err){
                      timeoutSeconds--
                      script{
                        def user = err.getCauses()[0].getUser()
                        error "[${user}] catched timeout! $err"
                      }
                  }
              }
         }
      }
      

          [JENKINS-51454] Pipeline retry operation doesn't retry when there is a timeout inside of it

          It looks like the org.jenkinsci.plugins.workflow.steps.FlowInterruptedException exception could be the  cause. If the timeout directive would throw another exception, would this solve the issue?

          Nepomuk Seiler added a comment - It looks like the org.jenkinsci.plugins.workflow.steps.FlowInterruptedException exception could be the  cause. If the timeout directive would throw another exception, would this solve the issue?

          If capturing the `FlowInterruptedException` it will retry also when aborting the job for any other reason, like canceling. Definitively, timeout should thow a different exception.

          Luis Piedra-Márquez added a comment - If capturing the ` FlowInterruptedException` it will retry also when aborting the job for any other reason, like canceling. Definitively, timeout should thow a different exception.

          Basil Crow added a comment -

          I can reproduce this error in a scripted pipeline as well.

          Basil Crow added a comment - I can reproduce this error in a scripted pipeline as well.

          Basil Crow added a comment -

          As a workaround, I was able to catch FlowInterruptedException and rethrow a more generic exception:

          import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
          
          retry(3) {
            try {
              timeout(time: 10, unit: 'MINUTES') {
                [..]
              }
            } catch (FlowInterruptedException e) {
             // Work around https://issues.jenkins-ci.org/browse/JENKINS-51454
              error 'Timeout has been exceeded'
            }
          }
          

          Basil Crow added a comment - As a workaround, I was able to catch FlowInterruptedException and rethrow a more generic exception: import org.jenkinsci.plugins.workflow.steps.FlowInterruptedException retry(3) { try { timeout(time: 10, unit: 'MINUTES') { [..] } } catch (FlowInterruptedException e) { // Work around https://issues.jenkins-ci.org/browse/JENKINS-51454 error 'Timeout has been exceeded' } }

          Devin Nusbaum added a comment -

          I think there are two desired behaviors depending on the placement of timeout and retry relative to each other.

          First case (this ticket):

          retry(3) {
            timeout(time: 5, unit: 'MINUTES') {
              // Something that can fail
            }
          }
          

          In this case, if the timeout triggers, I think the desired behavior is for retry to run its body again. This is not the current behavior.

          Second case (not this ticket):

          timeout(time: 5, unit: 'MINUTES') {
            retry(3) {
              // Something that can fail
            }
          }
          

          In this case, if the timeout triggers, I think the desired behavior is for retry to be aborted without retrying anything. This is the current behavior, and as far as I understand, is working as-designed after JENKINS-44379.

          Switching timeout to use a different kind of exception would fix the first case, but break the second case. To support both use cases, something more complex would be needed (see PR 81 for a possible approach, although retry would need to be updated as well)). basil Noted that a fix for JENKINS-60354 might overlap a bit with this issue.

          Devin Nusbaum added a comment - I think there are two desired behaviors depending on the placement of timeout and retry relative to each other. First case (this ticket): retry(3) { timeout(time: 5, unit: 'MINUTES' ) { // Something that can fail } } In this case, if the timeout triggers, I think the desired behavior is for retry to run its body again. This is not the current behavior. Second case (not this ticket): timeout(time: 5, unit: 'MINUTES' ) { retry(3) { // Something that can fail } } In this case, if the timeout triggers, I think the desired behavior is for retry to be aborted without retrying anything. This is the current behavior, and as far as I understand, is working as-designed after  JENKINS-44379 . Switching timeout to use a different kind of exception would fix the first case, but break the second case. To support both use cases, something more complex would be needed (see PR 81 for a possible approach, although retry would need to be updated as well)). basil Noted that a fix for  JENKINS-60354 might overlap a bit with this issue.

          pixman20 added a comment -

          FWIW: I did a slight modification to the initial workaround and added it to our global shared libraries as retryEvenOnTimeout.
           
          It's slightly more robust:

          def call(Map retryOptions, Closure closure) {
              retry(retryOptions) {
                  try {
                      closure()
                  } catch(org.jenkinsci.plugins.workflow.steps.FlowInterruptedException ex){
                      def causes = ex.getCauses()
                      if(causes.find { ! (it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution$ExceededTimeout) }) {
                          throw ex
                      }
                      // Wrap the real exception and throw a RuntimeException that'll trigger the retry
                      throw new RuntimeException(ex)
                  }
              }
          }
          

          Other potential solution options are to merge retry and timeout functionality to be able to allow the user to have the option of which functionality they need.

          pixman20 added a comment - FWIW: I did a slight modification to the initial workaround and added it to our global shared libraries as retryEvenOnTimeout.   It's slightly more robust: def call(Map retryOptions, Closure closure) { retry(retryOptions) { try { closure() } catch(org.jenkinsci.plugins.workflow.steps.FlowInterruptedException ex){ def causes = ex.getCauses() if(causes.find { ! (it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution$ExceededTimeout) }) { throw ex } // Wrap the real exception and throw a RuntimeException that'll trigger the retry throw new RuntimeException(ex) } } } Other potential solution options are to merge retry and timeout functionality to be able to allow the user to have the option of which functionality they need.

          pixman20 Your retryEvenOnTimeout didn't quite work for me, but it did when I replaced "Map retryOptions" with "int retryCount":

          def call(int retryCount, Closure closure) {
              retry(retryCount) {
                  try {
                      closure()
                  } catch(org.jenkinsci.plugins.workflow.steps.FlowInterruptedException ex){
                      def causes = ex.getCauses()
                      if(causes.find { ! (it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution$ExceededTimeout) }) {
                          throw ex
                      }
                      // Wrap the real exception and throw a RuntimeException that'll trigger the retry
                      throw new RuntimeException(ex)
                  }
              }
          }
          

          Jonathan Rogers added a comment - pixman20 Your retryEvenOnTimeout didn't quite work for me, but it did when I replaced "Map retryOptions" with "int retryCount": def call( int retryCount, Closure closure) { retry(retryCount) { try { closure() } catch (org.jenkinsci.plugins.workflow.steps.FlowInterruptedException ex){ def causes = ex.getCauses() if (causes.find { ! (it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution$ExceededTimeout) }) { throw ex } // Wrap the real exception and throw a RuntimeException that'll trigger the retry throw new RuntimeException(ex) } } }

          Basil Crow added a comment -

          Basil Crow added a comment - Fixed in jenkinsci/workflow-basic-steps-plugin#144 . Released in 2.24 .

            basil Basil Crow
            daconstenla David Constenla
            Votes:
            11 Vote for this issue
            Watchers:
            17 Start watching this issue

              Created:
              Updated:
              Resolved: