Jenkins / JENKINS-54735

Pipeline retry clause: optionally delay between retries

      I use a retry block to perform requests on a remote server that I don't control (e.g., Apple's Code Signing or Notarization service), and sometimes that remote server is offline.

      Currently, retry retries immediately upon failure, so when the remote server is offline for an extended period, my pipeline quickly exhausts its retries and the build fails.

      It would be nice to have the ability to make retry wait after each failure, with optional exponential backoff (doubling the wait time after each failure), to give these builds a greater chance of success.  For example:

      retry(tries:5, waitSecondsBetweenFailures:30, doubleWaitTimeAfterEachFailure:true)
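      Until such an option exists in the retry step itself, the proposed semantics could be approximated with a small helper. This is a sketch only: the parameter names in the example above are part of the proposal, and retryWithDelay below is a hypothetical user-defined method, not a built-in step.

```groovy
// Hypothetical helper approximating the proposed retry options.
// retry() itself still retries immediately; the sleep adds the delay.
def retryWithDelay(int tries, int waitSeconds, boolean doubling, Closure body) {
    int currentWait = 0
    retry(tries) {
        if (currentWait > 0) {
            sleep time: currentWait, unit: 'SECONDS'
        }
        // Prepare the delay for the *next* attempt before running the body.
        currentWait = (currentWait == 0) ? waitSeconds
                    : (doubling ? currentWait * 2 : waitSeconds)
        body()
    }
}

// Usage, mirroring the proposed example:
retryWithDelay(5, 30, true) {
    sh 'some-failure-prone-script'
}
```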


          Steve Mokris added a comment (edited)

          A workaround in the meantime:

              retry(5) {
                  ret = sh returnStatus: true, script: 'some-failure-prone-script'
                  if (ret) {
                      sleep time: 30, unit: 'SECONDS'
                      error 'some-failure-prone-script failed'
                  }
              }
          

          …but there's no way to know when we're on the last retry attempt (see JENKINS-49341), so it unnecessarily waits after the last retry attempt fails.
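          One way to avoid that trailing sleep without JENKINS-49341 is to count attempts explicitly, so the delay only runs when another attempt will actually follow. A sketch building on the workaround above:

```groovy
int attempt = 0
int maxTries = 5
retry(maxTries) {
    attempt++
    ret = sh returnStatus: true, script: 'some-failure-prone-script'
    if (ret) {
        // Only sleep if retry() will actually run the body again.
        if (attempt < maxTries) {
            sleep time: 30, unit: 'SECONDS'
        }
        error 'some-failure-prone-script failed'
    }
}
```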


          Adam Brousseau added a comment -

          In response to the workaround, what about this?

          ret = false
          retry(5) {
              if (ret) {
                  sleep time: 30, unit: 'SECONDS'
              } else {
                  ret = true
              }
              sh 'some-failure-prone-script'
          }
          


          asgeirn added a comment (edited)

          Here's a different approach:

          timeout(10) {
              waitUntil {
                  script {
                      def r = sh returnStatus: true, script: 'command'
                      return r == 0
                  }
              }
          }
          

          This will retry for up to 10 minutes. waitUntil automatically adds delays between the attempts.
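          Recent versions of the waitUntil step also accept an initialRecurrencePeriod parameter (in milliseconds) to tune the starting delay, which otherwise begins at a fraction of a second and grows gradually, plus a quiet option to suppress per-attempt log messages. Check your workflow-basic-steps plugin version before relying on these; a sketch:

```groovy
timeout(10) {
    // initialRecurrencePeriod is in milliseconds; requires a
    // sufficiently recent workflow-basic-steps plugin.
    waitUntil(initialRecurrencePeriod: 15000, quiet: true) {
        def r = sh returnStatus: true, script: 'command'
        return r == 0
    }
}
```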


          Matt Weisel added a comment -

          I created a retry with exponential backoff and jitter method:

          def retryWithBackoff(int retries = 1, int backoffSeconds = 5, int jitterSeconds = 30, Closure op) {
              def random = new Random()
              int backoffSecondsCurrent = 0
              retry(retries) {
                  if (backoffSecondsCurrent > 0) {
                      sleep backoffSecondsCurrent
                  }
                  backoffSecondsCurrent += backoffSeconds + random.nextInt(jitterSeconds)
                  if (backoffSecondsCurrent < 0) {
                      // Guard against int overflow on long retry runs.
                      // Log is a custom shared-library helper, not a built-in step.
                      Log.warn("Backoff seconds rolled over, got ${backoffSecondsCurrent}, resetting to ${backoffSeconds}")
                      backoffSecondsCurrent = backoffSeconds
                  }
                  op()
              }
          }
          
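          For reference, a call to the helper above might look like this (retryWithBackoff is the user-defined method from this comment, not a built-in step):

```groovy
// Up to 4 attempts; the delay grows by backoffSeconds plus
// up to jitterSeconds of random jitter before each retry.
retryWithBackoff(4, 5, 30) {
    sh 'some-failure-prone-script'
}
```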

          João Pinto added a comment (edited)

          A pull request adding this functionality exists, and an associated issue (JENKINS-59678) was created; however, it seems to have stalled. I'm linking it here in hopes of reviving it.


          Scott Marlow added a comment -

          I received feedback on https://github.com/jenkinsci/openstack-cloud-plugin/pull/365 that adding exponential backoff support to just one part of Jenkins was not the right approach, and that a deeper, more general solution would be better. I'm echoing that here in case it helps those still working on this issue.

          Also, https://www.jenkins.io/projects/gsoc/2023/project-ideas/agent_reconnections_exponential_backoff/ may be related: it appears to aim at unifying how retries happen across Jenkins. Note that it also identifies jitter as a way to reduce the CPU spikes that can occur when several clients try to connect at the same time.
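          The jitter idea mentioned there can be taken further: rather than adding a small random offset to a fixed backoff, the entire delay can be drawn at random from a growing window, which spreads simultaneous retries more evenly. A sketch in plain Groovy (the base and cap values here are arbitrary examples):

```groovy
// Full-jitter backoff: uniform delay in [0, min(cap, base * 2^attempt)].
def jitteredDelaySeconds(int attempt, int baseSeconds = 2, int capSeconds = 300) {
    int ceiling = Math.min(capSeconds, baseSeconds * (1 << attempt))
    return new Random().nextInt(ceiling + 1)
}
```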


            Assignee: Unassigned
            Reporter: Steve Mokris (smokris)
            Votes: 19
            Watchers: 21