
Parameterized Remote Trigger Plugin fails when remote job waits for available executor

      When the remote job is still waiting for an available executor, the job that triggered it reports a build failure.

      Triggering this remote job: *****
      Checking that the remote job ***** is not currently building.
      Remote job remote job ***** is not currenlty building.
      This job is build #[**] on the remote server.
      Triggering remote job now.
      Blocking local job until remote job completes
      ERROR: Remote build failed for the following reason:
      ERROR: http://****/job/***/**/api/json
      Finished: FAILURE

          [JENKINS-22427] Parameterized Remote Trigger Plugin fails when remote job waits for available executor

          Kevin Van Poppel created issue -

          Kevin Van Poppel added a comment -

          Fixed it myself.
          When the build is waiting for an executor, the constructed URL returns a 404.
          I added a loop that retries the request (every 5 seconds, for at most 5 minutes) instead of immediately reporting a build failure.
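The retry loop described in this comment could look roughly like the sketch below. It is not Kevin's patch: the class `NotFoundRetry`, the `Pause` hook and the method names are hypothetical, and the HTTP request is reduced to a supplier that returns a status code so the loop can be shown on its own.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper; not the plugin's actual code. */
final class NotFoundRetry {
    /** Hypothetical pause hook so the loop can be exercised without real sleeping. */
    interface Pause { void millis(long ms); }

    /**
     * Calls fetchStatus until it returns something other than 404 or the
     * time budget is used up. Returns the last HTTP status seen.
     */
    static int pollUntilFound(IntSupplier fetchStatus, long intervalMs, long maxWaitMs, Pause pause) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        long waited = 0;
        int status = fetchStatus.getAsInt();
        while (status == 404 && waited < maxWaitMs) {
            pause.millis(intervalMs);   // build is probably still queued
            waited += intervalMs;
            status = fetchStatus.getAsInt();
        }
        return status;
    }

    /** Pause implementation backed by Thread.sleep. */
    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the remote build", e);
        }
    }
}
```

With the values from the comment, a caller would pass `5_000` and `300_000` as `intervalMs` and `maxWaitMs` together with `NotFoundRetry::sleep`, and treat a final 404 as a failure.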
          Kevin Van Poppel made changes -
          Assignee Original: Maurice W. [ morficus ] New: Kevin Van Poppel [ v969540 ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          Maurice W. added a comment -

          Kevin - could you either share your patch as a gist (http://gist.github.com) or open a PR?
          Thanks


          Maurice W. added a comment - - edited

          My alternate solution is to have RemoteBuildConfiguration::sendHTTPCall() treat 404s as a non-error when it is called from RemoteBuildConfiguration::getBuildStatus().

          Might not be the safest solution (since it walks up the call-stack to figure out who called it) but it certainly seems to be the most effective so far.

          Either way, I would still like to see the code you implemented.

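To make the call-stack idea concrete, here is a minimal sketch of how a method can check who called it. It is an illustration only, not the code on Maurice's branch: `CallerCheck`, `calledFrom` and `isError` are hypothetical names, and `getBuildStatus` and `triggerBuild` are simple stand-ins for the plugin's real callers.

```java
/** Illustration only; all names here are hypothetical stand-ins. */
final class CallerCheck {
    /** True when a frame for className.methodName is on the current thread's stack. */
    static boolean calledFrom(String className, String methodName) {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Stand-in for the error check in sendHTTPCall: a 404 is tolerated only for status polls. */
    static boolean isError(int httpStatus) {
        if (httpStatus == 404 && calledFrom(CallerCheck.class.getName(), "getBuildStatus")) {
            return false;
        }
        return httpStatus >= 400;
    }

    /** Stand-in for the status poller; returns whether the response counts as an error. */
    static boolean getBuildStatus(int httpStatus) {
        return isError(httpStatus);
    }

    /** Stand-in for any other caller; returns whether the response counts as an error. */
    static boolean triggerBuild(int httpStatus) {
        return isError(httpStatus);
    }
}
```

The check depends on a method name, so a rename silently changes behaviour, which is the safety concern raised in the comment. Passing an explicit flag from the caller would probably give the same result without inspecting the stack.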
          Maurice W. made changes -
          Assignee Original: Kevin Van Poppel [ v969540 ] New: Maurice W. [ morficus ]
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

          Kevin Van Poppel added a comment -

          Hi Maurice,

          Here is the link to the code:
          https://github.com/v969540/ParamRemoteTrigger

          It would be nice if you could implement it in the official code, or a better solution than mine if you know one.
          I might have made small changes to the POM file etc., but these can be reverted if you like.

          I don't think it is safe to treat every 404 as a non-error. I first thought of querying the build queue, but didn't find a way to do this.

          Greetings,

          Kevin

          Tim Brown added a comment -

          I had a similar issue here where we had network issues. I added a retry in sendHTTPCall (when an IOException is caught). It currently waits 'pollInterval' seconds and then sends the HTTP call again. It will retry up to this.getConnectionRetryLimit() times, which is currently hardcoded to 5. It would be good to make this a configurable feature if used.

          This allows any call to be retried up to a specific number of times with a configurable interval.

          GIST: https://gist.github.com/timbrown5/ec4add8797fbf4d9cd19

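A rough sketch of the retry described in this comment follows. It is not the code from the gist: `RetryingCall`, `Attempt` and `send` are hypothetical names, the HTTP request is abstracted into a callback, and the limit of 5 is taken from the comment. The attempt counter is passed through a recursive call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch; names follow the comment, not the plugin source. */
final class RetryingCall {
    /** One HTTP attempt; a hypothetical stand-in for the body of sendHTTPCall. */
    interface Attempt { String run() throws IOException; }

    /** Hardcoded to 5, as described in the comment. */
    static final int CONNECTION_RETRY_LIMIT = 5;

    /**
     * Runs the attempt; on IOException waits pollIntervalSeconds and retries.
     * numberOfAttempts is the number of retries already made (0 on the first call).
     */
    static String send(Attempt attempt, int pollIntervalSeconds, int numberOfAttempts) {
        try {
            return attempt.run();
        } catch (IOException e) {
            if (numberOfAttempts >= CONNECTION_RETRY_LIMIT) {
                throw new UncheckedIOException("gave up after " + numberOfAttempts + " retries", e);
            }
            try {
                Thread.sleep(pollIntervalSeconds * 1000L);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
            return send(attempt, pollIntervalSeconds, numberOfAttempts + 1);
        }
    }
}
```

A caller would start with `RetryingCall.send(attempt, pollInterval, 0)`. Note that this only retries on `IOException`; whether a 404 surfaces as an `IOException` depends on how the request is made, so it may not cover the queued-build case on its own.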

          Maurice W. added a comment -

          Tim - I like your solution of the recursive function that tracks "numberOfAttempts". It requires minimal change to the current code base and is pretty easy to follow. And it still follows the same concept that Kevin outlined (a retry loop).

          But the flaw they both have... is that the local build can still fail if the remote build does not complete before hitting the max number of retries. At the moment I have just added a new exception to give a clear indication as to why the build failed (you can see it here: https://gist.github.com/morficus/5bd94e330bf4212679b5).

          A work-around that would require no code change... is to increase the default "poll interval" from 10sec to 30sec or even 60sec. That at least decreases the odds of hitting the max number of retries (still hard-coded to 5) if the remote server does have a long-running build. What do you guys think of that?

          Also, all current changes are in this branch: https://github.com/morficus/Parameterized-Remote-Trigger-Plugin/tree/dev-v2.1.x

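The arithmetic behind the proposed work-around can be sketched as follows, assuming the waiting window is simply the retry limit multiplied by the poll interval. The class `RetryBudget` and the exception `RetryLimitExceededException` are hypothetical; the exception in the linked gist may be named and shaped differently.

```java
/** Hypothetical names throughout; a sketch of the arithmetic, not the plugin's code. */
final class RetryBudget {
    /** Hypothetical exception type that states why the local build failed. */
    static final class RetryLimitExceededException extends RuntimeException {
        RetryLimitExceededException(int retries, int pollIntervalSeconds) {
            super("Remote build not finished after " + retries + " retries ("
                    + maxWaitSeconds(retries, pollIntervalSeconds) + "s)");
        }
    }

    /** Longest time the local job waits before giving up, in seconds. */
    static int maxWaitSeconds(int retries, int pollIntervalSeconds) {
        return retries * pollIntervalSeconds;
    }
}
```

Under this assumption, 5 retries at the default 10 seconds give a 50 second window, while 30 or 60 seconds stretch it to 150 or 300 seconds.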

          Kevin Van Poppel added a comment -

          Changing the poll interval didn't change anything here.
          The job had already failed before the first poll, because of the 404 error.

          Do any of you guys know a way to check whether the triggered build is in the remote server's queue?

            morficus Maurice W.
            v969540 Kevin Van Poppel