
Parameterized Remote Trigger Plugin fails when remote job waits for available executor

      When the remote job is still waiting for an available executor, the job that triggered it reports a build failure.

      Triggering this remote job: *****
      Checking that the remote job ***** is not currently building.
      Remote job remote job ***** is not currenlty building.
      This job is build #[**] on the remote server.
      Triggering remote job now.
      Blocking local job until remote job completes
      ERROR: Remote build failed for the following reason:
      ERROR: http://****/job/***/**/api/json
      Finished: FAILURE

          [JENKINS-22427] Parameterized Remote Trigger Plugin fails when remote job waits for available executor

          Maurice W. added a comment - - edited

          My alternate solution is to have RemoteBuildConfiguration::sendHTTPCall() treat 404s as non-errors when it is called from RemoteBuildConfiguration::getBuildStatus().

          It might not be the safest solution (since it walks up the call stack to figure out who called it), but it certainly seems to be the most effective so far.

          Either way, I would still like to see the code you implemented.
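
To make the idea concrete, here is a minimal sketch of treating a 404 as "not started yet" only while polling build status. It is not the plugin's code: the class, the enum, and the `pollingBuildStatus` flag are hypothetical names, and an explicit parameter is used in place of the call-stack walk described above.

```java
import java.net.HttpURLConnection;

/** Hypothetical sketch; names here are illustrative, not the plugin's API. */
final class ResponseClassifierSketch {

    enum Outcome { OK, NOT_STARTED_YET, ERROR }

    /**
     * Classifies an HTTP status code. A 404 is tolerated only when the
     * caller says it is polling build status (assumption: passed in as a
     * flag rather than discovered by inspecting the call stack).
     */
    static Outcome classify(int httpStatus, boolean pollingBuildStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Outcome.OK;
        }
        if (httpStatus == HttpURLConnection.HTTP_NOT_FOUND && pollingBuildStatus) {
            // The build page may not exist yet because the build is still queued.
            return Outcome.NOT_STARTED_YET;
        }
        return Outcome.ERROR;
    }
}
```

Passing the context explicitly keeps a 404 from any other call an error, which narrows the risk of masking a genuinely missing job.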


          Kevin Van Poppel added a comment -

          Hi Maurice,

          Here is the link to the code:
          https://github.com/v969540/ParamRemoteTrigger

          It would be nice if you could implement it in the official code, or use a better solution than mine if you know one.
          I might have made small changes to the POM file etc., but these can be reverted if you like.

          I don't think it is safe to treat every 404 as a non-error. I first thought of a call to the build queue, but didn't find a way to do this.

          Greetings,

          Kevin

          Tim Brown added a comment -

          I had a similar issue here where we had network issues. I added a retry in sendHTTPCall (when an IOException is caught). This currently waits 'pollInterval' seconds and then sends the HTTP call again. It will retry up to this.getConnectionRetryLimit() times, which is currently hardcoded to 5. It would be good to make this configurable if the change is used.

          This allows any call to be retried up to a specific number of times with a configurable interval.

          GIST: https://gist.github.com/timbrown5/ec4add8797fbf4d9cd19
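
For readers who cannot open the gist, the retry behaviour described above might look roughly like the sketch below. It is not the gist's code: `callWithRetry`, `retryLimit`, and `pollIntervalMillis` are hypothetical names, it uses a loop where the gist may be structured differently, and it assumes one initial attempt followed by up to `retryLimit` retries.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch of a bounded retry; not the plugin's actual code. */
final class RetrySketch {

    /**
     * Runs the call once, then retries on IOException up to retryLimit
     * more times, sleeping pollIntervalMillis between attempts.
     */
    static <T> T callWithRetry(Callable<T> call, int retryLimit, long pollIntervalMillis)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) {
                if (retries >= retryLimit) {
                    // Out of retries: surface the last failure with context.
                    throw new IOException("Giving up after " + retries + " retries", e);
                }
                retries++;
                Thread.sleep(pollIntervalMillis);
            }
        }
    }
}
```

Only IOException triggers a retry here; any other exception propagates immediately, so programming errors are not hidden behind the retry loop.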


          Maurice W. added a comment -

          Tim - I like your solution of the recursive function that tracks "numberOfAttempts". It requires minimal change to the current code base and is pretty easy to follow. And it still follows the same concept as Kevin outlined (a retry loop).

          But the flaw they both have is that the local build can still fail if the remote build does not complete before hitting the max number of retries. At the moment I have just added a new exception to give a clear indication as to why the build failed (you can see it here: https://gist.github.com/morficus/5bd94e330bf4212679b5).

          A workaround that would require no code change is to increase the default "poll interval" from 10sec to 30sec or even 60sec. That at least decreases the odds of hitting the max number of retries (still hard-coded to 5) if the remote server does have a long-running build. What do you guys think of that?
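
As a rough worked example of that workaround, the sketch below computes the longest window the retries would cover, assuming each retry waits exactly one poll interval. The class and method names are hypothetical.

```java
import java.time.Duration;

/** Hypothetical helper for the arithmetic only; not part of the plugin. */
final class WaitWindowSketch {

    /** Assumes every retry is preceded by one full poll interval. */
    static Duration maxWait(int retryLimit, long pollIntervalSeconds) {
        return Duration.ofSeconds(pollIntervalSeconds).multipliedBy(retryLimit);
    }
}
```

Under that assumption, 5 retries at 10 seconds cover 50 seconds, at 30 seconds they cover 150 seconds, and at 60 seconds they cover 300 seconds.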

          Also, all current changes are in this branch: https://github.com/morficus/Parameterized-Remote-Trigger-Plugin/tree/dev-v2.1.x


          Kevin Van Poppel added a comment -

          Changing the poll interval didn't change anything here.
          The job already failed before the first poll, because you get the 404 error.

          Do any of you know a way to check whether the triggered build is in the remote server's queue?
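
One possible direction for the queue question: Jenkins exposes its build queue through the remote access API at `/queue/api/json`, where each entry in `items` carries a `task` object with the job `name`. The sketch below is a crude illustration under that assumption; the class and method names are hypothetical, and the regex match stands in for a proper JSON parser, which real plugin code should use.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; a real implementation should parse the JSON properly. */
final class QueueCheckSketch {

    /** Builds the queue API URL for a remote Jenkins base URL. */
    static String queueApiUrl(String remoteBaseUrl) {
        String base = remoteBaseUrl.endsWith("/")
                ? remoteBaseUrl.substring(0, remoteBaseUrl.length() - 1)
                : remoteBaseUrl;
        return base + "/queue/api/json";
    }

    /**
     * Returns true if the queue JSON contains a task object whose "name"
     * equals jobName. Assumes the task object has no nested braces.
     */
    static boolean isJobQueued(String queueJson, String jobName) {
        Pattern p = Pattern.compile(
                "\"task\"\\s*:\\s*\\{[^{}]*\"name\"\\s*:\\s*\"" + Pattern.quote(jobName) + "\"");
        return p.matcher(queueJson).find();
    }
}
```

A caller would fetch the URL from `queueApiUrl`, pass the response body to `isJobQueued`, and keep waiting instead of failing while the job is still listed.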

          Tim Brown added a comment -

          Are you using 'wait to trigger remote builds until no other builds are running'?
          If not, does it fix your issue? I found I needed both working together for it to be reliable.

          If so, is the problem that the build gets triggered but is waiting in the queue (as it's waiting for an executor)?

          I will see if I can have a look tomorrow. Had a lot on the last week.


          Tim Brown added a comment -

          It looks like the root issue is that we are not getting a response from the REST API. This is because until the build gets an executor it won't have a REST API page. When we try to call the page we get a null response. There is an update coming for the JSON issue you're seeing, thanks to @scotthains, which will likely fix your issue.

          I think this should work because the getBuildStatus method takes a null response (from sendHTTPCall) to mean the build has not yet started. The problem with this, though, is that if someone cancels the job before it gets an executor, the Remote Trigger plugin will wait indefinitely.

          The best way I can see to solve this is to try to get Jenkins Core to give jobs an API page before they get an executor. I'm not sure how easy that would be as, if I remember correctly, the job is a different class of object before and after it gets an executor.

          Did you check with the link Maurice posted?
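
The null-response handling and its indefinite-wait risk can be illustrated with a small sketch. It is not the plugin's getBuildStatus: `waitForStatus`, `maxPolls`, and the `"NOT_STARTED"` marker are hypothetical, and the poll cap is an added assumption that bounds the wait when a queued build is cancelled before it gets an executor.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of polling with a cap; not the plugin's actual code. */
final class StatusPollSketch {

    /**
     * Polls fetchStatus until it returns a non-null status. A null result
     * is read as "build has not started yet". After maxPolls null results
     * the method gives up and returns "NOT_STARTED" instead of waiting forever.
     */
    static String waitForStatus(Supplier<String> fetchStatus, int maxPolls, long pollIntervalMillis)
            throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            String status = fetchStatus.get();
            if (status != null) {
                return status;
            }
            if (i < maxPolls - 1) {
                Thread.sleep(pollIntervalMillis);
            }
        }
        return "NOT_STARTED";
    }
}
```

Without some cap like this, a null response is indistinguishable from a build that will never start, which is the hazard described above.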


          Kevin Van Poppel added a comment -

          'wait to trigger remote builds until no other builds are running' doesn't fix the issue for me.

          Do you have any idea when this JSON update will be released?

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Maurice Williams
          Path:
          CHANGELOG.md
          src/main/java/org/jenkinsci/plugins/ParameterizedRemoteTrigger/RemoteBuildConfiguration.java
          http://jenkins-ci.org/commit/parameterized-remote-trigger-plugin/0f704da096418c11c543c36382efc997b6125a2a
          Log:
          fixing JENKINS-22427 (https://issues.jenkins-ci.org/browse/JENKINS-22427)

          Maurice W. added a comment -

          This fix is part of the 2.1.2 release done on April 26th.


            morficus Maurice W.
            v969540 Kevin Van Poppel