Jenkins / JENKINS-22537

Seemingly random failures of GitHub plugin PR builder

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: ghprb-plugin
    • Labels: None
    • Environment: Linux

      Sometimes, when a PR build is started after the GitHub hook has triggered, the job fails while trying to check out a non-existent commit:

      Cloning the remote Git repository
      Cloning repository <private repo>
      Fetching upstream changes from <private repo>
      using GIT_SSH to set credentials GitHub jenkins-admin
      Fetching upstream changes from <private repo>
      using GIT_SSH to set credentials GitHub jenkins-admin
      Checking out Revision 58ef9bff662b45d099f0b7faba822661cbc144b6 (detached)
      FATAL: Could not checkout null with start point 58ef9bff662b45d099f0b7faba822661cbc144b6
      hudson.plugins.git.GitException: Could not checkout null with start point 58ef9bff662b45d099f0b7faba822661cbc144b6
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1448)
      at hudson.plugins.git.GitSCM.checkout(GitSCM.java:896)
      at hudson.model.AbstractProject.checkout(AbstractProject.java:1411)
      at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:652)
      at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:561)
      at hudson.model.Run.execute(Run.java:1665)
      at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:246)
      Caused by: hudson.plugins.git.GitException: Command "git checkout -f 58ef9bff662b45d099f0b7faba822661cbc144b6" returned status code 128:
      stdout:
      stderr: fatal: reference is not a tree: 58ef9bff662b45d099f0b7faba822661cbc144b6

      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1276)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1253)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1249)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1065)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1075)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1443)
      ... 9 more

      Launching the parametrised build with a manually defined ${sha1} always works fine for the same PRs. The exact trigger is unclear, but it seems to be related to force pushes to the PR branch.


          Leandro Lucarella added a comment:

          BTW, the checkout fails because, for some reason, it is trying to check out an old commit for that pull request (one that was replaced by a force push). Sometimes it works, sometimes it doesn't.

          Mark Waite added a comment (edited):

          I wonder if this is related to JENKINS-21980? That bug reports that polling can fail when a prior detached HEAD commit no longer exists in the repository.

          Mihails Strasuns added a comment:

          Hm, the exact case quoted in the issue description does not involve any detached commits; it is a normal pull request branch. With this specific branch I can reliably trigger the issue just by doing `git commit --amend; git push -f origin branch-name` with no actual changes. Every new job triggered this way fails. But I can't reproduce it on any artificially crafted branch.

          Mark Waite added a comment:

          I thought that if the commit created before "git commit --amend" had been seen by the Jenkins job, then after "git commit --amend" that prior commit would be "detached", because there are no longer any references to it.

          I'm not a Gerrit user, so I may misunderstand the use cases, but I'm accustomed to "git commit --amend" leaving an "orphan" commit in the git reflog.
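
          The "orphan commit" behaviour of `git commit --amend` can be checked in a throwaway repository. This bash sketch assumes only a local `git` on the PATH; the file name and commit messages are made up. It amends a commit and shows that the old SHA1 is still listed in this clone's reflog but is no longer reachable from the branch tip.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Throwaway identity so `git commit` works without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo="$(mktemp -d)"
git -C "$repo" init -q
echo one > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "feature"
old_sha="$(git -C "$repo" rev-parse HEAD)"

# Rewrite the tip. A changed message guarantees a new SHA1 even if the
# amend happens within the same second as the original commit.
git -C "$repo" commit -q --amend -m "feature (amended)"
new_sha="$(git -C "$repo" rev-parse HEAD)"

# The old commit is no longer an ancestor of the branch tip...
if git -C "$repo" merge-base --is-ancestor "$old_sha" "$new_sha"; then
  old_reachable=yes
else
  old_reachable=no
fi

# ...but this clone still remembers it in the HEAD reflog.
# (grep without -q reads all input, so pipefail cannot trip on SIGPIPE.)
if git -C "$repo" log -g --format=%H HEAD | grep -x "$old_sha" >/dev/null; then
  old_in_reflog=yes
else
  old_in_reflog=no
fi

echo "old=$old_sha new=$new_sha reachable=$old_reachable in_reflog=$old_in_reflog"
```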

          Mihails Strasuns added a comment:

          Yeah, there is one. But I am not passing it to the job manually, so where could it possibly get the hash from? It is no longer referenced by any commit on the branch.

          Mark Waite added a comment:

          I thought that the git plugin remembered the SHA1 of the preceding commit it processed, so that it could use that earlier SHA1 to generate the list of changes. If that SHA1 was recorded by the Jenkins job and the commit was then somehow removed, that might cause the failure you're describing.

          Is there any chance the job moved from one node to another between the passing and the failing run?

          Was the workspace wiped between the passing and the failing run? If it was, the plugin might remember a SHA1 which no longer exists in the newly cloned repository.

          Leandro Lucarella added a comment:

          Yeah, that's what JENKINS-21980 seems to suggest: Jenkins remembers the hash of the last build (now detached) to check whether there are changes.

          Mihails Strasuns added a comment:

          Shouldn't it be present in the reflog of the local copy then? `git reflog | grep 58ef` showed me nothing. Or is it the hash from the last successful build?
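
          `git reflog` only lists positions that this clone's own refs have held, so an empty grep does not prove much, especially in a freshly cloned workspace. A more direct check is to ask the object database whether the commit exists. Below is a minimal bash sketch; the helper name `commit_exists` is hypothetical, and the demo repository is a throwaway.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if <repo> contains <sha> as a commit object.
# `git cat-file -e` tests the object store directly instead of the reflog.
commit_exists() {
  local repo="$1" sha="$2"
  git -C "$repo" cat-file -e "${sha}^{commit}" 2>/dev/null
}

# Demo on a throwaway repository.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo="$(mktemp -d)"
git -C "$repo" init -q
git -C "$repo" commit -q --allow-empty -m "initial"
head_sha="$(git -C "$repo" rev-parse HEAD)"

for sha in "$head_sha" 58ef9bff662b45d099f0b7faba822661cbc144b6; do
  if commit_exists "$repo" "$sha"; then
    echo "$sha: present"
  else
    echo "$sha: missing"
  fi
done
```

          Run inside the failing configuration's workspace with the SHA1 from the log, a "missing" result would confirm that the object was never fetched into that clone.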

          Mark Waite added a comment:

          I think it should have been present in the repository reflog, unless the repository was wiped from one job to the next. At a minimum, that SHA1 should be somewhere in the Jenkins jobs directory, since the value needs to be stored on disk somewhere.

          Mihails Strasuns added a comment:

          OK, we have done a more detailed investigation, and here are the new findings:

          • the faulty hash is the hash of a merge commit that GitHub created automatically for one of the older builds (before the force push)
          • the problem manifests when using build configurations: the global build instance description uses the correct hashes, but the configuration-specific consoles still try to check out the old one
          • it is still unclear how a PR gets into that state, but once it does, it is stuck: the configuration always tries to check out the same hash

          Mihails Strasuns added a comment:

          Is there any way to check the parameters / environment variables of a configuration job? Using `export` in the job script itself does not help, because the failed checkout happens before the script runs.

          Mark Waite added a comment:

          There is a "pre-SCM build step" plugin which lets you add a job step that reports the environment settings before the checkout starts.
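
          A pre-SCM step like that can be very small. The bash sketch below assumes that the PR builder passes the commit in the `sha1` parameter used in this issue, and that its other variables share a `ghprb` prefix. The prefix is an assumption about the plugin's naming, so adjust the pattern to whatever the job actually receives. `WORKSPACE` is the standard Jenkins variable, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-SCM step: print the variables that decide what gets checked
# out, before the checkout itself has a chance to fail. Name patterns are assumptions.
report_build_env() {
  echo "== environment before SCM checkout (WORKSPACE=${WORKSPACE:-unset}) =="
  env | sort | grep -E '^(sha1|ghprb[A-Za-z]*|GIT_[A-Z_]*)=' \
    || echo "(no sha1, ghprb* or GIT_* variables set)"
}

report_build_env
```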

          Leandro Lucarella added a comment:

          We used the EnvInject plugin to run some tests before the repo is checked out. When building multiple configurations, there seem to be 3 git repositories: the one in the regular workspace, checked out by the main job, and then, for each configuration, another repo living in workspace/VAR/VALUE, where the project is actually built.

          We suspect those configuration-specific copies are not updated for some reason, and thus origin/pr/XXX/merge points to an old commit that is no longer present in that particular repo.

          We wanted to force a git fetch on those repos before the checkout, but the problem is that before the checkout those repos don't even exist! It looks like Jenkins clones them or something.

          Another weird thing: after this experimentation with EnvInject, the problematic PR got "fixed". We suspect that something might have been cleaned up when the pre-checkout step failed and the checkout was never done, but these are all just wild guesses.

          We'll continue to investigate this as much as possible, but everything points to some conflict with multiple-configuration builds.
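
          To compare the main workspace copy with the configuration-specific copies, one option is to list every checkout under the workspace and print where `origin/pr/N/merge` points in each. This bash sketch makes three assumptions: the PR refs are fetched into `refs/remotes/origin/pr/*`, the workspace/VAR/VALUE layout described above, and that the helper `report_pr_refs` and the demo directories are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: for every git checkout below <workspace>, print the
# commit that refs/remotes/origin/pr/<N>/merge resolves to, or <missing>.
report_pr_refs() {
  local workspace="$1" pr="$2"
  local ref="refs/remotes/origin/pr/${pr}/merge" gitdir repo rel sha
  while IFS= read -r gitdir; do
    repo="$(dirname "$gitdir")"
    sha="$(git -C "$repo" rev-parse --verify -q "${ref}^{commit}" || echo '<missing>')"
    rel="${repo#"$workspace"}"; rel="${rel#/}"
    printf '%s\t%s\n' "${rel:-.}" "$sha"
  done < <(find "$workspace" -name .git -prune -print | sort)
}

# Demo: a main checkout plus two configuration copies, one lacking the ref.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
ws="$(mktemp -d)"
for dir in "$ws" "$ws/LABEL/linux" "$ws/LABEL/mac"; do
  mkdir -p "$dir"
  git -C "$dir" init -q
  git -C "$dir" commit -q --allow-empty -m "commit in ${dir#"$ws"}"
done
git -C "$ws" update-ref refs/remotes/origin/pr/7/merge HEAD
git -C "$ws/LABEL/linux" update-ref refs/remotes/origin/pr/7/merge HEAD

report_pr_refs "$ws" 7
```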

          Leandro Lucarella added a comment:

          Also, I think this problem only happens when a PR is being tested and a push -f lands on the same PR, so a new build is queued. After that happens, the build seems to stay broken.

          Any idea how a push -f while the PR is being tested can end up in this situation? Is any information about the hash stored somewhere and then read by the queued build for the same PR?

          Leandro Lucarella added a comment:

          I can reproduce this easily now: it happens when force-pushing to a PR that is already being tested in Jenkins.

          A simple test is a dummy project whose build script is just `sleep 60`.

          Then create a PR and wait until Jenkins starts building it. While the build is running, do a `git rebase --force HEAD^` in the project and `push --force` to the PR branch, and boom!

          Whether configurations are really part of the problem still needs to be confirmed; I'll do that as soon as possible.
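
          The core of this recipe can be approximated without Jenkins or GitHub. The bash sketch below rests on several assumptions:

          • a local bare repository stands in for GitHub
          • `refs/pull/1/merge` is written by hand, the way GitHub regenerates the merge ref after each push
          • the configuration build is modelled as a fresh `file://` clone
          • `git commit --amend` with a new message replaces the rebase, so that the SHA1 is guaranteed to change

          The SHA1 recorded before the force push then cannot be checked out in the new clone. This should correspond to the `reference is not a tree` failure in the log.

```bash
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

root="$(mktemp -d)"
git init -q --bare "$root/origin.git"                 # stand-in for GitHub
git -C "$root/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$root/dev"
cd "$root/dev"
git commit -q --allow-empty -m "base"
git push -q "$root/origin.git" HEAD:refs/heads/main
base="$(git rev-parse HEAD)"

# Hypothetical helper: publish the PR head plus a GitHub-style merge commit.
publish_pr() {
  local head="$1" merge
  merge="$(git commit-tree -p "$base" -p "$head" -m "Merge PR 1" "${head}^{tree}")"
  git push -q -f "$root/origin.git" "$head:refs/heads/feature" "$merge:refs/pull/1/merge"
  echo "$merge"
}

echo v1 > file.txt && git add file.txt && git commit -q -m "feature"
old_merge="$(publish_pr "$(git rev-parse HEAD)")"    # SHA1 the first build records

git commit -q --amend -m "feature (amended)"          # rewrite, then force-push
new_merge="$(publish_pr "$(git rev-parse HEAD)")"

# Fresh clone over file:// (only reachable objects are transferred; a plain
# local-path clone would hardlink the whole object store instead).
git clone -q "file://$root/origin.git" "$root/ws"
git -C "$root/ws" fetch -q origin '+refs/pull/*:refs/remotes/origin/pr/*'

# Checking out the remembered SHA1 fails: the object was never fetched.
if git -C "$root/ws" checkout -q -f "$old_merge" 2>"$root/err.txt"; then
  old_checkout=ok
else
  old_checkout=failed
fi
cat "$root/err.txt"
```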

          Mark Waite added a comment:

          Thanks very much for the detailed investigation! I think that means this bug has the same symptoms as JENKINS-21980. Could you construct a Java-based test case (in the git-plugin source code) that shows the failure? That would increase the chances of someone who knows the plugin finding a solution for the problem.

          I think the issue is in the git-plugin's handling of the case where a SHA1 was valid for a previous build but is no longer valid. In that case, I assume the plugin should ignore the bad SHA1 and check out something else instead. Do you have a recommendation for what should be used instead of the invalid SHA1?

          Leandro Lucarella added a comment:

          I think the PR builder plugin should just do the git fetch with the pull request refs and then use the symbolic name of the ref (origin/pr/N/merge) instead of any hash. But I don't know exactly how this plugin interacts with the git-plugin; I'm just thinking in terms of plain git commands. I really don't see why either plugin should use an old numeric hash.

          I'll do my best to reproduce this with a Java test case, but I can't promise anything, because my Java-fu is practically non-existent (even if Java as a language is easy, the thousands of layers of libraries are not easy to grasp at all :S).

          Also, I'll take another look at JENKINS-21980 now that I have a better idea of where the issue is.
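
          In plain git terms, this suggestion amounts to fetching the pull request refs on every build and checking out the ref name rather than a remembered SHA1. The refspec `+refs/pull/*:refs/remotes/origin/pr/*` is what produces the `origin/pr/N/merge` names used in this thread. The bash sketch below simulates the suggestion locally. Its assumptions: a bare repository stands in for GitHub, a single-parent commit stands in for GitHub's merge commit, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

root="$(mktemp -d)"
git init -q --bare "$root/origin.git"
git init -q "$root/dev"
git -C "$root/dev" commit -q --allow-empty -m "base"

# Hypothetical stand-in for GitHub regenerating refs/pull/1/merge after a push.
publish_merge_ref() {
  local msg="$1" merge
  merge="$(git -C "$root/dev" commit-tree -p HEAD -m "$msg" "HEAD^{tree}")"
  git -C "$root/dev" push -q -f "$root/origin.git" "$merge:refs/pull/1/merge"
  echo "$merge"
}

# The workspace that a configuration build would reuse.
git init -q "$root/ws"
git -C "$root/ws" remote add origin "file://$root/origin.git"

# Hypothetical build step: fetch the PR refs, then check out by name, not SHA1.
checkout_pr_merge() {
  local pr="$1"
  git -C "$root/ws" fetch -q origin '+refs/pull/*:refs/remotes/origin/pr/*'
  git -C "$root/ws" checkout -q -f "refs/remotes/origin/pr/${pr}/merge"
  git -C "$root/ws" rev-parse HEAD
}

first="$(publish_merge_ref "merge v1")"
built_first="$(checkout_pr_merge 1)"
second="$(publish_merge_ref "merge v2")"              # simulated force push
built_second="$(checkout_pr_merge 1)"
echo "built $built_first, then $built_second"
```

          Because the name is resolved only after the fetch, a force push between builds simply moves the ref, and no stale SHA1 can go missing. The trade-off is that a build no longer pins the exact commit that triggered it, so a later push changes what an already queued build tests.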

          Leandro Lucarella added a comment:

          I looked at JENKINS-21980, and even though it looks similar, the effects are different: there, GitHub reports no changes, but no problems with the checkout itself are mentioned. So I wonder whether this has the same root cause.

          Leandro Lucarella added a comment:

          BTW, guidance on how to fix this would be appreciated. I'm willing to spend some time on it, but I have no idea about Jenkins internals or how plugins hook into the different Jenkins stages.

          Leandro Lucarella added a comment:

          There is also a possibility of providing some funding to fix this. Is the FreedomSponsors.org sponsoring actually used?

          Mark Waite added a comment:

          I don't think I've seen any bug that was fixed due to sponsoring. I don't watch all the bugs, but I do watch all the git-plugin and git-client-plugin bugs, and I don't recall any bug that has been fixed as a result of sponsoring.

          I'm not a CloudBees employee, but there are CloudBees employees who might be persuaded to work on this bug, though that would require paid support from them. Paid support from them probably requires that you purchase their Jenkins Enterprise product.

          Leandro Lucarella added a comment:

          OK, thanks for the advice. Since what we really need is GitHub integration, and I don't see much about it on the CloudBees website, I'd be more inclined to find a freelance Jenkins developer who might be interested in addressing a few issues we have. I'll contact CloudBees anyway, you never know...

          Leandro Lucarella added a comment:

          Moved to: https://github.com/janinko/ghprb/issues/148

          Kanstantsin Shautsou added a comment:

          Sorry, long thread. From the description I see that this is not a github-plugin issue but a ghprb-plugin one. Can this issue be reassigned or closed?

            Assignee: Unassigned
            Reporter: Mihails Strasuns (dicebot)
            Votes: 2
            Watchers: 6