• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • git-plugin
    • None

      Thanks for all your great work with this plugin; not sure how I'd live without it!

      Around 1% of jobs triggered by GitHub pull requests fail to find the merge commit, with output like the below. I'd love to help you debug this, as it's been driving me mad for a wee while (I don't know why I've only just filed this now). If I immediately rebuild the job, it always works fine, so it's a weird intermittent issue. Even a workaround that would let me retry the Git operations here, for example, would be fantastic.

      Thanks folks!

      Started by upstream project "Homebrew Pull Requests" build number 19312
      originally caused by:
      GitHub pull request #35564 of commit 13c8c95db0d399cd387ec362e35d9917df1465f8 automatically merged.
      [EnvInject] - Loading node environment variables.
      Building remotely on yosemite in workspace /Users/brew/Jenkins/workspace/Homebrew Pull Requests/version/yosemite
      > git rev-parse --is-inside-work-tree # timeout=10
      Fetching changes from the remote Git repository
      > git config remote.origin.url https://github.com/Homebrew/homebrew.git # timeout=10
      Fetching upstream changes from https://github.com/Homebrew/homebrew.git
      > git --version # timeout=10
      > git -c core.askpass=true fetch --tags --progress https://github.com/Homebrew/homebrew.git +refs/heads/*:refs/remotes/origin/* +refs/pull/*:refs/remotes/origin/pr/* # timeout=5
      Checking out Revision c1d392f8a050cb7049b3afee8772b34491150e21 (origin/pr/35564/merge)
      > git config core.sparsecheckout # timeout=10
      > git checkout -f c1d392f8a050cb7049b3afee8772b34491150e21
      FATAL: Could not checkout null with start point c1d392f8a050cb7049b3afee8772b34491150e21
      hudson.plugins.git.GitException: Could not checkout null with start point c1d392f8a050cb7049b3afee8772b34491150e21
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1674)
      at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:152)
      at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:145)
      at hudson.remoting.UserRequest.perform(UserRequest.java:121)
      at hudson.remoting.UserRequest.perform(UserRequest.java:49)
      at hudson.remoting.Request$2.run(Request.java:324)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
      at ......remote call to yosemite(Native Method)
      at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1356)
      at hudson.remoting.UserResponse.retrieve(UserRequest.java:221)
      at hudson.remoting.Channel.call(Channel.java:752)
      at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:145)
      at sun.reflect.GeneratedMethodAccessor100.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:131)
      at com.sun.proxy.$Proxy52.execute(Unknown Source)
      at hudson.plugins.git.GitSCM.checkout(GitSCM.java:992)
      at hudson.scm.SCM.checkout(SCM.java:488)
      at hudson.model.AbstractProject.checkout(AbstractProject.java:1257)
      at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:622)
      at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:528)
      at hudson.model.Run.execute(Run.java:1745)
      at hudson.matrix.MatrixRun.run(MatrixRun.java:146)
      at hudson.model.ResourceController.execute(ResourceController.java:89)
      at hudson.model.Executor.run(Executor.java:240)
      Caused by: hudson.plugins.git.GitException: Command "git checkout -f c1d392f8a050cb7049b3afee8772b34491150e21" returned status code 128:
      stdout:
      stderr: fatal: reference is not a tree: c1d392f8a050cb7049b3afee8772b34491150e21

      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1444)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:85)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1669)
      at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:152)
      at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:145)
      at hudson.remoting.UserRequest.perform(UserRequest.java:121)
      at hudson.remoting.UserRequest.perform(UserRequest.java:49)
      at hudson.remoting.Request$2.run(Request.java:324)
      at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      Also viewable at http://bot.brew.sh/job/Homebrew%20Pull%20Requests/19312/version=yosemite/console

          [JENKINS-26290] Git plugin periodically fails to find commit

          Mark Waite added a comment - - edited

          I've seen that message infrequently myself. If you're willing to insert some debugging code into the git plugin and deploy it into the environment where the random failure happens, it could provide a deeper understanding of the problem.

          What I'd suggest (without having written the code):

          • Insert additional debug code into the GitException which only executes if the stderr of the exception includes the message "reference is not a tree"
          • In that additional debug code, run "git show <SHA1-in-the-error>"
          • In that additional debug code, run "git ls-remote origin <SHA1-in-the-error>"

          I think the intermittent nature of the problem means that you'll need to insert the debug code into your production instance, or you'll need to configure another instance which shows the problem.
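
A minimal sketch of the check suggested above, written as standalone Java rather than as a patch to the plugin. The class and method names (NotATreeDiagnostics, missingSha, diagnosticCommands) are hypothetical and do not exist in the git plugin or git-client plugin. It only extracts the SHA1 from the "reference is not a tree" message and builds the two suggested commands; actually running them on the build node is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Jenkins plugin.
class NotATreeDiagnostics {
    // Matches the stderr line seen in the console output of this issue.
    private static final Pattern NOT_A_TREE =
            Pattern.compile("reference is not a tree: ([0-9a-f]{40})");

    /** Returns the SHA1 named in a "reference is not a tree" error, if present. */
    static Optional<String> missingSha(String stderr) {
        if (stderr == null) {
            return Optional.empty();
        }
        Matcher m = NOT_A_TREE.matcher(stderr);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Builds the two diagnostic commands only when the error matches; otherwise returns an empty list. */
    static List<List<String>> diagnosticCommands(String stderr) {
        List<List<String>> commands = new ArrayList<>();
        missingSha(stderr).ifPresent(sha -> {
            commands.add(Arrays.asList("git", "show", sha));
            commands.add(Arrays.asList("git", "ls-remote", "origin", sha));
        });
        return commands;
    }

    public static void main(String[] args) {
        String stderr = "fatal: reference is not a tree: c1d392f8a050cb7049b3afee8772b34491150e21";
        for (List<String> cmd : diagnosticCommands(stderr)) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

In a real patch the same condition would presumably sit where the GitException is raised, with the command output appended to the build log.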

          Mike McQuaid added a comment -

          I'm willing to insert that code, yeh. Longer-term I'd also be happy with just a retry on this too.

          Mark Waite added a comment -

          The last retry logic change I submitted made things worse rather than better. I submitted JENKINS-26225 to remind me that it needs to be fixed, and to remind me that I need to write more automated tests surrounding retry logic.

          Patrick Mihelich added a comment -

          We saw this frequently with our repo (over 30% of the time, I'd say; it was a fire). This is the same issue as JENKINS-22537, with much further discussion at https://github.com/janinko/ghprb/issues/148. directhex explains it well there:

          It's a problem only with matrix jobs.

          Here's what happens:

          • "master" job of matrix gets executed, which determines the sha1 of the target commit (i.e. it turns pr/XXX/merge into abcdef1)
          • the matrix slaves get scheduled to build this sha1sum
          • the matrix slaves eventually execute the build

          The race condition here is that the PR merge commit is ephemeral (i.e. it gets recreated every time there's a commit to either the PR, or the repo the PR targets). If there's any commit between the matrix master running and the matrix slaves running (e.g. if one of your matrix axis slaves is overloaded so the scheduling takes some time), the problem occurs because the commit ID that the matrix master saw no longer exists by the time the matrix slaves fetch the repo.

          The "easy" fix is to make sure matrix slaves build the merge ref, not the sha1. Or, as we did, don't use matrix jobs at all.

          We have high PR traffic, so it's not unusual for PR builds to remain in the queue for 1-2 hours, and one configuration tends to start later than the others due to having fewer slave resources. In other words, we have the worst-case scenario for this race condition, and trigger it all the time.

          We resolved it by bluntly disabling the special-case matrix build logic. Patch against 2.4.0:

          diff --git a/src/main/java/hudson/plugins/git/GitSCM.java b/src/main/java/hudson/plugins/git/GitSCM.java
          index aca465d..bbddc53 100644
          --- a/src/main/java/hudson/plugins/git/GitSCM.java
          +++ b/src/main/java/hudson/plugins/git/GitSCM.java
          @@ -915,6 +915,7 @@ public class GitSCM extends GitSCMBackwardCompatibility {
           
           
                   // every MatrixRun should build the same marked commit ID
          +        /*
                   if (build instanceof MatrixRun) {
                       MatrixBuild parentBuild = ((MatrixRun) build).getParentBuild();
                       if (parentBuild != null) {
          @@ -926,6 +927,7 @@ public class GitSCM extends GitSCMBackwardCompatibility {
                           }
                       }
                   }
          +        */
           
                   // parameter forcing the commit ID to build
                   if (candidates.isEmpty() ) {

          This introduces some risk that child configurations will build different merge commits. For us, that's by far the least of evils. We went from failing to check out on a third of our builds to zero, and (unlike using the branch SHA1) we retain high confidence that the merge won't break master.

          I'm pinging this issue because I think ghprb is doing something pretty reasonable (requesting the pr/<id>/merge ref), but it interacts poorly with the git plugin's matrix build handling. I think the optimal solution would be for ghprb to specify both the current branch commit and the target commit (SHA1s), then for each slave to fetch both commits and merge them locally before building. That would test the merge, keep child builds consistent, and avoid the ephemeral merge ref issues. But I'd guess it requires changes to both plugins.
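
To make the "fetch both commits and merge them locally" proposal concrete, here is a rough sketch of the Git commands each child build might run. It is an illustration only: LocalMergePlan and its parameters are hypothetical names, neither plugin exposes such a class, and the refspec layout (refs/pull/<id>/head mapped under origin/pr/) is assumed from the console output in this issue. The code only builds the command lines; it does not execute them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not code from ghprb or the git plugin.
class LocalMergePlan {
    /**
     * Commands a child build could run so that every configuration merges
     * the same two pinned commits instead of relying on the ephemeral
     * pr/<id>/merge ref.
     */
    static List<List<String>> commands(String remote, String targetBranch,
                                       int prNumber, String targetSha, String prHeadSha) {
        List<List<String>> plan = new ArrayList<>();
        // Fetch the branch the PR targets.
        plan.add(Arrays.asList("git", "fetch", remote,
                "+refs/heads/" + targetBranch + ":refs/remotes/" + remote + "/" + targetBranch));
        // Fetch the PR head (not the server-side merge ref).
        plan.add(Arrays.asList("git", "fetch", remote,
                "+refs/pull/" + prNumber + "/head:refs/remotes/" + remote + "/pr/" + prNumber + "/head"));
        // Check out the pinned target commit, then merge the pinned PR commit into it.
        plan.add(Arrays.asList("git", "checkout", "-f", targetSha));
        plan.add(Arrays.asList("git", "merge", "--no-ff", "--no-edit", prHeadSha));
        return plan;
    }

    public static void main(String[] args) {
        for (List<String> cmd : commands("origin", "master", 123, "<target-sha>", "<pr-head-sha>")) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Because both SHA1s are fixed by the parent build, every child would merge the same inputs. The sketch still assumes both commits remain reachable from the fetched refs, which a force-push to the PR branch could break.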

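The matrix race that directhex describes in the comment above can also be reproduced in a toy model. The sketch below is a simulation, not plugin code: the Remote class and its methods are hypothetical, and it assumes, as a simplification, that a commit can only be fetched while some ref still points at it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the race; all names are hypothetical.
class EphemeralMergeRefDemo {
    /** Simplified remote: a commit is fetchable only while a ref points at it. */
    static class Remote {
        private final Map<String, String> refs = new HashMap<>();

        void updateRef(String ref, String sha) {
            refs.put(ref, sha);
        }

        String resolve(String ref) {
            return refs.get(ref);
        }

        Set<String> fetchAll() {
            return new HashSet<>(refs.values());
        }
    }

    /** What the matrix children do today: check out the SHA1 pinned by the parent. */
    static boolean checkoutBySha(Remote remote, String sha) {
        return remote.fetchAll().contains(sha);
    }

    /** The "easy" fix: resolve the merge ref again at checkout time. */
    static String checkoutByRef(Remote remote, String ref) {
        return remote.resolve(ref);
    }

    public static void main(String[] args) {
        Remote remote = new Remote();
        String mergeRef = "refs/pull/123/merge";
        remote.updateRef(mergeRef, "abcdef1");

        // Parent build resolves the ref to a SHA1 and schedules the children.
        String pinned = remote.resolve(mergeRef);

        // A new commit lands before a child starts, so the merge commit is recreated.
        remote.updateRef(mergeRef, "0123456");

        System.out.println("checkout by pinned sha succeeds: " + checkoutBySha(remote, pinned));
        System.out.println("checkout by ref resolves to: " + checkoutByRef(remote, mergeRef));
    }
}
```

Checking out by ref always finds a commit, at the cost that children scheduled at different times may build different merge commits, which is the trade-off noted for the patch above.
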
          Ryan Hitchman added a comment -

          There's another bug in the Git plugin, unrelated to matrix builds. It's easy to diagnose by launching Jenkins with "-Dhudson.plugins.git.GitSCM.verbose=true". This build log shows the error with debugging on. Note how it attempts to check out a different branch from what it's trying to build!

          This code in the Git plugin does the wrong thing under these circumstances:

          1. The PR has already been built. This is a retest.
          2. The merge commit has not changed (upstream and the PR are unmodified), so "origin/pr/123/merge" has the same revision.
          3. The previous build was for a different PR.

          The Git plugin was originally written to continuously build a particular branch. GHPRB triggers it directly, causing the bad code flow:

          1. GHPRB launches build against origin/pr/123/merge
          2. Git plugin fetches origin/pr/123/merge, then tries to figure out what revision to build. It starts with the thing we triggered the build with.
          3. Git plugin sees that it has already built this revision, so it removes it from consideration.
          4. The "revisions to build" is empty, so it hits fallback logic that defaults to the previously built revision.
          5. The previously built revision is a different PR's merge, which hasn't been fetched, so the checkout fails.

          Rebuilding the Git plugin with an if (isPollCall) guard around the for-loop that removes already-built revisions fixed the problem entirely for me, by preventing step 3 in the error flow.
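
The error flow and the guard can be sketched as a small standalone model. This is not the plugin's code: RevisionSelectionSketch, chooseRevision, and the guardWithPollCheck switch are hypothetical names used only to contrast the two behaviours, and revisions are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the selection logic described above; hypothetical names throughout.
class RevisionSelectionSketch {
    static String chooseRevision(List<String> candidates, Set<String> alreadyBuilt,
                                 String lastBuilt, boolean isPollCall,
                                 boolean guardWithPollCheck) {
        List<String> remaining = new ArrayList<>(candidates);
        // Step 3: drop revisions that were already built.
        // With the guard, this only happens when polling, not for a triggered build.
        if (!guardWithPollCheck || isPollCall) {
            remaining.removeAll(alreadyBuilt);
        }
        // Step 4: the fallback kicks in when nothing is left to build.
        if (remaining.isEmpty()) {
            return lastBuilt;
        }
        return remaining.get(0);
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("pr-123-merge");
        Set<String> built = new HashSet<>(Arrays.asList("pr-123-merge", "pr-456-merge"));
        String lastBuilt = "pr-456-merge"; // previous build was for a different PR

        System.out.println("unguarded: " + chooseRevision(candidates, built, lastBuilt, false, false));
        System.out.println("guarded:   " + chooseRevision(candidates, built, lastBuilt, false, true));
    }
}
```

In this model the unguarded retest falls back to the other PR's merge commit, while the guarded version keeps the revision the build was triggered with.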

          The fallback code is very suspicious: if I manually trigger a build against some branch, I would never expect it to rebuild the last build's revision when it can't find the revision I specified.

          Anmol Bal added a comment -

          rmmh We are having a similar issue. Do you mind linking the changes you made to the plugin, and explaining how I can reproduce them to work around this issue? Thank you.

          Ryan Hitchman added a comment -

          abal55 I just sent this PR to fix it. You should be able to backport it to whatever version you're using, and build your own plugin package with Maven. Don't forget to modify pom.xml's version tag!

          Anmol Bal added a comment -

          Thank you, I was able to get a build out with this and will give it a go.

            Assignee: Unassigned
            Reporter: Mike McQuaid (mikemcquaid)
            Votes: 7
            Watchers: 12