
JENKINS-20750: Git plugin 2.0 sometimes fails to fetch (timeouts) with weird error

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: git-plugin

      Last week I started to experience some weird problems with the GIT fetch. As far as I remember, I did not update Jenkins or any of its plugins, and I'm also not aware of any company GIT/network changes. All of a sudden, jobs started to fail fetching. And the problem is that it happens randomly; sometimes it works as before, sometimes it doesn't.

      Here is a typical error I'm getting:

      Started by timer
      Building on master in workspace C:\Documents and Settings\Tester\.jenkins\jobs\litebox3d_tiff_32_64\workspace
      Updating svn://krivan/Ranorex/trunk/xSpector at revision '2013-11-25T02:14:53.975 +0100'
      At revision 260
      no change for svn://krivan/Ranorex/trunk/xSpector since the previous build
      Fetching changes from the remote Git repository
      Fetching upstream changes from git@swserv:litebox3d
      ERROR: Timeout after 10 minutes
      FATAL: Failed to fetch from git@swserv:litebox3d
      hudson.plugins.git.GitException: Failed to fetch from git@swserv:litebox3d
      at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:612)
      at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:836)
      at hudson.plugins.git.GitSCM.checkout(GitSCM.java:861)
      at org.jenkinsci.plugins.multiplescms.MultiSCM.checkout(MultiSCM.java:117)
      at hudson.model.AbstractProject.checkout(AbstractProject.java:1412)
      at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:652)
      at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
      at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:557)
      at hudson.model.Run.execute(Run.java:1679)
      at hudson.matrix.MatrixBuild.run(MatrixBuild.java:304)
      at hudson.model.ResourceController.execute(ResourceController.java:88)
      at hudson.model.Executor.run(Executor.java:230)
      at hudson.model.OneOffExecutor.run(OneOffExecutor.java:43)
      Caused by: hudson.plugins.git.GitException: Command "fetch -t git@swserv:litebox3d +refs/heads/*:refs/remotes/origin/*" returned status code -1:
      stdout:
      stderr: Could not create directory 'c/Documents and Settings/Tester/.ssh'.

      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:981)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:920)
      at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.fetch(CliGitAPIImpl.java:187)
      at hudson.plugins.git.GitAPI.fetch(GitAPI.java:229)
      at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:610)
      ... 12 more

      It does not always happen, just from time to time, but frequently enough to break our build/test workflow.

      What's really weird is this error:
      stderr: Could not create directory 'c/Documents and Settings/Tester/.ssh'.
      Please notice the missing colon character in the path! In any case, the path it points to does exist (of course, only with the colon), so there should be no reason for such an error, unless the missing colon is the reason. But then why doesn't it happen every time?

      I also noticed that after this failure it's impossible to manually delete the contents of the workspace/.git folder, because of some locked files. Just stopping Jenkins does not help; the machine must be restarted before the remaining files can be deleted.

      I'm aware of issue JENKINS-20445 (too small Git plugin timeout), and while it appears to describe my problem as well, the root cause seems to be something else. Up to last week, I never experienced such timeout problems. Please don't hesitate to contact me if you have questions about our Jenkins/jobs setup.

          [JENKINS-20750] Git plugin 2.0 sometimes fails to fetch (timeouts) with weird error

          Mike added a comment -

          This is happening to me as well. Jenkins Git client plugin v2.0, Windows 2008 R2, Git v1.8.4.


          Pavel Kudrys added a comment -

          I've updated Jenkins as well as the Jenkins GIT plugin and GIT itself, but nothing helped. I also increased the GIT plugin timeout to one hour (using the -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=60 parameter), but that made no difference either. Simply put, the jobs sometimes run, but most often they fail with the above-mentioned error.
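
          One hedged way to confirm that the timeout property actually reached the Jenkins JVM (an illustrative sketch, not part of the original report) is to query it from Manage Jenkins -> Script Console:

            // Hedged Script Console sketch: print the git-client timeout property mentioned above.
            // A null result would mean the -D parameter was not picked up by the master JVM.
            println System.getProperty("org.jenkinsci.plugins.gitclient.Git.timeOut")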

          What I found really weird, and what's probably connected to this issue, is the number of git.exe/ssh.exe processes that stay open (see the attached screenshot). All of these processes point to the (failed) Jenkins jobs. If I kill all these hung processes and then manually run the jobs, they all work!

          I found this discussion, which mentions a "workaround" of killing the SCM processes:
          http://stackoverflow.com/questions/10732940/git-operations-occasionally-hang-in-jenkins-on-windows
          The problem is that it kills all processes, so it's useless in my case, because I don't want to kill processes belonging to running builds.

          Any other idea?
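
          As a hedged illustration of inspecting the leftover processes described above (assuming the hung git.exe/ssh.exe processes live on a Windows master, as in this report), a Script Console snippet can list them together with their creation times, so that only the stale ones are killed by hand (for example with taskkill /PID <pid>) without touching running builds:

            // Hedged sketch for Manage Jenkins -> Script Console: list git.exe and ssh.exe
            // processes on the (Windows) master with creation time and command line.
            ["git.exe", "ssh.exe"].each { exe ->
                println "wmic process where name='${exe}' get ProcessId,CreationDate,CommandLine".execute().text
            }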


          Mark Waite added a comment - - edited

          Pull requests have been accepted to both git-client-plugin and git-plugin. This fix is available in git-client-plugin 1.6.2. It will also need the next release of the git-plugin after 2.0.2


          Mark Waite added a comment -

          Could you check the value of your HOMEDRIVE and HOMEPATH environment variables? I believe the path in the original report (C/Documents and Settings/Tester/.ssh) might indicate that HOMEDRIVE is set to "C" when it needs to be "C:".

          You might also check the value of your (optional) HOME environment variable, in case it is affecting things. If the HOME environment variable had been set for the Git plugin, it might have needed to be /C/Documents and Settings/Tester. Maybe you're missing the leading slash on the value in the HOME environment variable?
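
          A hedged Script Console sketch of the check Mark suggests (the output format is illustrative; msys-based git/ssh typically locate ~/.ssh via HOME, falling back to HOMEDRIVE plus HOMEPATH on Windows):

            // Hedged sketch for Manage Jenkins -> Script Console: print the HOME-related
            // environment variables that each node's agent process sees.
            Jenkins.instance.computers.each { c ->
                def env = c.getEnvironment()
                println "${c.name ?: 'master'}: HOMEDRIVE=${env['HOMEDRIVE']} HOMEPATH=${env['HOMEPATH']} HOME=${env['HOME']}"
            }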


          Michael Vincent added a comment -

          I've run into this as well. It has something to do with concurrent msys processes. I've reported my findings to the msys and msysgit mailing lists. I haven't had time to really dig into it and figure out what's going on.

          I've done a few things to try and work around the issue:

          1. Minimized concurrent jobs on each node. Matrix jobs in particular tend to cause more issues.
          2. Replaced all shell scripts in jobs with Python scripts. Windows batch scripts were also an option, but Windows batch scripting is absolutely terrible.
          3. Jobs often need to run additional git commands to properly clean the repository, deal with deleted submodules, etc. I run all of these through the following Python function that checks for a corrupt environment and aborts instead of just hanging.
            import subprocess
            import sys

            def print_exec_corrupt_check(args):
                """Prints a command to be executed, wraps it in an environment
                sanity check, and then executes it via msys bash.
                """
                cmdline = " ".join(args)

                print "============================================================"
                print "Exec:", cmdline
                print
                sys.stdout.flush()

                # Abort early if $HOME is unreadable (the corrupt-environment case) instead of hanging
                cmdline = r'if [ ! -r "$HOME" ]; then echo "Corrupt environment detected, aborting..."; exit 1; fi; ' + cmdline

                subprocess.check_call([r'C:\Program Files (x86)\Git\bin\bash.exe', r'-c', cmdline], bufsize=-1)
                print
                sys.stdout.flush()

            # Example usage (hypothetical): print_exec_corrupt_check(["git", "clean", "-ffdx"])

          Possible future workarounds:

          1. Split matrix jobs out into individual jobs using the DSL plugin.
          2. Switch Jenkins to jgit. If jgit isn't mature enough yet, then maybe try Dulwich/Gittle.
          3. Fix the issue in msys.
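
          For the DSL idea in the first workaround above, a minimal hedged Job DSL seed-script sketch might look like the following; the platform values, job names, build step, and reuse of this report's repository URL are purely illustrative:

            // Hedged Job DSL sketch: generate one free-style job per configuration instead of a matrix job.
            ['win32', 'win64'].each { platform ->
                job("litebox3d-${platform}") {
                    scm {
                        git('git@swserv:litebox3d')
                    }
                    steps {
                        batchFile("build.bat ${platform}")   // hypothetical build step
                    }
                }
            }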


          Mark Waite added a comment -

          Since it seems the problem may be outside Jenkins and outside the git plugin, would you be willing to close this as "Not a Bug", or do you think there is still a reasonable chance this will be a bug in Jenkins or the git plugin?


          Pavel Kudrys added a comment -

          I agree with Michael that this problem has something to do with concurrent processes. I guess it has something to do with GIT polling? The solution that worked for me was switching to Jenkins Credentials! Previously, I did not use credentials stored in Jenkins. Since I stored the GIT credentials in Jenkins and set them in the GIT plugin, everything works OK! So far, I've experienced only one fetch timeout with the same error as before, and sure enough, there were multiple instances of git running at the same time. Once I killed them, all operations returned to normal.

          In my opinion, there is something wrong either in Jenkins or in the GIT plugin. The million-dollar question is: what could be the reason for such behavior? In any case, switching to Jenkins Credentials seems to reduce the error rate to a minimum.


          Michael Vincent added a comment -

          Based on my analysis so far, I'm certain this is an msys (or Windows?) bug. Jenkins' usage patterns are quite different from those of a typical developer using git manually, and I think that's what's causing the issue to show up.

          Using Jenkins credentials is an interesting idea! I can see how that would enable ssh to work even with a corrupt environment. Running other git commands from a build script might still run into issues with a corrupt environment though.


          Rob Duff added a comment - - edited

          Having spent days trying to track this down with our team, I thought I'd post in an effort to help others when this occurs. I'll explain as best I can, but I didn't actually fix the problem, so I may not have everything dead-on.

          In our case, we had contention between two instances of git running at the same time through SSH. The first instance would run, and the second would somehow get blocked when reading the known_hosts file, prompting for authentication and causing the plugin to just sit there until the timeout occurred.

          This may be of use: http://www.joedog.org/2012/07/ssh-disable-known_hosts-prompt/
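
          The linked article describes disabling the interactive host-key prompt. A hedged example of that workaround is an entry in the Jenkins user's ~/.ssh/config along these lines (the host name simply reuses this report's server, and disabling the check trades away host-key verification):

            # Hedged example only: suppress the interactive host-key prompt for this host.
            Host swserv
                StrictHostKeyChecking no
                UserKnownHostsFile /dev/null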


          Peter Drier added a comment -

          We're having a similar problem: 2008 server, Jenkins 1.560, git-client-plugin 1.8.0.

          Git polling hangs, with strange errors about not being able to create the ~/.ssh folder (the folder is there). Jenkins runs as the jenkins user on the server, not as the system account. No HOMEDRIVE or HOMEPATH environment variables are set.

          We use this script to kill the >3 minute SCM Polling processes, which seems to get things going again fairly reliably.

          Jenkins.instance.getTrigger("SCMTrigger").getRunners().each()
          {
            item ->
              println(item.getTarget().name)
              println(item.getDuration())
              println(item.getStartTime())
              long millis = Calendar.instance.time.time - item.getStartTime()
          
              if(millis > (1000 * 60 * 3)) // 1000 millis in a second * 60 seconds in a minute * 3 minutes
              {
                Thread.getAllStackTraces().keySet().each()
                { 
                  tItem ->
                    if (tItem.getName().contains("SCM polling") && tItem.getName().contains(item.getTarget().name))
                    { 
                      println "Interrupting thread " + tItem.getName(); 
                      tItem.interrupt()
                    }
                 }
              }
          }
          

          It would be nice if we could set the SCM polling timeout separately from the general GIT one. (1 minute should always be sufficient)


            Assignee: Nicolas De Loof
            Reporter: Pavel Kudrys
            Votes: 1
            Watchers: 8
