[JENKINS-22547] Git timeout setting does not work for checkout

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: git-plugin
    • Environment: Windows (8) (the slow disk access probably makes it more likely to manifest)

      Some git processes fail with `ERROR: Timeout after 10 minutes` even though a longer timeout is configured in the job.

      It affects the `git checkout` operation itself. The fetch operation is OK.

      The log says:

      using GIT_SSH to set credentials jenkins key for git
       > git fetch --tags --progress git@git.company.com:project +refs/heads/*:refs/remotes/origin/* --prune
      Checking out Revision 159bc2b21669bc7b5217341fc8de9cd6b48439b2 (origin/dev/jan.hudec/pu)
       > git config core.sparsecheckout
       > git checkout -f 159bc2b21669bc7b5217341fc8de9cd6b48439b2
      ERROR: Timeout after 10 minutes
      FATAL: Could not checkout null with start point 159bc2b21669bc7b5217341fc8de9cd6b48439b2
      

      When I manually removed the lock and repeated the checkout operation, it indeed took 11 minutes 15 seconds on the node where it failed.

      The global timeout does work, so it's no longer a blocker. It is, however, a rather non-obvious configuration, because the -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 option (or whatever sufficiently large value) needs to be added to the JVM options of both the master and all slaves. The master options can only be configured in the servlet container. The slave options can be configured in the node settings (hidden under the "Advanced" button), but slaves running as a Windows service don't pick them up unless the service is reinstalled.
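To illustrate why the option has to be passed to every JVM, here is a minimal sketch of how a JVM system property like this one is typically read. It is not the git-client-plugin source. The class name `GitTimeoutProperty` is hypothetical, and the default of 10 minutes is assumed from the error message in the log above.

```java
// Hypothetical sketch, not the git-client-plugin source.
final class GitTimeoutProperty {
    // Property name taken from the -D option quoted in this report.
    static final String PROPERTY = "org.jenkinsci.plugins.gitclient.Git.timeOut";

    // Assumed default, matching "Timeout after 10 minutes" in the log.
    static final int DEFAULT_MINUTES = 10;

    /** Returns the timeout in minutes from the system property, or the default. */
    static int timeoutMinutes() {
        // Integer.getInteger looks up a system property of the current JVM and
        // falls back to the given value when it is unset or not a number.
        return Integer.getInteger(PROPERTY, DEFAULT_MINUTES);
    }

    public static void main(String[] args) {
        System.out.println("Git timeout (minutes): " + timeoutMinutes());
    }
}
```

A system property is only visible inside the JVM that was started with it, so a value set on the master is not seen by a slave process, and the other way around.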


          Jan Hudec created issue -

          Jan Hudec added a comment - - edited

          It appears that the fix to make timeout configurable was incomplete.
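The reported symptom can be pictured with a small sketch. This is only an assumption about what an incomplete fix might look like: the job's timeout reaches one operation but not the other. The class `TimeoutResolution`, the method `effectiveMinutes`, and the 60-minute job setting are all hypothetical and are not taken from the plugin.

```java
import java.util.OptionalInt;

// Hypothetical illustration of the symptom, not the plugin's code.
final class TimeoutResolution {
    // Assumed default, matching "Timeout after 10 minutes" in the log.
    static final int DEFAULT_MINUTES = 10;

    /** Timeout for one operation: the job's value when present, else the default. */
    static int effectiveMinutes(OptionalInt jobTimeout) {
        return jobTimeout.orElse(DEFAULT_MINUTES);
    }

    public static void main(String[] args) {
        OptionalInt configured = OptionalInt.of(60); // hypothetical job setting

        // The fetch step is handed the job's setting.
        System.out.println("fetch timeout:    " + effectiveMinutes(configured));

        // The checkout step is (hypothetically) never handed the setting,
        // so it silently falls back to the default.
        System.out.println("checkout timeout: " + effectiveMinutes(OptionalInt.empty()));
    }
}
```

Under that assumption a checkout that needs 11 minutes would still be killed at 10, whatever the job configuration says.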

          Jan Hudec made changes -
          Link New: This issue is related to JENKINS-20445 [ JENKINS-20445 ]

          Jan Hudec added a comment - - edited

          In the issue that requested the timeout, it is mentioned that builds probably shouldn't be using it, which I agree with; jobs can hang for many reasons. Also, interrupting the clone causes additional damage, because the next attempt often fails when the repository is left locked.

          Jan Hudec made changes -
          Link New: This issue is related to JENKINS-11286 [ JENKINS-11286 ]

          Mark Waite added a comment -

          I'm not understanding your description, or I can't duplicate the bug you're describing.

          I created a multi-configuration job which runs on a Windows machine and a Linux machine. The multi-configuration job is cloning a 3 GB git repository.

          The Linux machine has a reference copy of the repository stored at a known location, and that reference location is included in the job definition. That allows the Linux clone to complete very quickly (much less than a minute to clone).

          The Windows machine does not have a reference copy, so the clone takes much longer than the Linux machine. On my network, that clone seems to take as much as 2 minutes.

          I set the clone timeout to 1 minute. The Linux clone completes in less than 1 minute and is successful. The Windows machine performs the clone for 1 minute and then is interrupted at 1 minute (as expected by the timeout setting). The clone timeout value set on the multi-configuration job was honored by the job running on the Windows slave.

          Can you give more description about how you're configuring the longer timeout, or any other hints that may explain why I see timeout honored by the multi-configuration jobs and you do not see it being honored by multi-configuration jobs?


          Jan Hudec added a comment -

          No reference copies involved. Well, I want to involve them, but I wanted to create them with a job.

          The clone takes about an hour for me. It is a local network, but the server is a slow virtual machine. I am configuring the timeout via the advanced clone behaviours option in the project configuration. It uses native (msys) git and passes ssh credentials.

          I have Jenkins 1.557 (it's always a rather big pain to update, as redeploy does not work correctly on the Windows glassfish), git-client-plugin 1.8.0 and git-plugin 2.2.0.

          I had problems with configurations run on a node other than the master (with or without shallow clone) and with configurations that have shallow clone selected even on the master, while some builds on the master seem to have passed.


          Mark Waite added a comment -

          I used the reference copy only as a way to assure that one of the multi-configuration jobs would complete before the timeout, while the other would exceed the timeout value.

          The msysgit client has a known bandwidth limit that it can only transfer about 1 MB / second over the ssh transport. It is much faster over the git transport, and I believe it is also faster over the https transport. The msysgit port uses a very old version of OpenSSH that has that bandwidth limit. Unfortunately, updating the OpenSSH version inside the msysgit port is very difficult, so no one has made that change yet.
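As a rough worked example of what a 1 MB / second limit means against a timeout, the arithmetic can be sketched as below. The class `TransferBudget` and its method names are hypothetical. The 3 GB figure is the size of the test repository mentioned earlier in this thread, not the reporter's repository, whose size is not stated.

```java
// Hypothetical helper for back-of-the-envelope estimates only.
final class TransferBudget {
    /** Most data (in MB) that can be transferred before the timeout expires. */
    static long maxMegabytes(double mbPerSecond, int timeoutMinutes) {
        return Math.round(mbPerSecond * timeoutMinutes * 60);
    }

    /** Minutes needed to transfer the given amount at the given rate. */
    static double minutesFor(double megabytes, double mbPerSecond) {
        return megabytes / mbPerSecond / 60.0;
    }

    public static void main(String[] args) {
        // Data that fits into the default 10 minute timeout at 1 MB/s.
        System.out.println("MB within 10 minutes: " + maxMegabytes(1.0, 10));

        // Time for a 3 GB (3072 MB) repository at 1 MB/s.
        System.out.println("Minutes for 3 GB: " + minutesFor(3072, 1.0));
    }
}
```

At that rate the default timeout covers roughly 600 MB of transfer, and a repository of a few gigabytes would need on the order of an hour.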

          I still don't understand the difference between my configuration (where multi-configuration jobs honor the git timeout) and yours. Some of the differences you might try exploring include:

          • I used Linux, Windows 7 and Windows 8.1 as target operating systems, while yours seem to be Windows 8
          • I used a timeout less than the default 10 minutes, you use a timeout greater than the default 10 minutes
          • I used a git protocol URL while yours is ssh

          Can you upload the job definition file for further comparison?

          Can you upload a log from the failed build?


          Mark Waite added a comment - - edited

          bulb Jan, I can't duplicate the problem you've reported and I haven't seen any response from you on my request for more information. I intend to close this bug in a week as "Could not reproduce", unless more details from you can help reproduce the bug.

          Mark Waite made changes -
          Resolution New: Cannot Reproduce [ 5 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

            Assignee: Mark Waite (markewaite)
            Reporter: Jan Hudec (bulb)