Type: Bug
Resolution: Fixed
Priority: Major
Environment: Windows (8) (the slow disk access probably makes it more likely to manifest)
Some git processes error out with "ERROR: Timeout after 10 minutes" even though a longer timeout is configured in the job.
It affects the `git checkout` operation itself. The fetch operation is OK.
The log says:
using GIT_SSH to set credentials jenkins key for git
 > git fetch --tags --progress git@git.company.com:project +refs/heads/*:refs/remotes/origin/* --prune
Checking out Revision 159bc2b21669bc7b5217341fc8de9cd6b48439b2 (origin/dev/jan.hudec/pu)
 > git config core.sparsecheckout
 > git checkout -f 159bc2b21669bc7b5217341fc8de9cd6b48439b2
ERROR: Timeout after 10 minutes
FATAL: Could not checkout null with start point 159bc2b21669bc7b5217341fc8de9cd6b48439b2
When I manually removed the lock and repeated the checkout operation, it indeed took 11 minutes 15 seconds on the node where it failed.
The global timeout does work, so it's not a blocker anymore. It is, however, a rather non-obvious configuration: the -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 option (or whatever sufficiently large value) needs to be added to the JVM options of the master and to the JVM options of every slave. The master options can only be configured in the servlet container. The slave options can be configured in the node settings (hidden under the "Advanced" button), but slaves running as a Windows service don't pick them up without reinstalling the service.
duplicates:
- JENKINS-20387 git submodule update timeout value should be configurable per job (Closed)
- JENKINS-37185 Checkout timeout is not honored when used with local branch parameter (Closed)
is related to:
- JENKINS-11286 Git plugin does not timeout (Closed)
- JENKINS-20445 Git plugin timeout is too small (Closed)
[JENKINS-22547] Git timeout setting does not work for checkout
The issue that asked for a timeout mentions that builds probably shouldn't rely on it, which I agree with; jobs can hang for many reasons. Interrupting the clone also causes additional damage, because the next attempt often fails when the repository has remained locked.
I'm not understanding your description, or I can't duplicate the bug you're describing.
I created a multi-configuration job which runs on a Windows machine and a Linux machine. The multi-configuration job is cloning a 3 GB git repository.
The Linux machine has a reference copy of the repository stored at a known location, and that reference location is included in the job definition. That allows the Linux clone to complete very quickly (much less than a minute to clone).
The Windows machine does not have a reference copy, so the clone takes much longer than the Linux machine. On my network, that clone seems to take as much as 2 minutes.
I set the clone timeout to 1 minute. The Linux clone completes in less than 1 minute and is successful. The Windows machine performs the clone for 1 minute and then is interrupted at 1 minute (as expected by the timeout setting). The clone timeout value set on the multi-configuration job was honored by the job running on the Windows slave.
Can you give more description about how you're configuring the longer timeout, or any other hints that may explain why I see timeout honored by the multi-configuration jobs and you do not see it being honored by multi-configuration jobs?
No reference copies involved. Well, I want to involve them, but I wanted to create them with a job.
The clone takes about an hour for me. It is a local network, but the server is a slow virtual. I am configuring timeout via the advanced clone behaviours option in project configuration. It uses native (msys) git and passes ssh credentials.
I have Jenkins 1.557 (it's always a rather big pain to update, as redeploy does not work correctly on the Windows glassfish), git-client-plugin 1.8.0 and git-plugin 2.2.0.
I had problems with configurations run on a node other than master (with or without shallow clone), and with configurations that had shallow clone selected even on master, while some builds on master seem to have passed.
I used the reference copy only as a way to assure that one of the multi-configuration jobs would complete before the timeout, while the other would exceed the timeout value.
The msysgit client has a known bandwidth limit that it can only transfer about 1 MB / second over the ssh transport. It is much faster over the git transport, and I believe it is also faster over the https transport. The msysgit port uses a very old version of OpenSSH that has that bandwidth limit. Unfortunately, updating the OpenSSH version inside the msysgit port is very difficult, so no one has made that change yet.
I still don't understand the difference between my configuration (where multi-configuration jobs honor the git timeout) and yours. Some of the differences you might try exploring include:
- I used Linux, Windows 7 and Windows 8.1 as target operating systems, while yours seem to be Windows 8
- I used a timeout less than the default 10 minutes, you use a timeout greater than the default 10 minutes
- I used a git protocol URL while yours is ssh
Can you upload the job definition file for further comparison?
Can you upload a log from the failed build?
Jan, I can't duplicate the problem you've reported and I haven't seen any response from you on my request for more information. I intend to close this bug in a week as "Could not reproduce", unless more details from you can help reproduce the bug.
I finally got around to trying it again. The important point is that the operation that fails is checkout.
I created a job to check out a reference copy on each node and that worked for an hour and succeeded just fine the first time around. But I set it with sparse checkout of just a few files.
Then I set up the actual job. It succeeded in building some configurations, but not others. It has problems specifically on one slave node. It might be that its disks are slower and the repository is so large that checking it out takes just over 10 minutes on that node and just under 10 minutes on the other, or that the node happened to be more loaded.
It should also be noted that killing the checkout operation leaves the lock behind, so Jenkins won't recover from this without serious manual surgery.
Below is the relevant log. Note that the build took 26 minutes in total, but it clearly says it timed out after 10 minutes, so the limit was not applied cumulatively. It is possible that the checkout command alone indeed took 10 minutes.
Building remotely on Win8-builder (various labels) in workspace D:\Jenkins\workspace\Project\LABEL\Android
Cloning the remote Git repository
Using shallow clone
Cloning repository git@git.company.com:project
 > git init D:\Jenkins\workspace\Project\LABEL\Android\src
Fetching upstream changes from git@git.company.com:project
 > git --version
using GIT_SSH to set credentials jenkins key for git
 > git fetch --tags --progress git@git.company.com:project +refs/heads/*:refs/remotes/origin/* --depth=1
 > git config remote.origin.url git@git.company.com:project
 > git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url git@git.company.com:project
Pruning obsolete local branches
Fetching upstream changes from git@git.company.com:project
using GIT_SSH to set credentials jenkins key for git
 > git fetch --tags --progress git@git.company.com:project +refs/heads/*:refs/remotes/origin/* --prune
Checking out Revision 159bc2b21669bc7b5217341fc8de9cd6b48439b2 (origin/dev/jan.hudec/pu)
 > git config core.sparsecheckout
 > git checkout -f 159bc2b21669bc7b5217341fc8de9cd6b48439b2
ERROR: Timeout after 10 minutes
FATAL: Could not checkout null with start point 159bc2b21669bc7b5217341fc8de9cd6b48439b2
hudson.plugins.git.GitException: Could not checkout null with start point 159bc2b21669bc7b5217341fc8de9cd6b48439b2
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1479)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
    at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
    at hudson.remoting.UserRequest.perform(UserRequest.java:118)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:326)
    at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at hudson.remoting.Engine$1$1.run(Engine.java:63)
    at java.lang.Thread.run(Unknown Source)
Caused by: hudson.plugins.git.GitException: Command "git checkout -f 159bc2b21669bc7b5217341fc8de9cd6b48439b2" returned status code -1:
stdout:
stderr:
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1307)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1283)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1279)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1084)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1094)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$8.execute(CliGitAPIImpl.java:1474)
    ... 11 more
Indeed, when looking at the source (https://github.com/jenkinsci/git-client-plugin/blob/master/src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java), the configurable timeout only applies to commands launched via launchCommandWithCredentials, but local commands including checkout are launched via launchCommand which just uses the TIMEOUT constant.
The org.jenkinsci.plugins.gitclient.CliGitAPIImpl.TIMEOUT global does apply, and so should the `org.jenkinsci.plugins.gitclient.Git.timeOut` global property. Unfortunately it was changed to final between its introduction and the current version, so I can't easily test whether it works using the script console. Mind you, one can't restart a production build server on a whim.
So I restarted the server. It now has the correct value on the master. But getting it to the appropriate slave was rather complicated. The JVM options in the node config had no effect on the web-start-installed service (it would probably need to be reinstalled). Only editing jenkins-slave.xml worked.
Finally managed to update the global timeout. It appears to fix the issue, so I downgraded the severity, but it is rather inconvenient. At the very least it needs to be prominently documented.
I am also seeing this behavior. The clone operation completes quickly (I am using a reference copy since my repository is over 3 GB in size), but the checkout tends to take 15-20 minutes and that is what fails. It is a blocker for me to upgrade, as the workaround of setting a global timeout is not practical in my environment.
We have about 20 repositories and over 30 unix and Windows slaves. Most of our repositories are small and the ten-minute timeout is sufficient. But we have three large legacy projects and for those projects only we need to increase or eliminate the timeout.
Another workaround which may work for some users is to use sparse checkout and only check out the subtree which is actually used. If the checkout of the subtree you need can be completed in less than 10 minutes, then sparse checkout would be a solution.
Add the "Additional Behaviours" - "Sparse Checkout paths", then list the directories needed. The plugin will copy the entire repository, but only check out the directories specified by the sparse checkout.
The next git-client-plugin release after 1.10.0 will include the API support (and unit tests) for timeout on the checkout command. That is necessary to add timeout on checkout, but it is not sufficient.
I hope to prepare a merge request for the git plugin which will provide the user interface necessary to access that API. While reviewing the git-plugin, it looks like the simplest approach will be to add a new "Additional Behaviour" for "Advanced checkout behaviours". The initial implementation would contain a single field for the user provided value of the checkout timeout in minutes.
Change has been submitted to git-plugin and should be available in the next release after 2.2.2.
@Jan Hudec: How did you update the org.jenkinsci.plugins.gitclient.CliGitAPIImpl.TIMEOUT global?
Use -Dorg.jenkinsci.plugins.gitclient.Git.timeOut=30 on the command line which starts a slave agent, and on the command line which starts the Jenkins master. That system property is what the CliGitAPIImpl.TIMEOUT constant is initialized from.
I don't have access to the command line. How can I do that if I simply restart Jenkins from the Web interface?
Setting a Jenkins startup property requires command line access. The TIMEOUT variable is immutable, defined at process startup, and cannot be changed after that. Without command line access, you can't change that variable.
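For readers who can reach the script console, a minimal Groovy sketch for checking which value the master JVM actually picked up. It assumes the property name given earlier in this thread (org.jenkinsci.plugins.gitclient.Git.timeOut) and the 10-minute default. It only reads the value, since the constant cannot be changed after startup, and it says nothing about what a slave JVM sees.

```groovy
// Run from Manage Jenkins > Script Console; this executes in the master JVM only.
def prop = 'org.jenkinsci.plugins.gitclient.Git.timeOut'

// Raw value of the -D option, or null if it was not passed at startup.
println "${prop} = ${System.getProperty(prop)}"

// Same lookup with the 10-minute default from this thread as the fallback.
println "effective global timeout (minutes): ${Integer.getInteger(prop, 10)}"
```

If the first line prints null, the option did not reach that JVM and the 10-minute default is still in force there.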
With the changes included in git plugin 2.2.3 and later, checkout timeout can now be set from the user interface. The clone timeout has been adjustable for a long time.
The only operation requested in this bug report which can't yet have its timeout adjusted is the "git clean" operation. If clean is timing out for you, then you could instead uncheck the "Clean before checkout" and "Clean after checkout" boxes, and place a first build step "git clean -xfd" or "git clean -xffd" if you use submodules. Command performed as part of build steps have no timeout.
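For Pipeline jobs, the same idea might look like the sketch below. The repository URL, branch, and timeout value are placeholders, not settings taken from this report, and the clean behaviours are simply left out of the extensions list.

```groovy
node {
    // Check out without the "Clean before checkout" / "Clean after checkout" behaviours.
    checkout([
        $class: 'GitSCM',
        branches: [[name: '*/master']],                        // placeholder branch
        extensions: [[$class: 'CheckoutOption', timeout: 30]], // checkout timeout in minutes
        userRemoteConfigs: [[url: 'git@server:project.git']]   // placeholder URL
    ])

    // Clean as an ordinary build step instead, so the plugin's git timeout is not involved.
    // Use -xffd rather than -xfd when the repository has submodules.
    if (isUnix()) {
        sh 'git clean -xffd'
    } else {
        bat 'git clean -xffd'
    }
}
```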
@Mark Waite: Thanks. I didn't notice the clone and checkout timeouts were different so I was setting the checkout timeout and thinking it was strange the timeout while cloning was still set to 10mins.
It's working now.
Code changed in jenkins
User: Paul Sokolovsky
Path:
src/main/resources/hudson/plugins/git/extensions/impl/CheckoutOption/help-timeout.html
src/main/resources/hudson/plugins/git/extensions/impl/CloneOption/help-timeout.html
src/main/resources/hudson/plugins/git/extensions/impl/SubmoduleOption/help-timeout.html
http://jenkins-ci.org/commit/git-plugin/a4732399d57fabc8d6b5237871609a8f4ef15329
Log:
help-timeout.html: Global timeout property should be set on master and slave.
As explained in https://issues.jenkins-ci.org/browse/JENKINS-22547 and
confirmed first-hand.
I don't like to be the bearer of bad news, but I hit this issue again in 3.0.0-beta2 and in some earlier versions (at least the 2.5 beta).
This is a multibranch pipeline project.
Clone as well as checkout and submodule timeout are set to 45 minutes
 > C:\Program Files\Git\bin\git.exe rev-parse "origin/master^{commit}" # timeout=10
Checking out Revision f844df99b9bb4d27ebb3dfa111e458090a5cb9a7 (origin/master)
 > C:\Program Files\Git\bin\git.exe config core.sparsecheckout # timeout=10
 > C:\Program Files\Git\bin\git.exe checkout -f f844df99b9bb4d27ebb3dfa111e458090a5cb9a7
ERROR: Timeout after 10 minutes
manschwetus thanks for the report and for reopening the bug. I think I have confirmed the same failure as you reported. The steps I took:
- Define an agent using a directory on a slow USB 2.0 disc drive, limit that agent to only run jobs assigned to the agent
- Create a job which clones the linux kernel (git://github.com/torvalds/linux.git) with a clone timeout of 40 minutes and a checkout timeout of 14 minutes
- Run the job, watch the console log, notice the last line in the log shows "timeout=10" when it should show "timeout=14"
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on mark-pc1-slow-disc (Ubuntu-14.04 amd64-Ubuntu-14.04 amd64-Ubuntu linux Ubuntu-14 14.04 amd64 authenticating-git linux-with-supported-git slow-disc Ubuntu 64bit) in workspace /media/mwaite/BACKUP-DRIVE/Jenkins/workspace/JENKINS-22547-checkout-timeout-ignored
Cloning the remote Git repository
Cloning repository mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git
 > git init /media/mwaite/BACKUP-DRIVE/Jenkins/workspace/JENKINS-22547-checkout-timeout-ignored # timeout=10
Fetching upstream changes from mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git
 > git --version # timeout=10
using GIT_SSH to set credentials mwaite@mark-pc1
 > git -c core.askpass=true fetch --tags --progress mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git +refs/heads/*:refs/remotes/origin/* # timeout=40
 > git config remote.origin.url mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git # timeout=10
Pruning obsolete local branches
Fetching upstream changes from mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git
using GIT_SSH to set credentials mwaite@mark-pc1
 > git -c core.askpass=true fetch --tags --progress mwaite@wheezy64b.markwaite.net:/var/lib/git/mwaite/linux.git +refs/heads/*:refs/remotes/origin/* --prune # timeout=40
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision 076501ff6ba265a473689c112eda9f1f34f620b5 (refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 076501ff6ba265a473689c112eda9f1f34f620b5 # timeout=10
More investigation will be needed.
Same problem as markewaite.
I've set Checkout and Clone timeout to 120 minutes.
It fails on the checkout command with timeout=10
I use Jenkins Pipeline with Jenkins 2.11
Refer to pull request 423 for a proposed fix. The pull request still needs further code review and investigation into the history to understand why I missed this regression during reviews of earlier pull requests.
After some investigation, I've fixed my problem by removing git lfs filters.
Now, I'm manually calling "git lfs pull" in my build after the checkout step.
Indeed, without git lfs this step is very fast and should not take too much time.
But it is during this step that git lfs tries to download its files.
It might be the reason why you didn't need to set the timeout on this step during your development.
Is this also fixed in the beta, needed for submodule credentials?
superboum has shown that the fix that I inserted into 2.5.3 was incomplete. He's also detected that the test I wrote to detect the problem is hardly testing the problem at all. Special thanks to superboum!
He's working on a correct and complete fix as part of his pull request to the git client plugin.
The partial fix has not been included in any of the beta releases yet. I'd prefer to have a complete fix and then release in both the main line and the beta releases.
markewaite: I am using git plugin 3.0.1 and git client plugin 2.2.0 and could reproduce the timeout issue with sparse checkout. I tried setting the timeout in Additional Behaviours >> Timeout=120, but it is still not working; the Jenkins output is below. Let me know if I am missing anything here.
Building remotely on JenkinsSlave in workspace /home/balaji/workspace/PublishOverSSH
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://bitbucket.org/test.git # timeout=10
Fetching upstream changes from https://bitbucket.org/test.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials
 > git fetch --tags --progress https://bitbucket.org/test.git +refs/heads/*:refs/remotes/origin/* # timeout=120
 > git rev-parse refs/remotes/origin/testing^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/testing^{commit} # timeout=10
Checking out Revision 5d2b54126a0771fc4c205b1c0d4deafd3185d07b (refs/remotes/origin/testing)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5d2b54126a0771fc4c205b1c0d4deafd3185d07b
ERROR: Timeout after 10 minutes
Update: The Advanced Checkout Behaviour seems to do the trick for me, so I am closing the ticket.
Thanks
Balaji
I have the same problem - timeout is always "10" and there is no way to change it.
I've added the advanced checkout behavior and that did not change a thing. Here's the pipeline syntax I'm using:
checkout scm: [$class: 'GitSCM',
    branches: [[name: '*/master']],
    doGenerateSubmoduleConfigurations: false,
    extensions: [[$class: 'CheckoutOption', timeout: 240]],
    gitTool: 'Default',
    submoduleCfg: [],
    userRemoteConfigs: [[
        credentialsId: '981cbff3-3a30-4a61-be40-99f441ea0559',
        url: 'git@server:project.git'
    ]]]
And it looks the same - the logs shows `timeout=10` and it ends with:
ERROR: Timeout after 10 minutes
ERROR: Error cloning remote repo 'origin'
guss77 can you provide an example or link of what you mean by advanced clone behaviour? I'm still getting the 10 minute timeout using the following:
git branch: '$BRANCH', credentialsId: 'XXXX', url: 'ssh://git@example.com/myRepo.git', extensions: [[$class: 'CheckoutOption', timeout: 100]]
I tried the `scm checkout` syntax you reference in your first comment and I see the 10 minute timeout just the same as you were describing.
Additionally, how did you confirm the "advanced behaviour" was working? Will the log messages begin to display the configured timeout value, or do I just have to test it and hope my network is running slow enough to validate the test?
Yes - adding an "advanced clone behavior" and then setting the timeout there solved my problem.
vadivel, the bug was fixed over 2 years ago, and the fix is as described by knighttp01 in the comment just above the one in which you reopened the bug.
It is not enough to say "the same issue occurring". You'll need to provide much more context than "same issue occurring". What have you tried? What is the context where timeouts are not behaving as you expect? What is the log content when the timeout does not behave as you expect? What job type are you using?
Please gather those details and submit a new bug report, rather than reopening this report.
There are many, many users that are successfully using extended timeouts to clone and check out large repositories. I've presented talks at Jenkins World 2016, at Jenkins World 2017, at a 2016 online meetup and at a 2017 online meetup that describe techniques to better manage large repositories. All of those talks depend on adjusting timeout values as needed, and they work.
In addition to those resources, CloudBees support has provided detailed instructions for configuring a reference repository to speed clone operations.
This seems to resolve it: set the timeout via "CloneOption".
checkout scm: [$class: 'GitSCM',
    branches: [[name: '*/master']],
    doGenerateSubmoduleConfigurations: false,
    extensions: [[$class: 'CloneOption', timeout: 240]], // CheckoutOption -> CloneOption
    gitTool: 'Default',
    submoduleCfg: [],
    userRemoteConfigs: [[
        credentialsId: '981cbff3-3a30-4a61-be40-99f441ea0559',
        url: 'git@server:project.git'
    ]]]
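Since the clone, checkout, and submodule timeouts are separate settings in this thread, a sketch that sets all three in one checkout step may save some trial and error. The URL, credentials id, and the 240-minute value are placeholders, and every other field of each behaviour is assumed to be acceptable at its default.

```groovy
checkout([
    $class: 'GitSCM',
    branches: [[name: '*/master']],
    extensions: [
        // Applies to the fetch/clone commands (shown as "# timeout=240" on the fetch lines of the log).
        [$class: 'CloneOption', timeout: 240],
        // Applies to "git checkout -f <revision>".
        [$class: 'CheckoutOption', timeout: 240],
        // Applies to submodule updates; only relevant if the repository has submodules.
        [$class: 'SubmoduleOption', timeout: 240]
    ],
    userRemoteConfigs: [[
        credentialsId: 'my-credentials-id',   // hypothetical id
        url: 'git@server:project.git'
    ]]
])
```

The console log is the quickest check: each git command is followed by the timeout that was applied to it, so a checkout line still showing "# timeout=10" means the checkout setting was not picked up.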
Hi,
We encountered this timeout as well, but we suspect that running git config core.sparsecheckout without an explicit true argument causes this.
Adding the CloneOption with timeout didn't help in our case.
Is there a way to configure git config core.sparsecheckout true in the checkout step? I've read the documentation but couldn't figure out how to do so...
Thank you,
Yarden
judgedredd it is generally considered bad form to ask an unrelated question in a bug report. It clutters the bug reports and wastes the time of maintainers that are notified when the unrelated comment is added to the bug report. Please don't do that in the future. Use the mailing list or chat questions and answers so that more than a few people are notified and might be able to assist.
The question interested me enough that I added it to my JENKINS-52746 test case. Refer to that example for details that will allow you to use a sparse checkout path definition in a declarative Pipeline checkout statement.
If you are using declarative Pipeline, you may also need to `skipDefaultCheckout(true)` in the options section, otherwise the full repository checkout happens implicitly before the first step.
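A minimal declarative sketch of that combination, assuming a placeholder repository URL and a hypothetical sparse path `src/`. It is not the JENKINS-52746 test case itself, only an illustration of the same behaviours.

```groovy
pipeline {
    agent any
    options {
        // Suppress the implicit full checkout that would otherwise run before the first stage.
        skipDefaultCheckout(true)
    }
    stages {
        stage('Checkout') {
            steps {
                checkout([
                    $class: 'GitSCM',
                    branches: [[name: '*/master']],
                    extensions: [
                        [$class: 'CloneOption', timeout: 120],
                        [$class: 'CheckoutOption', timeout: 120],
                        // Only this directory is populated in the workspace (hypothetical path).
                        [$class: 'SparseCheckoutPaths', sparseCheckoutPaths: [[path: 'src/']]]
                    ],
                    userRemoteConfigs: [[url: 'git@server:project.git']]
                ])
            }
        }
    }
}
```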
It appears that the fix to make timeout configurable was incomplete.