Bug
Resolution: Fixed
Critical
Jenkins 1.514, Git Plugin 1.4.0
Most of our Jenkins jobs that run on Windows systems are failing due to issues in "deleting the workspace".
Error:
7:07:03 Deleting project workspace... Cannot delete workspace: java.io.IOException: Unable to delete D:\Jenkins\workspace\Ifl GnB Server Gerrit Voter\.git\objects\pack\pack-0a42cbec0651000a728565330f096a38e565b10b.pack
07:07:09 ERROR: Cannot delete workspace: java.io.IOException: Unable to delete D:\Jenkins\workspace\Ifl GnB Server Gerrit Voter\.git\objects\pack\pack-0a42cbec0651000a728565330f096a38e565b10b.pack
The workaround is that we have to manually kill the process, restart the Jenkins services, and finally re-trigger the job.
But we need a permanent solution for this. Could you please support us here?
Regards
Aravind BK
is duplicated by:
JENKINS-18843 Jenkins Wiping out workspace hudson.util.IOException2: remote file operation failed (Closed)
JENKINS-14717 Wipe out workspace nonfunctional with local subdirecory (Closed)
is related to:
JENKINS-15103 jenkins git does not release pack files, prevents wiping workspace (Closed)
[JENKINS-19994] Unable to delete windows workspace due to busy git pack file
I think the simplest workaround is to clean the workspace rather than wipe it. The clean should work because it does not depend on removing the pack file that some portion of either jgit or the git plugin has left open.
We have the same problem, and it is a major pain point for our testing because a lot of test runs are getting aborted. So far we haven't figured out what's going on, but it looks like only Windows 8 and 8.1 are affected. Any earlier Windows version is fine.
Also, the problem only seems to start when one of those machines is restarted, for example because of an update or a system config change such as a hostname change. Once the restart has finished and the machine is reconnected to the cluster, the next test runs on it fail.
We are using Jenkins 1.509.x (LTS).
The failure itself for us varies but looks like:
00:00:17.878 Deleting project workspace... Cannot delete workspace: java.nio.file.FileSystemException: c:\jenkins\workspace\ondemand_functional\mozmill-env\python\Lib\lib2to3\tests\data\fixers\myfixes_init_.py: The process cannot access the file because it is being used by another process.
Can you describe the use case where you need to wipe the workspace rather than just clean it?
It seems like a clean would meet most use cases and would avoid this bug, without requiring a bug fix.
I don't want to have to do both of them. Why am I forced to manually clean the workspace after I restart one of the machines? If something is necessary, Jenkins should handle it, similar to when I disconnect the slave and reconnect it again.
I can answer for Henrik as we work in the same team. We cannot use clean as in our setup we are using the Clone Workspace SCM plugin, and also copying artifacts from other jobs. Therefore in order to ensure the workspace does not have any previous artifacts we use the Workspace Cleanup plugin to delete the workspace. I suspect this may be a separate bug, considering the different plugins in use, unless there is an underlying cause common to both issues.
Thanks for the clarification. I'm not sure it is a separate bug, but it could be. The bug I've seen is that the Git plugin leaves one or more of the .git/objects/pack/* files open when it is using the jgit implementation. It does not seem to have the same problem when using the git command line implementation.
You may have discovered a case where both the command line and the jgit implementation leave a pack file open. That's not a problem on Linux, and it is a "showstopper" on Windows.
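To illustrate why a leaked handle matters, here is a minimal Java sketch. It is not code from Jenkins or the git plugins; the class and method names are made up for illustration. It opens a temporary file with `FileInputStream`, attempts to delete it while the stream is still open, and then makes sure it is gone after the stream is closed. On Windows the first attempt is typically expected to fail with a "being used by another process" error, while on Linux it normally succeeds.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenHandleDemo {

    /** Tries to delete the file; returns true on success, false if the OS refuses. */
    static boolean tryDelete(Path file) {
        try {
            Files.delete(file);
            return true;
        } catch (IOException e) {
            System.out.println("Delete failed: " + e);
            return false;
        }
    }

    /** Returns true if the file is gone once the stream has been closed. */
    static boolean deleteWhileOpenThenClosed() {
        try {
            // Hypothetical stand-in for a pack file in a workspace.
            Path file = Files.createTempFile("pack-demo", ".pack");
            Files.write(file, new byte[] {1, 2, 3});

            try (FileInputStream in = new FileInputStream(file.toFile())) {
                in.read();
                // The handle is still open here; the result depends on the OS.
                System.out.println("Deleted while open: " + tryDelete(file));
            }
            // The stream is closed now, so a delete should succeed everywhere.
            return !Files.exists(file) || tryDelete(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("File gone after close: " + deleteWhileOpenThenClosed());
    }
}
```

The point of the sketch is only the ordering: as long as some code path in the Java process forgets to close its stream, the delete on Windows cannot succeed, no matter how often it is retried.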
I saw the same problem again an hour ago and used the time to investigate further. Process Explorer shows a sharing violation for the affected file, which means another process still has a handle to it open. Checking the file in Explorer showed a file size of 0 bytes, and the last access (modification) time was 7 days old. Further research revealed that I could use handle (http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) to show which processes have a handle open to that file. It lists no process other than javaw.exe, so some class in Jenkins itself seems to keep a handle open.
I think this can happen to any file within the workspace; I just don't know yet what triggers the problem. I hope that the next time one of our jobs fails, the previous job is still in the history. That way I could compare the time of the last file access with the tasks that ran at that time.
Oh, and one thing I forgot: a simple reconnect of Jenkins on the slave was enough to free the handle. After restarting the client, deleting the workspace worked fine.
+Henrik, can you clarify what you mean by "simple reconnect of Jenkins on the slave"?
Our slaves are started with JNLP. When the Jenkins master stops, our JNLP slaves continue running, retrying the connection to the Jenkins server. We like that, but it also means that the slave will not close the files it has open unless we explicitly kill the JNLP process.
How are you starting your Windows slaves?
Are the slave agent processes killed and restarted in your "simple reconnect", or are they still running and they reconnect to the master server?
In our case we are using Java Web Start to connect the individual slaves to the master. So closing the Jenkins window and calling javaws with the node URL again, made it work.
So we are using the same technique here, and your problem description sounds reasonable. What I still wonder is why this kind of thing also happens when we restart a Windows node and reconnect it to the master. I'm not sure whether that is a different issue or related to this one.
If by "restart a Windows node" you mean anything which stops the Java slave agent process (Windows reboot, exit the slave and restart), then you are seeing a different problem, a problem which I have not seen before and which may need a different solution.
The problem I'm seeing can be resolved either by killing the slave agent process and starting it again, or by restarting the computer and starting the slave agent.
Yes, it's a reboot, but not in all cases. We see this problem a lot after installing updates and rebooting those nodes, but it also starts sporadically, as I saw on Saturday. That might still be related to a restart, given that this job last ran on this node on Nov 2nd, so we might simply not have observed the problem earlier. How often do you reboot your machines?
To help with the investigation we will add some logging to the delete workspace plugin to hopefully get more information when this problem occurs.
The bug JENKINS-15103 may be a duplicate of this bug, even though that bug was fixed in 1.5.0, since it has reappeared in 2.0.
I'm also tracking our own specific instance of the failure at https://github.com/mozilla/mozmill-ci/issues/358. If I read the stack trace attached there correctly, the failure is not inside the delete workspace plugin but in core. So the delete workspace plugin and other tasks like log rotation can't get a handle for their operations because the file is locked. We will investigate further over the next few days.
Same problem here, with windows 2008 r2 server, git plugin 2.0 with JGit implementation.
The leaked files plugin (run on Debian Linux) showed at least one leaked file handle from the RevWalk allocated in the ChangelogCommand of the JGit implementation. I've created a series of changes which were able to fix the condition detected by the leaked files plugin, but still did not allow me to delete the workspace on my Windows machine.
I assume the fixes to the issues detected by the leaked files plugin are a necessary part of the final fix, but they are not sufficient to fix the problem on Windows. Unfortunately, I don't know how to run the leaked files plugin on Windows, it fails to execute there.
If you'd like to see my current prototype (not yet ready for a pull request), refer to https://github.com/MarkEWaite/git-client-plugin/tree/JENKINS-19994-cannot-delete-jgit-workspace
Code changed in jenkins
User: Mark Waite
Path:
src/main/java/org/jenkinsci/plugins/gitclient/JGitAPIImpl.java
http://jenkins-ci.org/commit/git-client-plugin/25af38588ceb3b7850b940e587ed832423da155c
Log:
JENKINS-19994 Close files opened by JGit operations
The JGit Repository object opens files and requires that the close()
method on the repository object is called to close the files.
The JGit RevWalk object documentation states that the dispose()
method should be called on the RevWalk object to unlock any resources
it is holding.
The JGit ObjectReader documentation states that its resources should
be released by calling the release() method.
This change is not enough to allow workspace cleanup in all cases, since
there are other cases (outside the JGit implementation) which will cause
files to be kept open in the workspace. Refer to JENKINS-20585 for one
example.
Code changed in jenkins
User: Nicolas De loof
Path:
src/main/java/org/jenkinsci/plugins/gitclient/ChangelogCommand.java
src/main/java/org/jenkinsci/plugins/gitclient/CliGitAPIImpl.java
src/main/java/org/jenkinsci/plugins/gitclient/JGitAPIImpl.java
src/test/java/org/jenkinsci/plugins/gitclient/GitAPITestCase.java
http://jenkins-ci.org/commit/git-client-plugin/8ffb0e7140445deb41c960eff01878b7c9f8e93d
Log:
Merge pull request #61 from MarkEWaite/JENKINS-19994-jgit-close-files
JENKINS-19994 jgit close files
Compare: https://github.com/jenkinsci/git-client-plugin/compare/7fd8d1604ea8...8ffb0e714044
Git client plugin 1.6.1 or git client plugin 1.7.0 (whichever is chosen as the next release version number of the git client plugin) should resolve some of the cases where this problem happens.
There are likely still other cases which are not yet fixed (like the case documented in JENKINS-20585).
A new release of the git plugin will also be required before this bug is fixed. There is at least one call in the git plugin which needs to use the ChangelogCommand.abort() method that was added after git client plugin 1.6.0 was released.
The git plugin version will need to be newer than 2.0 before this bug fix will be visible, assuming the pull request for the change is accepted.
Code changed in jenkins
User: Mark Waite
Path:
src/main/java/hudson/plugins/git/GitSCM.java
http://jenkins-ci.org/commit/git-plugin/2faba2a2604090ffdd978dbe61d268bc05effc66
Log:
JENKINS-19994 Close files opened by computeChangeLog, use new abort() method
All calls to a changelog instance must conclude with either a call to
execute() or a call to abort(). Files will be left open if one of
those two methods is not called before the changelog instance goes out
of scope.
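The rule in that commit message can be pictured with a small, self-contained sketch. The `ChangelogSketch` class below is hypothetical and only mimics the shape of the contract (an object that opens a file when it is created and releases it in `execute()` or `abort()`). It is not the git client plugin's `ChangelogCommand`, and the `runSafely` and `demo` helpers are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChangelogSketch {
    private final RandomAccessFile handle; // stands in for an open pack file
    private boolean finished = false;

    ChangelogSketch(Path packFile) {
        try {
            this.handle = new RandomAccessFile(packFile.toFile(), "r");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Does the work (omitted here), then always releases the file. */
    void execute() {
        try {
            // ... a real implementation would read from the handle here ...
        } finally {
            release();
        }
    }

    /** Releases the file without doing any work. */
    void abort() {
        release();
    }

    boolean isFinished() {
        return finished;
    }

    private void release() {
        if (finished) {
            return;
        }
        finished = true;
        try {
            handle.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Caller pattern: every code path ends in execute() or abort(). */
    static boolean runSafely(Path packFile, boolean shouldExecute) {
        ChangelogSketch changelog = new ChangelogSketch(packFile);
        boolean executed = false;
        try {
            if (shouldExecute) {
                changelog.execute();
                executed = true;
            }
        } finally {
            if (!executed) {
                changelog.abort(); // early exit or exception: still close the file
            }
        }
        return changelog.isFinished();
    }

    /** Runs both paths against a temp file and checks it can be deleted afterwards. */
    static boolean demo() {
        try {
            Path pack = Files.createTempFile("changelog-demo", ".pack");
            boolean closedBothTimes = runSafely(pack, true) && runSafely(pack, false);
            Files.delete(pack); // would fail on Windows if a handle had leaked
            return closedBothTimes && !Files.exists(pack);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Handle released on every path: " + demo());
    }
}
```

The `finally` block is the important part: a caller that returns early without calling either method is the kind of path that leaves a file open.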
Code changed in jenkins
User: Nicolas De loof
Path:
src/main/java/hudson/plugins/git/GitSCM.java
http://jenkins-ci.org/commit/git-plugin/d267bd6b16fc8d2d9c04b27a9ed805082592bab3
Log:
Merge pull request #185 from MarkEWaite/JENKINS-19994-close-changelog-opened-files
JENKINS-19994 Close files opened by computeChangeLog, use abort()
Compare: https://github.com/jenkinsci/git-plugin/compare/5ac9acd02624...d267bd6b16fc
Git plugin pull request is included in Git plugin version 2.0.1.
It is still happening on version 1.9.1, Jenkins 1.567, Tomcat 7.0.50, every time, after a build fails.
The only workaround I know for this issue is still the same: stop the Tomcat service to kill all processes, then start it again.
This is the job output:
Started by user Thiago Zanetti
Building in workspace C:\.jenkins\jobs\coel-atccontrol\workspace
Wiping out workspace first.
java.nio.file.FileSystemException: C:\.jenkins\jobs\coel-atccontrol\workspace\.git\objects\pack\pack-25dae05cb1d8170399a3f3893718c81ce22fee02.pack: The process cannot access the file because it is being used by another process.
    at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
    at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
    at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
    at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
    at java.nio.file.Files.delete(Files.java:1077)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at hudson.Util.deleteFile(Util.java:238)
    at hudson.Util.deleteRecursive(Util.java:301)
    at hudson.Util.deleteContentsRecursive(Util.java:203)
    at hudson.Util.deleteRecursive(Util.java:292)
    at hudson.Util.deleteContentsRecursive(Util.java:203)
    at hudson.Util.deleteRecursive(Util.java:292)
    at hudson.Util.deleteContentsRecursive(Util.java:203)
    at hudson.Util.deleteRecursive(Util.java:292)
    at hudson.Util.deleteContentsRecursive(Util.java:203)
    at hudson.FilePath$13.invoke(FilePath.java:1098)
    at hudson.FilePath$13.invoke(FilePath.java:1095)
    at hudson.FilePath.act(FilePath.java:920)
    at hudson.FilePath.act(FilePath.java:893)
    at hudson.FilePath.deleteContents(FilePath.java:1095)
    at hudson.plugins.git.extensions.impl.WipeWorkspace.beforeCheckout(WipeWorkspace.java:28)
    at hudson.plugins.git.GitSCM.checkout(GitSCM.java:877)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1252)
    at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:604)
    at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:513)
    at hudson.model.Run.execute(Run.java:1710)
    at hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:529)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:231)
Retrying after 10 seconds
Build was aborted
Aborted by Thiago Zanetti
Skipping sonar analysis due to bad build status ABORTED
Finished: ABORTED
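The job output above stops at the first file that cannot be removed. When diagnosing this kind of failure it can help to know whether one pack file is the only blocker. The following is a diagnostic sketch under that assumption, not the `hudson.Util` code from the trace: the `WipeReport` class and its method names are hypothetical. It walks a directory tree, deletes what it can, and returns every path it could not remove.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class WipeReport {

    /** Deletes everything under root that can be deleted; returns the paths that could not. */
    static List<Path> deleteAndReport(Path root) {
        List<Path> failed = new ArrayList<>();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    try {
                        Files.delete(file);
                    } catch (IOException e) {
                        failed.add(file); // e.g. a file still held open on Windows
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFileFailed(Path file, IOException exc) {
                    failed.add(file); // could not even be inspected
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                    try {
                        Files.delete(dir);
                    } catch (IOException e) {
                        failed.add(dir); // usually because a child could not be deleted
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return failed;
    }

    /** Builds a small throwaway tree and wipes it; returns whatever could not be deleted. */
    static List<Path> demo() {
        try {
            Path root = Files.createTempDirectory("wipe-demo");
            Path packDir = Files.createDirectories(root.resolve(".git/objects/pack"));
            Files.write(packDir.resolve("pack-demo.pack"), new byte[] {0});
            return deleteAndReport(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Could not delete: " + demo());
    }
}
```

With no handle leaked, the returned list is empty. A list containing only a `.pack` file and its parent directories would point toward the open-handle problem discussed in this issue rather than a permissions problem.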
thiagozanetti Are you using JGit as your git implementation or are you using command line git?
If command line git, is there a git process still running on the machine?
If a git process is not running, then are there any other processes running which might hold that pack file open (besides java.exe which is running Jenkins)?
markewaite We are using JGit (git-client-plugin) version 1.9.1
As far as I know there's no other service using git besides Jenkins in this machine.
I wasn't clear enough with my question. Are you using the default git provider ("Default" - which uses command line git as its back end implementation), or the "JGit" provider ("jgit" - which uses the Eclipse jgit implementation as its "back end provider")?
You can tell which you're using in the job definition. If there is a pick list to choose your git, then read the value of that pick list. If there is no pick list, then you're using the default and have not configured jgit at all.
Also, can you use Windows task manager to search for any processes named "git.exe". If a git process blocks waiting for authentication, it may hold files open so that they cannot be deleted (on Windows).
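If Task Manager is not convenient (for example on a headless build node), the same check can be sketched in Java 9 or later with the standard `ProcessHandle` API. The class and method names below are made up for illustration, and the process list may be incomplete when the JVM is not allowed to inspect processes owned by other users.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class FindGitProcesses {

    /** True if the executable name of the given command path is git or git.exe. */
    static boolean looksLikeGit(String command) {
        String normalized = command.replace('\\', '/');
        String name = normalized.substring(normalized.lastIndexOf('/') + 1)
                .toLowerCase(Locale.ROOT);
        return name.equals("git") || name.equals("git.exe");
    }

    /** Lists "pid command" for every visible process whose executable looks like git. */
    static List<String> runningGitProcesses() {
        return ProcessHandle.allProcesses()
                .filter(p -> p.info().command()
                        .map(FindGitProcesses::looksLikeGit)
                        .orElse(false)) // command not readable: skip the process
                .map(p -> p.pid() + " " + p.info().command().orElse("?"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> found = runningGitProcesses();
        if (found.isEmpty()) {
            System.out.println("No git processes visible to this JVM.");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

An empty result only rules out a separate git process. It says nothing about handles held inside the Jenkins JVM itself, which is the JGit case.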
markewaite I've attached a screenshot from my job configuration. As you can see I'm using jgit.
I've followed the steps (which is basically make a build fail by aborting) to reproduce the issue and took a look at Windows Task Manager. There is no git process running when the error occurs.
OK, then one work around is to switch from JGit to command line git. Because command line git runs an external process, it is much less likely to hold files busy.
Another work around is to switch from running on Windows to run on Linux, FreeBSD, Solaris or one of the other variants where an open file handle is non-fatal.
I tried to configure command line git and found it a bit hard to accomplish in the limited time I had at that moment. I'll give it another try.
Therefore, I'll create a Linux machine to support my CI services, but I still have to maintain a Windows node for the projects that are Windows only.
Thank you for your time.
Best regards,
Thiago Zanetti
I am frequently getting this error as well. But I am using Vault, not git.
The error I am getting is:
""
Deleting project workspace... Cannot delete workspace: java.io.IOException: Remote call on slave1 failed
ERROR: Cannot delete workspace: java.io.IOException: Remote call on slave1 failed
Finished: FAILURE
""
Running on windows 7...
Is this the same problem, or a completely different problem?
As far as I can tell, that is a completely different problem. You probably want to report a bug against the Vault plugin.
Trying to delete workspace before build starts causes this error
I have a similar issue when just trying to delete the workspace at the beginning of the build, with Jenkins v1.593 on Windows Server 2008. java.exe is holding a handle on the file.
Just trying to delete the project workspace before build starts
ioannis I have never seen the menu whose screenshot you attached. What plugin provides that menu?
What is the functionality it is providing?
Is there a reason you're using a form of deletion which leaves files in the directory, but removes the .git repository?
Indeed danielbeck , MarkeWaite! This is from the Workspace Cleanup plugin.
But I have noticed a similar issue even when trying a custom post-build Groovy cleanup script. The file that fails to delete is always the XML configuration file used by the Summary Display plugin, so I am starting to suspect it may have something to do with that particular plugin.
What do you think? Thanks!
You'll need to submit a bug report against that plugin (and any other plugin which you discover has files open at the time you need to delete files).
Alternately, sometimes you can switch away from Windows to a Unix variant so that you aren't blocked from deleting files which are left open due to file handling bugs in plugins.
Thanks for the feedback. Done: https://issues.jenkins-ci.org/browse/JENKINS-26827
hujirong you need to submit a bug report against the git changelog plugin if that is the one which allows you to see the problem. As far as I can tell, the file leaks in the git plugin and the git client plugin are all resolved.
Code changed in jenkins
User: Tomas Bjerre
Path:
CHANGELOG.md
pom.xml
http://jenkins-ci.org/commit/git-changelog-plugin/a4037ad2a9b8d3a23a28acc84f67b5b42eb2a9fc
Log:
Closing RevWalk JENKINS-19994
Code changed in jenkins
User: Tomas Bjerre
Path:
CHANGELOG.md
pom.xml
http://jenkins-ci.org/commit/git-changelog-plugin/4dd0b1c8a82edebb4a129c0c13342485330c0e38
Log:
Closing RevWalk JENKINS-19994 #15
It is not clear based on the bug report if you're using the git command line implementation or if you're using the jgit implementation in your project definition. Can you clarify which you are using?
When I use Git plugin 2.0 and git client plugin 1.4.4 and I restrict the job to only run on windows and I add "Wipe out repository & force clone" as an additional action in the git configuration for that job, the first execution of the job succeeds, then all subsequent executions fail with a report that one of the pack files in the git repository is held open by another process and cannot be deleted.
The steps I used to define that failing job:
0 - Configure jgit as an available git implementation in the global configuration
1 - Create a new Jenkins job (I named mine jgit-windows)
2 - Configure the Jenkins job to use Git for SCM
3 - Enter https://github.com/MarkEWaite/check_git.git as the Git URL
4 - Switch from the git command line to jgit as the git executable
5 - Add "Checkout to specific local branch" to the branch master-local as part of the git configuration
6 - Restrict the job to only run on Windows
7 - Add "Wipe out repository & force clone" as part of the git configuration
8 - Save the job
Once the job has been saved, run it twice. The first run will succeed; the second run will fail with a report that a pack file is busy.