Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • Labels: None
    • Environment: Microsoft Windows

      Please enhance the hudson.Util.deleteContentsRecursive method to:

      1. delete everything it can
      2. try several times to delete everything
      3. only throw an exception if it can't delete everything (listing everything that it can't delete)

      Reasoning:
      Unlike Unix, Microsoft Windows does not allow a file to be deleted while something has that file open, so delete operations can fail.
      Furthermore, most Windows installations run software that monitors the filesystem for activity and then inspects recently added or removed files, locking them temporarily while it does so. The Windows Search service and anti-virus software are two examples, and Windows Vista and Windows 7 seem to have additional complications.

      This means that builds which rely on cleaning a workspace before they start will sometimes fail, claiming that they couldn't delete everything because a file was locked. The build output then looks like this:

      Started by an SCM change
      Building remotely on jenkinsslave27 in workspace C:\hudsonSlave\workspace\MyProject
      Purging workspace...
      hudson.util.IOException2: remote file operation failed: C:\hudsonSlave\workspace\MyProject at hudson.remoting.Channel@6f0564d7:jenkinsslave27
      	at hudson.FilePath.act(FilePath.java:835)
      	at hudson.FilePath.act(FilePath.java:821)
      	at hudson.plugins.accurev.AccurevSCM.checkout(AccurevSCM.java:331)
      	at hudson.model.AbstractProject.checkout(AbstractProject.java:1218)
      	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:586)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:475)
      	at hudson.model.Run.run(Run.java:1434)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:239)
      Caused by: java.io.IOException: Unable to delete C:\hudsonSlave\workspace\MyProject\...\src\...\foo - files in dir: [C:\hudsonSlave\workspace\MyProject\...\src\...\foo\bar]
      	at hudson.Util.deleteFile(Util.java:236)
      	at hudson.Util.deleteRecursive(Util.java:287)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:28)
      	at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:11)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2161)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:287)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:60)
      	at java.lang.Thread.run(Unknown Source)
      

      What's needed is a retry mechanism, i.e. the equivalent of Ant's <retry><delete file="foo"/></retry>, but with a (small) delay between attempts (and maybe a call to the garbage collector, in case the process holding the file open is the build slave process itself).
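
The retry-with-delay idea can be sketched roughly as follows. This is only an illustration, not the code from the eventual patch: the class name `RetryingDelete`, the method `deleteWithRetry` and its parameters are hypothetical, and it handles a single file or empty directory rather than a whole tree.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper illustrating "retry the delete, with a short pause". */
final class RetryingDelete {
    private RetryingDelete() {}

    /**
     * Tries to delete one file (or empty directory) up to maxAttempts times,
     * sleeping waitMillis after each failed attempt except the last.
     * Throws an IOException only once every attempt has failed.
     */
    static void deleteWithRetry(Path path, int maxAttempts, long waitMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(path); // a missing file counts as success
                return;
            } catch (IOException e) {
                last = e; // e.g. the file is still locked by another process
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis); // give the lock holder time to let go
                }
            }
        }
        throw new IOException(
                "Unable to delete " + path + " after " + maxAttempts + " attempts", last);
    }
}
```

A caller would use something like `RetryingDelete.deleteWithRetry(file, 3, 500)`; the values 3 and 500 here are arbitrary illustrative choices.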

          [JENKINS-15331] Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

          pjdarton created issue -

          pjdarton added a comment -

          Note: This file locking behavior also causes non-Jenkins issues, e.g. deleting multiple folders using Windows Explorer will sometimes leave one (usually empty) folder behind, and even a simple "RD /S /Q MyFolder" will sometimes fail to delete the folder on its first attempt. In these cases, simply retrying the operation will succeed. Personally, I think it's a Windows "feature".

          As a workaround, I've wrapped most of my calls to Ant's <delete> task in <retry>, and this has eliminated the problem from any of my builds that manage to start, but it doesn't help if Jenkins doesn't get as far as running my builds.
          For example, I'm using the accurev plugin for my SCM and it cleans the working directory before it grabs the source. I typically get about a 1% failure rate at this stage. Whilst 1% is not a blocking issue, it's not reliable, which is not what one wants from a build system.

          Personally, I've found that excluding the build areas from Search and anti-virus helps reduce the problem, but it is insufficient to stop these failures completely (at least on Windows 7). Something, somewhere, will still lock files, sometimes, but any investigation (after the build has failed) shows that no process has the file "open".

          pjdarton added a comment - - edited

          Features:

          • Added two new system properties that control behavior: "Util.deletionRetries" (an integer, defaults to 3) and "Util.deletionRetryWait" (an integer, defaults to 500ms).
          • Delete operations that affect directories now try to delete the entire contents of the directory, continuing on to subfolders etc. even after encountering files that wouldn't die, before eventually throwing an exception about what wouldn't die. For example, if a folder has files "a", "b" and "c", and you can't delete "b", then "a" and "c" would get deleted (and you'll still get the exception about "b").
          • Delete operations now make multiple attempts at deleting things, so anything that could not be deleted the first time around may get deleted on the 2nd, 3rd etc. attempt. An exception is only thrown if all retry attempts are exhausted and there are still files/directories that won't delete.
          • Added some unit tests for these methods.
          • After posting this back in October 2012, I built a version of Jenkins LTS with this patch applied. I've been using it at work for all our development stuff and I've not had file locking problems since. I'm pretty confident that it fixes the problem.
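
A minimal sketch of the behavior described in the list above, for readers who want to see its shape. It is not the patch itself: the class `BestEffortDelete` and its methods are hypothetical, the property names are used exactly as quoted above (the real code may qualify them with a prefix), and treating the retry count as "extra passes after the first" is an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: delete what we can, retry, then report what is left. */
final class BestEffortDelete {
    // Property names as quoted in the comment; defaults of 3 and 500 ms.
    static final int RETRIES = Integer.getInteger("Util.deletionRetries", 3);
    static final int WAIT_MS = Integer.getInteger("Util.deletionRetryWait", 500);

    private BestEffortDelete() {}

    /** One pass over dir: deletes everything it can, returns what it could not. */
    static List<Path> deleteContentsOnce(Path dir) {
        List<Path> failed = new ArrayList<>();
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                children.add(child);
            }
        } catch (IOException | DirectoryIteratorException e) {
            failed.add(dir); // could not even list the directory
            return failed;
        }
        for (Path child : children) {
            // Do not follow symbolic links into other directory trees.
            if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                failed.addAll(deleteContentsOnce(child));
            }
            try {
                Files.deleteIfExists(child);
            } catch (IOException e) {
                // Keep going: siblings are still attempted. A directory that
                // still holds an undeletable file is reported here as well.
                failed.add(child);
            }
        }
        return failed;
    }

    /** Repeats the pass until nothing is left or the retries are used up. */
    static void deleteContents(Path dir, int retries, long waitMillis)
            throws IOException, InterruptedException {
        List<Path> failed = deleteContentsOnce(dir);
        for (int retry = 0; retry < retries && !failed.isEmpty(); retry++) {
            Thread.sleep(waitMillis);
            failed = deleteContentsOnce(dir);
        }
        if (!failed.isEmpty()) {
            throw new IOException("Unable to delete: " + failed);
        }
    }
}
```

With this sketch, a call such as `BestEffortDelete.deleteContents(workspace, BestEffortDelete.RETRIES, BestEffortDelete.WAIT_MS)` empties `workspace` but leaves the directory itself in place, and the exception message lists every path that survived the final pass.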

          Disclaimers:

          • I've not tested this (or run the unit tests) on Linux. It should be harmless (behavioral changes are conditional on being on Windows), but it'd be worth running the unit tests on Linux just to verify that.

          pjdarton added a comment -

          JENKINS-15331 should fix JENKINS-10905.
          pjdarton made changes -
          Link New: This issue is related to JENKINS-10905 [ JENKINS-10905 ]
          pjdarton made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

          pjdarton added a comment -

          Uploaded a git patch file; it was produced using the git command line and doesn't claim to change the entire file, so it will probably be a lot easier to merge.

          This is my "new-and-improved" solution.
          In addition to retrying the deletes, this also calls System.gc() if it's on Windows (a tactic that Apache Ant's Delete task also uses to work around the same problem).
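
The garbage-collection tactic might look roughly like the sketch below. The class `WindowsGcHint` and its methods are hypothetical, and detecting Windows from the `os.name` system property is an assumption made for illustration; the patch may detect the platform differently.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical sketch of "call System.gc() before retrying, on Windows only". */
final class WindowsGcHint {
    private WindowsGcHint() {}

    /** True if the given os.name value looks like a Windows platform. */
    static boolean isWindows(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /**
     * Deletes the file; if the first attempt fails on Windows, asks the JVM to
     * collect garbage before trying once more. The idea is that an unclosed
     * stream in this same JVM may be what holds the file open, and collecting
     * it can release the handle. System.gc() is only a request to the JVM.
     */
    static boolean deleteWithGcHint(File file) {
        if (file.delete() || !file.exists()) {
            return true;
        }
        if (isWindows(System.getProperty("os.name"))) {
            System.gc();
        }
        return file.delete() || !file.exists();
    }
}
```

On other platforms the second attempt simply runs without the extra step, which keeps the change conditional on Windows as described above.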

          pjdarton made changes -
          Attachment New: 0001-JENKINS-15331.patch [ 22814 ]

          pjdarton added a comment -

          Have re-done my GitHub pull request to reflect the new changes (and to fix the CRLF issue with the previous pull request).
          New pull request is https://github.com/jenkinsci/jenkins/pull/615

          pjdarton made changes -
          Link New: This issue is related to JENKINS-3053 [ JENKINS-3053 ]

            Assignee: Unassigned
            Reporter: pjdarton
            Votes: 28
            Watchers: 36