Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • core
    • None
    • Microsoft Windows

      Please enhance the hudson.Util.deleteContentsRecursive method to:

      1. delete everything it can
      2. try several times to delete everything
      3. only throw an exception if it can't delete everything (listing everything that it can't delete)
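
A minimal sketch of points 1 and 3 above, assuming a standalone helper built on java.nio.file. The class and method names (BestEffortDelete, deleteContents, deleteContentsOrThrow) are hypothetical and are not the actual hudson.Util code. It keeps deleting past a failure, remembers each path it could not remove, and only throws once the whole tree has been attempted.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not the real hudson.Util implementation. */
public class BestEffortDelete {

    /** Deletes everything under dir that it can and returns the paths it could not delete. */
    public static List<Path> deleteContents(Path dir) {
        List<Path> failures = new ArrayList<>();
        List<Path> children = new ArrayList<>();
        // Snapshot the listing first so we never delete while iterating the stream.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                children.add(child);
            }
        } catch (IOException | DirectoryIteratorException e) {
            failures.add(dir); // could not even list the directory
            return failures;
        }
        for (Path child : children) {
            deleteTree(child, failures);
        }
        return failures;
    }

    private static void deleteTree(Path path, List<Path> failures) {
        if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
            failures.addAll(deleteContents(path));
        }
        try {
            Files.deleteIfExists(path);
        } catch (IOException e) {
            // Record the failure and carry on with the siblings.
            // A directory that still holds an undeletable child also ends up here.
            failures.add(path);
        }
    }

    /** Throws only if something is left behind, naming every path that could not be deleted. */
    public static void deleteContentsOrThrow(Path dir) throws IOException {
        List<Path> failures = deleteContents(dir);
        if (!failures.isEmpty()) {
            throw new IOException("Unable to delete: " + failures);
        }
    }
}
```

Returning the list of failures (rather than throwing on the first one) is what lets a caller retry only the leftovers on a later attempt.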

      Reasoning...
      Unlike Unix, Microsoft Windows does not allow a file to be deleted while something has that file open, so delete operations fail.
      Furthermore, most Windows installations run software that monitors the filesystem for activity and then inspects the contents of recently added/removed files, locking them (albeit temporarily) while it does so. The Windows Search service and anti-virus software are two examples, and Windows Vista and Windows 7 seem to have additional complications.

      This means that builds which rely on cleaning a workspace before they start will sometimes fail because a file was locked and could not be deleted, producing output like the following:

      Started by an SCM change
      Building remotely on jenkinsslave27 in workspace C:\hudsonSlave\workspace\MyProject
      Purging workspace...
      hudson.util.IOException2: remote file operation failed: C:\hudsonSlave\workspace\MyProject at hudson.remoting.Channel@6f0564d7:jenkinsslave27
      	at hudson.FilePath.act(FilePath.java:835)
      	at hudson.FilePath.act(FilePath.java:821)
      	at hudson.plugins.accurev.AccurevSCM.checkout(AccurevSCM.java:331)
      	at hudson.model.AbstractProject.checkout(AbstractProject.java:1218)
      	at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:586)
      	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:475)
      	at hudson.model.Run.run(Run.java:1434)
      	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
      	at hudson.model.ResourceController.execute(ResourceController.java:88)
      	at hudson.model.Executor.run(Executor.java:239)
      Caused by: java.io.IOException: Unable to delete C:\hudsonSlave\workspace\MyProject\...\src\...\foo - files in dir: [C:\hudsonSlave\workspace\MyProject\...\src\...\foo\bar]
      	at hudson.Util.deleteFile(Util.java:236)
      	at hudson.Util.deleteRecursive(Util.java:287)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.Util.deleteRecursive(Util.java:278)
      	at hudson.Util.deleteContentsRecursive(Util.java:198)
      	at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:28)
      	at hudson.plugins.accurev.PurgeWorkspaceContents.invoke(PurgeWorkspaceContents.java:11)
      	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2161)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
      	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
      	at hudson.remoting.Request$2.run(Request.java:287)
      	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
      	at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
      	at java.util.concurrent.FutureTask.run(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      	at hudson.remoting.Engine$1$1.run(Engine.java:60)
      	at java.lang.Thread.run(Unknown Source)
      

      What's needed is a retry mechanism, i.e. the equivalent of using Ant's <retry><delete file="foo"/></retry>, but with a (small) delay between attempts (and maybe a call to the garbage collector, just in case the process holding the file open is the build slave process itself).
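
The retry idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the change that was eventually merged into Jenkins: the class name RetryingDelete, the method deleteWithRetries, and its maxAttempts and delayMillis parameters are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not the real hudson.Util implementation. */
public class RetryingDelete {

    /**
     * Tries to delete a single file or empty directory up to maxAttempts times,
     * pausing delayMillis between attempts. Throws only after the final attempt fails.
     */
    public static void deleteWithRetries(Path file, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.deleteIfExists(file);
                return; // deleted, or it was already gone
            } catch (IOException e) {
                lastFailure = e; // possibly a transient lock; try again below
            }
            if (attempt < maxAttempts) {
                // Assumption: if this JVM still holds the file open through an
                // unreachable stream, a GC hint may get it released.
                System.gc();
                Thread.sleep(delayMillis);
            }
        }
        throw new IOException(
                "Unable to delete " + file + " after " + maxAttempts + " attempts", lastFailure);
    }
}
```

A caller might use something like `RetryingDelete.deleteWithRetries(path, 3, 100L)`. The attempt count and delay shown here are arbitrary and would need tuning for how long the locking software typically holds files.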

          [JENKINS-15331] Workaround Windows unpredictable file locking in Util.deleteContentsRecursive

          Jörg Ziegler added a comment -

          Are there any plans to backport this to LTS/1.651? The issue persists on Windows Server 2012R2 running with 1.651.2.

          Daniel Beck added a comment -

          This was not considered for backporting as this issue is an Improvement and not a Bug.

          Now it's too late, the 1.651.3 RC is out.

          pjdarton added a comment -

          The only reason this was logged as an "improvement" is because the fault really lies within the Windows OS / JRE and not within Jenkins itself, but all the symptoms (the issues that link to this) are bugs from an end-user's point of view - Jenkins builds "fail at random" on Windows (which is a bug), and this "improvement" is the cure.
          i.e. for anyone trying to do builds on Windows, this is a bugfix (as evidenced by all the issues that link to this).

          So, sure, this is an "improvement" - Jenkins now works reliably on Windows, and that's a huge improvement - but the reason I coded this was to fix a whole load of unreliability (aka "bugs") that are seen on Windows.

          This was flagged as an lts-candidate, so I was rather hoping that it'd be backported to the LTS release.
          As it stands now, either all Windows users have to upgrade to Jenkins 2, or they have to build their own LTS version (as I had to) ... or it gets included in the next LTS. You can probably guess which option I'm in favour of.

          Jörg Ziegler added a comment -

          Thanks pjdarton - this bug is pretty much killing our productivity as it requires manually restarting slaves every few hours. I strongly agree that it's more than an improvement.

          Daniel Beck added a comment -

          pjdarton Not my fault – olivergondza filters for issue type and resolution, and anything that's not a fixed bug doesn't qualify, label or not.

          This could have been corrected before the RC was published; by now it's too late for .3.

          Jörg Ziegler added a comment -

          danielbeck thanks for the quick replies. Is there any field that would need updating in this issue so that it will be included in a .4?

          Oleg Nenashev added a comment -

          Actually, we can still merge it to .3 if olivergondza agrees. But I'm not so happy about it since the RC is under testing now.
          Regarding .4, it is unlikely to happen under the current release model. It needs a wide discussion on the developer list.

          BR, Oleg

          Daniel Beck added a comment -

          We don't do .4's, except when we mess up so badly there's no way around it, but this doesn't qualify.

          Oliver Gondža added a comment -

          I decided not to squeeze this into .3 (last in its line) for stability's sake. We need to be extra careful as we do not do much testing on Windows, unfortunately.

          Oliver Gondža added a comment -

          Consumed by 2.7.X line so need to backport.

            Assignee: Unassigned
            Reporter: pjdarton
            Votes: 28
            Watchers: 36