Hi folks,

      From time to time, I'm experiencing a weird problem with a deleted/lost workspace directory. One minute it's there, and then it mysteriously "disappears" before or during the next build.

      I'm using a master-slave setup, where one project is built on two slaves. During the day, the project is built on SCM changes. At night, a scheduled nightly build runs (with some SVN/Git cleanups and some time-consuming tests).

      As far as I can tell, it happens during the nightly builds when the SVN/Git clean-ups are performed. But this probably doesn't matter, because the workspace directory is "lost" before the SVN/Git cleanup even starts. Here is what I usually see in the log:

      Started by upstream project "projectABC_32_64" build number 284
      originally caused by:
      Started by timer
      Building remotely on LB3D-Wxpp32sp3 in workspace c:\Jenkins\workspace\projectABC_32_64\lb3d_32_64\LB3D-Wxpp32sp3
      Checking out a fresh workspace because C:\Jenkins\workspace\projectABC_32_64\lb3d_32_64\LB3D-Wxpp32sp3\Ranorex doesn't exist
      Cleaning local Directory Ranorex
      Checking out svn://loremipsum... at revision '2013-09-23T02:30:34.438 +0200'

      Then it does the SVN checkout, followed by a Git fetch and builds of both the Git- and SVN-based projects. All would be fine, except that after the SVN checkout there is apparently no directory created, because when the SVN-based project is built, MSBuild fails because of a missing project file.

      Here is an excerpt from the failed build...

      Path To MSBuild.exe: c:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe
      Executing the command cmd.exe /C "c:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe " /p:Configuration=Release /t:Clean;Build /p:lb3d_32_64=LB3D-Wxpp32sp3 .\Ranorex\xSpector.sln && exit %%ERRORLEVEL%% from c:\Jenkins\workspace\projectABC_32_64\lb3d_32_64\LB3D-Wxpp32sp3
      [LB3D-Wxpp32sp3] $ cmd.exe /C "c:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe " /p:Configuration=Release /t:Clean;Build /p:lb3d_32_64=LB3D-Wxpp32sp3 .\Ranorex\xSpector.sln && exit %%ERRORLEVEL%%
      Microsoft (R) Build Engine Version 4.0.30319.1
      [Microsoft .NET Framework, Version 4.0.30319.1]
      Copyright (C) Microsoft Corporation 2007. All rights reserved.

      MSBUILD : error MSB1009: Project file does not exist.
      Switch: .\Ranorex\xSpector.sln

      Any idea what could be causing this random workspace deletion? Has anyone else experienced such a problem?

          [JENKINS-19686] Workspace directory randomly deleted

          John Elliot added a comment -

          I have seen similar behavior.

          Every once in a while I get the 'workspace doesn't exist' error and Jenkins starts resyncing the repository. This seems to be related to connection problems between the master and the slave. If I restart the slave agent, the problem goes away.

          This is using Jenkins 1.480.3 on Mac OS X.

          Daniel Beck added a comment -

          Are you using Cloudbees Folders Plugin on Jenkins 1.551 or earlier?

          Andrew Barber added a comment - - edited

          I think there is a bug in the workspace cleanup code. We've been battling with disappearing workspaces, and I could not figure out what was happening until I stumbled upon this thread. In our case, I moved a job from one slave to a different slave, but the cleanup code seemed to think it was OK to delete the workspace based on the old slave. The message in the log was
          Deleting <dir> on <old slave name>
          We haven't been using that old slave for this job for at least a few weeks. To make matters worse, it deleted the workspace WHILE the job was running on the new slave.
          This appears to be the code causing the trouble:

                       for (Node node : nodes) {
                           FilePath ws = node.getWorkspaceFor(item);
                           if (ws == null) {
                               continue; // offline, fine
                           }
                           boolean check;
                           try {
                               check = shouldBeDeleted(item, ws, node);
          

          The first node it comes across for which shouldBeDeleted() returns true causes the workspace to be deleted, even if another node (later in the list) is the last builder of that job (meaning the job is still active). There is a check in shouldBeDeleted() that tries to catch this:

                       Node lb = p.getLastBuiltOn();
                       LOGGER.log(Level.FINER, "Directory {0} is last built on {1}", new Object[] {dir, lb});
                       if(lb!=null && lb.equals(n)) {
                           // this is the active workspace. keep it.
                           LOGGER.log(Level.FINE, "Directory {0} is the last workspace for {1}", new Object[] {dir, p});
                           return false;
                       }
          

          But since the for loop acts as soon as one node says the workspace should be deleted, before all nodes have been checked, this check can be pointless.
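          To illustrate, something like the following could defer the deletion until every node has been consulted. This is only an untested sketch, not a patch: it reuses shouldBeDeleted() and the nodes/item/listener variables from the excerpt above, assumes the usual java.util imports, and leaves out the exception handling.

               // Untested sketch: two passes over the nodes, so a path that any node
               // still considers active is never deleted, even when several slaves
               // resolve the item to the same directory (comparing getRemote() strings
               // only covers slaves sharing a root on the same machine).
               Set<String> activePaths = new HashSet<String>();
               Map<FilePath, Node> candidates = new LinkedHashMap<FilePath, Node>();

               for (Node node : nodes) {
                   FilePath ws = node.getWorkspaceFor(item);
                   if (ws == null) {
                       continue; // offline, fine
                   }
                   if (shouldBeDeleted(item, ws, node)) {
                       candidates.put(ws, node);        // remember as a deletion candidate
                   } else {
                       activePaths.add(ws.getRemote()); // some node still owns this path
                   }
               }

               // Only now act, and skip anything that another node reported as active.
               for (Map.Entry<FilePath, Node> e : candidates.entrySet()) {
                   if (!activePaths.contains(e.getKey().getRemote())) {
                       listener.getLogger().println("Deleting " + e.getKey() + " on " + e.getValue().getDisplayName());
                       e.getKey().deleteRecursive();
                   }
               }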

          Andrew Barber added a comment -

          A follow-up on my case. I just realized that what is different for me is that I share a slave root across slaves. This is OK for me, because jobs are only tied to a single node. But by moving the job from one slave to another, it opened up the workspace for reaping (in the context of the old slave), which I don't want. You could argue user error, but there is no check in Jenkins to ensure that slaves have different roots (which implies that sharing roots must be allowed). The cleanup thread could be enhanced to check with all slaves that share a root to make sure none of them owns the active workspace.
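          Something along these lines is what I had in mind (again only an untested sketch: it assumes "sharing a root" can be detected by comparing Node.getRootPath() strings, which really only works when the slaves sit on the same machine, and it inlines the same getLastBuiltOn() test that shouldBeDeleted() already does):

               // Sketch: before reaping item's workspace on `candidate`, check whether any
               // other node with the same root directory is the node that last built the
               // project, i.e. whether the directory is really that node's active workspace.
               static boolean anySharingNodeOwnsWorkspace(Node candidate, List<Node> nodes,
                                                          AbstractProject<?, ?> p) {
                   FilePath candidateRoot = candidate.getRootPath();
                   if (candidateRoot == null) {
                       return false; // candidate is offline, nothing to compare
                   }
                   for (Node other : nodes) {
                       if (other == candidate) {
                           continue;
                       }
                       FilePath otherRoot = other.getRootPath();
                       boolean sameRoot = otherRoot != null
                               && otherRoot.getRemote().equals(candidateRoot.getRemote());
                       if (sameRoot && other.equals(p.getLastBuiltOn())) {
                           return true; // another slave on this root still owns the workspace
                       }
                   }
                   return false;
               }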

          Markus Winter added a comment -

          How should Jenkins check whether slaves share the same slave root? If the slaves connect to the same machine, that is possible (but why do you have different slaves then at all?), but what if several different machines have the same NFS directory mounted? It would be quite hard to reliably find out whether a slave root is shared.
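          Probably the best one could do is a best-effort probe: create a marker file through one node's root and check whether it shows up through the other node's root. Below is a rough sketch using the FilePath API (the class and method names are made up for illustration). Even that is not fully reliable, e.g. with caching NFS clients or offline nodes, which is exactly my point.

               import hudson.FilePath;
               import hudson.model.Node;

               import java.io.IOException;
               import java.util.UUID;

               // Best-effort check whether two nodes appear to expose the same root directory.
               public class SharedRootProbe {
                   public static boolean rootsAppearShared(Node a, Node b)
                           throws IOException, InterruptedException {
                       FilePath rootA = a.getRootPath();
                       FilePath rootB = b.getRootPath();
                       if (rootA == null || rootB == null) {
                           return false; // one of the nodes is offline
                       }
                       String marker = ".shared-root-probe-" + UUID.randomUUID();
                       FilePath probe = rootA.child(marker);
                       try {
                           probe.touch(System.currentTimeMillis()); // create the marker via node A
                           return rootB.child(marker).exists();     // is it visible via node B?
                       } finally {
                           probe.delete();                          // always clean up the marker
                       }
                   }
               }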

          Sean Jones added a comment -

          Currently experiencing this as well. I have many servers connected and workspaces just disappearing.

          I am seeing a lot of ConcurrentModificationExceptions as well.

          Kim Abbott added a comment -

          I, too, have been seeing this happen recently. It's happening on slave jobs (they are restricted to a specific slave, and this configuration never changes), but also on jobs that use the Publish over SSH plugin to copy files to/from other machines and execute commands via SSH Publishers.

          I believe it's related to the workspace clean-up process. I'm now seeing other bugs related to this (https://issues.jenkins-ci.org/browse/JENKINS-27329, for example). But we run under Tomcat, and I'm not sure how to effect a change in this troubling behavior. The workarounds mentioned there don't look like something I can use. If anyone has guidance, I'm all ears.

          Dheeraj Gundavaram added a comment -

          I am also facing a similar issue. I've configured my builds and deployments on both the master and a slave.

          Whichever jobs run on the slave hit the same issue. My source code repository is TFS. For example: when a build is triggered, the source code is downloaded and the build completes, but when the package is about to be checked in, the entire workspace is gone. And not just for that job or project: the slave cleans up all the workspaces in the slave directory.

          Since I now have to run all the jobs on the master, I'm facing space issues too. Any immediate help would be much appreciated.

          georges imad added a comment -

          I am facing the same issue. The event viewer details show the following:
