WorkspaceCleanupThread may delete workspaces of running jobs

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: core
    • Environment: Linux host; Linux, OSX, and Windows slaves; Jenkins version 1.602

      The problem is as described in JENKINS-4501. As requested in JENKINS-4501, I am creating a new issue as this problem still exists in 1.602.

      In short, Jenkins silently and erroneously deletes matrix project workspaces on slaves even though those workspaces are not old.

      Over the time I've worked with Jenkins, this behavior has cost me literally days of rework and waiting on very long-running builds that rely on cached workspaces to be manageable. It cost me more hours again today, after I restored Jenkins to a new server following a hardware failure. The setting was reset because it lives outside the normally recommended backup files and I didn't think to add it when I "fixed" this last time.

      Would it not be easier to have hudson.model.WorkspaceCleanupThread.disabled default to true? Having the default behavior be "destroy my data" seems bad, especially with how cheap disk is now. I'm sure this option made a lot of sense when it was implemented, but when I can get a 1TB disk for $50, it just seems wrong-headed. Let the fallow workspaces lie. I can clean them up if I need to.

      If that's not an acceptable solution, could the setting be moved to a config location in the Jenkins home? That way we can be relatively sure that it will be propagated in backups and not bite someone who thought they had solved this problem and then forgot about it.
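For context, here is a minimal sketch of why the setting does not travel with a backup of the Jenkins home. It assumes the flag is read as an ordinary JVM system property. Only the property name comes from the report above; the class and method names are hypothetical and this is not the Jenkins core code.

```java
// Hypothetical sketch; only the property name is taken from the report above.
final class CleanupFlagSketch {
    static final String PROPERTY = "hudson.model.WorkspaceCleanupThread.disabled";

    // Boolean.getBoolean returns true only when the system property is set
    // and equals "true" (ignoring case); an unset property yields false.
    static boolean cleanupDisabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println("cleanup disabled: " + cleanupDisabled());
    }
}
```

A system property like this is supplied on the JVM command line (for example `java -Dhudson.model.WorkspaceCleanupThread.disabled=true ...`), so it lives in whatever script or service definition launches the JVM rather than in the Jenkins home. When Jenkins runs inside a servlet container, the same `-D` option would go into that container's JVM options.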

          [JENKINS-27329] WorkspaceCleanupThread may delete workspaces of running jobs

          David Aldrich added a comment -

          Thanks for your help Matthew.


          Tony Wallace added a comment -

          Thanks to all who wrote on this bug in 2017. I think the recent activity helped me find this bug when I searched for something to explain what was happening. This workaround does seem to work and I'm very grateful.

          Respectfully, I only wish I'd found it when I searched for the same, last year.

          Reinhold Füreder added a comment - edited

          Also experienced with a scripted pipeline (on a master-only Jenkins installation). I think this issue should really just be fixed rather than addressed by one of the various, more or less nice, workarounds, especially since I hope it would not be too difficult for a Jenkins (core) developer. => I dare to put my money on jglick

          According to https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/WorkspaceCleanupThread.java (see the #shouldBeDeleted() method) there is special support only for AbstractProject (and thus FreeStyleProject), but even that is IMHO not 100% safe: only the workspace on the node of the last build is kept, so with concurrently running builds it may also delete the workspace of an earlier build that is still running...

          My naive search in the Jenkins JavaDoc only turned up one very easy (but unfortunately also imperfect) possibility, based on
          http://javadoc.jenkins-ci.org/hudson/model/Job.html#isBuilding--

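          // Inside #shouldBeDeleted(): never delete the workspace of a job
          // that reports a build in progress.
          if (item instanceof Job<?,?>) {
            Job<?,?> j = (Job<?,?>) item;
            if (j.isBuilding()) {
              return false; // keep the workspace
            }
          }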
          

          The problem here might be that old workspaces on other nodes would never be deleted, I think. (That might nonetheless still be better than the current behaviour.)

          => Actually, only when Job#isBuilding() returns true would all the (possibly concurrent) running builds need to be checked, and only their (active) workspaces skipped from deletion? => Therefore still hoping and praying for Jesse...
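The idea in the last paragraph, skipping only the workspaces of builds that are still running, can be sketched in plain Java. `BuildInfo`, `ActiveWorkspaceSketch`, and the node and path values are hypothetical stand-ins, not Jenkins core classes; a real change would have to use the actual Job, Run, and Node APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one build of a job; not a Jenkins core class.
final class BuildInfo {
    final String node;       // node the build ran (or runs) on
    final String workspace;  // workspace path on that node
    final boolean building;  // true while the build is still in progress

    BuildInfo(String node, String workspace, boolean building) {
        this.node = node;
        this.workspace = workspace;
        this.building = building;
    }
}

final class ActiveWorkspaceSketch {
    private static String key(String node, String workspace) {
        return node + "|" + workspace;
    }

    // Gather the workspaces of every build still running, not only the last one.
    static Set<String> activeWorkspaces(List<BuildInfo> builds) {
        Set<String> active = new HashSet<>();
        for (BuildInfo b : builds) {
            if (b.building) {
                active.add(key(b.node, b.workspace));
            }
        }
        return active;
    }

    // A workspace is a deletion candidate only if no running build uses it.
    static boolean shouldBeDeleted(String node, String workspace, List<BuildInfo> builds) {
        return !activeWorkspaces(builds).contains(key(node, workspace));
    }

    public static void main(String[] args) {
        List<BuildInfo> builds = List.of(
            new BuildInfo("agent-1", "/ws/job", true),   // older build, still running
            new BuildInfo("agent-2", "/ws/job", false)); // newer build, already finished
        System.out.println(shouldBeDeleted("agent-1", "/ws/job", builds)); // false
        System.out.println(shouldBeDeleted("agent-2", "/ws/job", builds)); // true
    }
}
```

The sketch covers only the running-build guard; any age-based checks the cleanup thread applies would still have to run for workspaces that pass it.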

          (Very) naive PR: https://github.com/jenkinsci/jenkins/pull/3444


          Kim Abbott added a comment -

          I, too, have been seeing this happen recently. It's happening on slave jobs (each is restricted to a specific slave, and this configuration never changes) but also on jobs using the Publish over SSH plugin to copy files to/from other machines and execute commands via SSH Publishers.

          We run under Tomcat, and I'm not sure how to effect a change in this troubling behavior. The workarounds mentioned there don't look like something I can use. If anyone has guidance, I'm all ears.

          Matthew Webber added a comment -

          rupunzlkim, can you please add the version of Jenkins you are running to your problem report?

          Adam Hong added a comment -

          also seeing this happen recently, version 2.60.3


          Kim Abbott added a comment -

          Sorry for the delay. The version that we've noticed this on is 2.7.4.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Reinhold Füreder
          Path:
          core/src/main/java/hudson/model/WorkspaceCleanupThread.java
          http://jenkins-ci.org/commit/jenkins/f258aff7a736a81306ecb7d3c56cacc9b3a09a68
          Log:
          JENKINS-27329 Less aggressive WorkspaceCleanupThread (#3444)

          I dare to claim that the default behaviour of WorkspaceCleanupThread is too aggressive => this little change is by no means perfect (or admittedly even far from perfect), but IMHO a saner or slightly more defensive default behaviour.

          Mind that according to https://github.com/jenkinsci/jenkins/blob/9e64bcdcb4a2cf12d59dfa334e09ffb448d361e9/core/src/main/java/hudson/model/Job.java#L301 this "only" checks whether or not the last build of a job is in progress, while the JavaDoc says "Returns true if a build of this project is in progress." (cf. http://javadoc.jenkins-ci.org/hudson/model/Job.html#isBuilding--)

          • Fix compilation
          • Dummy commit to trigger pipeline

          Previous pipeline execution (https://ci.jenkins.io/blue/organizations/jenkins/Core%2Fjenkins/detail/PR-3444/2/tests) failed with one failing test that at first glance appears to be unrelated with my change(s) and looks like a flaky test?

          • Add fine logging message


          Oleg Nenashev added a comment - edited

          The fix has been applied in 2.125. IMHO the fix is not complete for parallel AbstractProject builds, but it is better than nothing. I will create a follow-up ticket.

          Oleg Nenashev added a comment -

          danielbeck this thing is marked as RFE in the changelog, but I think this is a bug. Would you agree if I recategorize it?


            Assignee: Unassigned
            Reporter: Quentin Hartman (qhartman)
            Votes: 13
            Watchers: 20