
[JENKINS-2111] removing a job (including multibranch/org folder branches/repos) does not remove the workspace

    • 2.0.21

      Removal of a job leaves the workspace intact.

          Olaf Lenz added a comment -

          This is still true. But is this really a bug?

          pjdarton added a comment -

          It's certainly messy.
          It means that if you've got various slaves, and folks create/rename/delete jobs, you end up with all sorts of junk data left on your slaves.
          I would have expected that any data that isn't reachable from the Jenkins UI to be removed.

          At present, we've got a resource leak that will ultimately kill the slave.
          (it's a bug, but I wouldn't call it a "Major" bug)

          Jason Spotswood added a comment -

          I agree that this is a bug. When a build gets deleted, so should all the data be deleted.
          When a build gets renamed, the files or directories on the file system (with the previous name) should be renamed as well.
          At present Jenkins will use space on the file system unnecessarily. Over a longer period of time, this will cause an issue, particularly for jobs that used a fair bit of space prior to being deleted or renamed.

          Michel Graciano added a comment -

          I have used Hudson for a long time and now Jenkins, and as far as I remember, the default behaviour always was to delete the workspace when deleting jobs. IMHO, it is a big regression.

          William Zhang added a comment -

          You can use a script and https://wiki.jenkins-ci.org/display/JENKINS/NodeLabel+Parameter+Plugin to do the same thing, such as my script https://gist.github.com/jollychang/5260975

          Daniel Beck added a comment -

          I agree that this is a bug. When a build gets deleted, so should all the data be deleted.

          This may not be possible in the case of distributed builds (nodes may be offline). It may not be desirable in the case of custom workspaces shared with other projects or not meant to be deleted at all (it happens).

          I have used Hudson for a long time and now Jenkins, and as far as I remember, the default behaviour always was delete the workspace when deleting jobs.

          Possibly since new installs use a different top level directory for workspaces than for jobs. It used to be jobs/foo/workspace, now it is workspace/foo, like slave FS layout always was. However, for the master node, this is configurable in global config.

          Laurent TOURREAU added a comment -

          Hi,
          this issue is very annoying when dealing with hundreds of jobs you manage dynamically. In the long term, this causes a lot of disk space issues.
          I agree this is not desirable when a workspace is shared with other jobs. In that case it should be the responsibility of the job author to know whether the job deletion (along with its workspace) should impact other jobs.
          To avoid such cases I suggest adding a check box "Don't delete the workspace if this job is deleted" to the job configuration, and finding a way to mark a directory as not erasable.

          Concerning the slaves, I suggest checking, once a node is reconnected, for orphan job workspaces and removing them automatically.
          Is it conceivable?

          Nick Volynkin added a comment -

          This happens when you rename a job - its workspace is not renamed and not deleted either. The job page shows the size of the old workspace, but after the next build the new workspace will be created and the used space will equal the sum of the two workspaces. When the job is deleted or when you wipe out the workspace, only the new workspace gets deleted.

          Just found that my Jenkins (v 1.635) "stashed" 35GB this way.

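          The renamed-job leak Nick describes can be spotted on a master-only install by comparing the workspace/ directory against the jobs/ directory. The sketch below is illustrative only, and assumes the default layout ($JENKINS_HOME/jobs/<name> alongside $JENKINS_HOME/workspace/<name>) with no custom workspace paths; it lists candidates rather than deleting them, so the output can be reviewed first:

```shell
# Sketch only (assumptions: default master layout with $JENKINS_HOME/jobs/<name>
# and $JENKINS_HOME/workspace/<name>, and no custom workspace paths).
# Lists, but does not delete, workspace directories that match no job.
find_orphaned_workspaces() {
    jenkins_home="$1"
    for ws in "$jenkins_home"/workspace/*/; do
        [ -d "$ws" ] || continue
        name=$(basename "$ws")
        # A renamed or deleted job leaves its old directory here with no job entry.
        if [ ! -d "$jenkins_home/jobs/$name" ]; then
            echo "orphaned: $ws"
        fi
    done
}

find_orphaned_workspaces "${JENKINS_HOME:-/var/jenkins_home}"
```

          Note this deliberately knows nothing about slaves or custom workspaces, which is exactly the hard part discussed in this thread.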
          pjdarton added a comment - edited

          I think that it would be very nice if, when a job does not have a custom workspace directory, renaming or deleting the job would rename or delete the workspace too (on ALL slaves).
          If a job has a custom workspace directory then things get more complicated, and that's where the "Don't delete the workspace if this job is deleted" check-box would be necessary.
          So, I'd suggest that this "Don't delete the workspace if this job is deleted" flag default to true where there is a custom workspace, and default to false (i.e. "do delete it") where a job just uses the workspace that Jenkins gives it.

          Or one could solve this another way:
          If the main Jenkins server were to iterate over all the jobs, it could build up a Map<Slave,List<expectedWorkspace>> listing, for each slave, what workspaces that slave might be asked about.
          Each slave could then be instructed to remove all the workspaces not in that list.
          This would ensure that each slave only retains the workspaces that the Jenkins server might ask it about - workspaces belonging to deleted/renamed builds would be removed, and workspaces belonging to valid builds that were last built on a different slave (and hence if the Jenkins server needed to access the workspace, it'd ask that slave not this one) would also be removed.

          The Distributed Workspace Clean plugin helps, but only works on valid builds - it doesn't remove data belonging to renamed/deleted builds.

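          The per-node half of pjdarton's expected-workspace idea can be sketched in isolation: given the list of workspace names the server still references, remove everything else under that node's workspace root. The list format, paths, and exact-name matching below are all assumptions for illustration, and this naive version ignores custom workspaces entirely:

```shell
# Illustration only of the per-node garbage collection pjdarton proposes.
# "expected" is a newline-separated list of workspace directory names the
# server still references; everything else under ws_root is removed.
# The list format and exact-name matching are assumptions, not a real API.
prune_unexpected_workspaces() {
    ws_root="$1"
    expected="$2"
    for dir in "$ws_root"/*/; do
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Keep only workspaces whose exact name appears in the expected list.
        if ! printf '%s\n' "$expected" | grep -Fqx "$name"; then
            rm -rf "$dir"
        fi
    done
}
```

          The server-side half (building the expected list per node from all jobs) is where the caveats raised later in the thread apply.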
          Daniel Beck added a comment - edited

          Or one could solve this another way:
          If the main Jenkins server were to iterate over all the jobs, it could build up a Map<Slave,List<expectedWorkspace>> listing, for each slave, what workspaces that slave might be asked about.
          Each slave could then be instructed to remove all the workspaces not in that list.
          This would ensure that each slave only retains the workspaces that the Jenkins server might ask it about - workspaces belonging to deleted/renamed builds would be removed, and workspaces belonging to valid builds that were last built on a different slave (and hence if the Jenkins server needed to access the workspace, it'd ask that slave not this one) would also be removed.

          I think it's possible to reference environment variables in the 'Custom Workspace' option, and if those are only defined during a build or occasionally change values, this will result in the deletion of legitimate workspaces.

          Jenkins is a complex system, so any magic here will break things horribly for quite a few users – see the frequent complaints about the existing WorkspaceCleanupThread.

          What would likely be feasible: If there are workspaces, ask whether they should be deleted as well when the project gets deleted. If a slave is offline, it's your responsibility to clean up.

          pjdarton added a comment -

          Re: "feasible"
          Would it also be feasible to have all (online) slaves (attempt to) rename their workspace when a job is renamed?

          I think that having all (online) slaves (attempt to) delete/rename the workspace belonging to a job that is being deleted/renamed would be a good idea.
          Jenkins already has a web-ui page asking "are you sure" when you ask it to delete/rename a job, so this page could also be the place where the question is asked about whether or not Jenkins should attempt to delete/rename the matching workspaces.

          Note: Any code that's iterating over multiple slaves, asking them to delete/rename workspace folders, MUST NOT STOP if one slave declares a failure. We have (far too many) Windows-based slaves, and filesystem operations on Windows aren't reliable as the OS can decide (at any point, albeit briefly) that a file is locked and hence cause a deletion/rename to fail because a "file is in use", so we've learned the hard way that if you want something to "just work" then you need to cope with filesystem "failures" (because they're often not reporting a fatal problem).
          i.e. the loop going over all slaves should ask them to try, but shouldn't stop asking the 3rd slave just because the 2nd failed - this "tidy up" operation should be on a "best effort" basis, not a "succeed or die trying" basis.

          I would also suggest that any operation that's being done "on all slaves" should run in multiple threads, rather than asking each slave in turn.

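          The best-effort, keep-going, parallel semantics pjdarton asks for can be sketched in shell: each node's cleanup runs in its own background process, a failure on one node is logged but never stops the others, and the caller waits for all attempts. clean_node below is a hypothetical stand-in for whatever per-node removal is actually performed, not a real Jenkins command:

```shell
# Best-effort sketch: attempt cleanup on every node in parallel and never let
# one node's failure (e.g. a transient Windows "file in use" error) stop the
# rest. clean_node is a hypothetical placeholder for the real per-node work.
best_effort_cleanup() {
    nodes="$1"   # whitespace-separated node names (format is an assumption)
    for n in $nodes; do
        (
            clean_node "$n" \
                || echo "warning: cleanup failed on $n (continuing)" >&2
        ) &
    done
    wait   # wait for all background attempts; the run as a whole still succeeds
}
```

          Running the attempts in background subshells gives both properties at once: per-node isolation of failures and the "multiple threads, not one slave at a time" behaviour suggested above.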
          Nick Volynkin added a comment -

          From the user's side being asked to delete/rename workspace when deleting/renaming a job looks quite fine. The options on job deletion could be:

          1. delete altogether
          2. leave intact

          And on renaming:

          1. rename to the job's new name
          2. use the old workspace (the "Use custom workspace" option)
          3. delete and create a new workspace with job's name
          4. leave intact and create a new workspace with job's name

          Not every user is able to operate directly in the filesystem of a particular slave (either by ssh or directly). Maybe such users should not be able to leave a workspace intact, because otherwise they would waste storage space, which then will have to be cleaned by somebody else, like a sysadmin. (But there's a workaround with creating a new job with the old name, "catching" the workspace and wiping it.)

          There is a major exclusion with workspaces used by several jobs. They should be detected and only options 2 and 4 should be available. Maybe the whole feature/bug of not-auto-renaming exists to support reused workspaces?

          If a node is offline, it may be useful to allow only "use custom" and "leave intact". The user will be able to explicitly rename, delete or set up the new workspace when the node is online again.

          Alexander Kriegisch added a comment -

          For me the workdirs on the Jenkins master (no slaves) are never deleted, no matter whether I delete them via REST or from the GUI. Our Jenkins version is 1.596.2. What am I doing wrong? All workdirs are stored under /opt/.jenkins/workspace. Is that a non-default directory?

          Even André Fiskvik added a comment -

          With the multi-branch/pipeline setup that is getting popular now, this really is a MUST in order to handle disk usage well.

          Nicholas Bencriscutto added a comment -

          We are also experiencing this. It seems to cause issues at the most inconvenient time. (Of course, when would it ever be convenient?) Please give this more priority.

          THANKS!

          Adam Miller added a comment -

          +1

          Andrew Bayer added a comment -

          From a jglick comment over on JENKINS-34177:

          I think this is actually a more general core issue: Job.delete (or some associated ItemListener.onDeleted) should proactively delete any associated workspaces it can find on any connected nodes. WorkspaceCleanupThread as currently implemented is not going to find them.

          Alessandro Dionisi added a comment - edited

          We have written a script and scheduled it with cron:

          #!/bin/bash
          #TODO handle folders with spaces
          IFS=$'\n'

          #find empty job directories & form the new folder structure for workspace directories
          emptydirs_jobs=$(find . -type d -empty | cut -d '/' -f2-6)
          emptydirs_workspace=$(find . -type d -empty | cut -d '/' -f2,4,6)

          #remove the corresponding directory from workspace
          for i in $emptydirs_workspace; do
            rm -rf "/var/jenkins_home/workspace/$i"
          done

          #remove empty directories from jobs
          for j in $emptydirs_jobs; do
            rm -rf "/var/jenkins_home/jobs/$j"
          done

          I hope this can help you until a fix is provided.

          Andrew Bayer added a comment -

          jglick Feels to me like Job.performDelete may be more of the right place to do this?

          Andrew Bayer added a comment -

          Also, blergh, finding all the workspaces for a Pipeline job is...hard. node.getWorkspaceFor isn't useful here. I think we'd need to look for all FlowNode on a WorkflowRun for a given WorkflowJob to see if they've got a WorkspaceAction and then act on those WorkspaceAction...which is demented. Oy.

          Joshua Spiewak added a comment -

          FWIW, I wrote this today:

          Closure cleanMultiBranchWorkspaces
          cleanMultiBranchWorkspaces = { item ->
            if (item instanceof com.cloudbees.hudson.plugins.folder.Folder) {
              if (item.name == 'archive') {
                println "Skipping $item"
              } else {
                println "Found folder $item, checking its items"
                item.items.each { cleanMultiBranchWorkspaces(it) }
              }
            } else if (item instanceof org.jenkinsci.plugins.workflow.multibranch.WorkflowMultiBranchProject) {
              println "Found a multi-branch workflow $item"

              workspaces = jenkins.model.Jenkins.instance.nodes.collect { it.getWorkspaceFor(item).listDirectories() }.flatten().findAll { it != null }

              def activeBranches = item.items.name
              println "Active branches = $activeBranches"

              if (workspaces) {
                // drop every workspace whose name matches an active branch;
                // what remains belongs to deleted branches
                workspaces.removeAll { workspace ->
                  activeBranches.any { workspace.name.startsWith(it) }
                }

                workspaces.each {
                  println "Removing workspace $it.name on ${it.toComputer().name} without active branch"
                }
              }
            }
          }

          jenkins.model.Jenkins.instance.items.each { cleanMultiBranchWorkspaces(it) }

          Need to switch the startsWith to a regex for more exact matching.

          Created a job with the Groovy plugin executing this as a system script.

          Jesse Glick added a comment -

          I will try to solve this for branch projects as part of JENKINS-34564, since these are especially likely to be created and discarded rapidly.

          I think a general implementation need not really be that difficult. Each node (master, agent) should just pay attention to when a workspace is used. (If in core, via WorkspaceList; otherwise, perhaps via WorkspaceListener.) Then record a workspaces.xml, a sibling of workspace/, with a list of records: relative workspace path, Item.fullName, timestamp. Periodically, or when an agent comes online, etc., iterate the list and check for jobs which no longer exist under that name (covers JENKINS-22240), or workspaces which have not been used in a long time. If in a plugin (JENKINS-26471) you could get fancy and modify behavior according to free disk space, etc.

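          The per-node record Jesse describes might look something like the following. The element and attribute names here are invented for illustration; only the three fields he lists (relative workspace path, Item.fullName, timestamp) come from the comment above:

```xml
<!-- Hypothetical sketch of a per-node workspaces.xml, kept as a sibling of
     the workspace/ directory. The schema is invented for illustration. -->
<workspaces>
  <workspace path="workspace/my-folder_my-branch" fullName="my-folder/my-branch" lastUsed="2016-05-10T14:23:00Z"/>
  <workspace path="workspace/deleted-job" fullName="deleted-job" lastUsed="2015-11-02T09:01:00Z"/>
</workspaces>
```

          A periodic sweep would then flag entries whose fullName no longer resolves to a job, or whose lastUsed is older than some threshold, as he outlines.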
          Yury Zaytsev added a comment -

          So was it addressed in JENKINS-34564? I've had a look at the commit mentioned in Jira, but I couldn't easily see any code pertaining to the deletion of the workspaces.

          Jesse Glick added a comment -

          I couldn't easily see any code pertaining to the deletion of the workspaces.

          here it is

          Michael Neale added a comment -

          jglick does that imply this can be closed as a newer branch-api-plugin has a fix for this?

          Michael Porter added a comment -

          I'm not sure if my issue is related.

          We use multibranch to make a PHP site. Part of the script creates a database based on the branch name. It would be nice to have an onDelete hook where we can include code to clean up the site when we remove the branch. Something in the Groovy script would be nice.

          onDelete {
            // clean up DB and composer files.
          }

          I can make a new ticket if this is not the right thread for this.

          Scott Epstein added a comment - edited

          jglick
          Jesse, I'm new to Jenkins and this forum. I apologize for any newbie errors in advance.

          Am I seeing the problem that you resolved?

          I have directories for Multi-configuration projects sticking around after the Discard Old Builds / Max # of builds to keep (15) has been exceeded. Jobs at the bottom of the Build History (after the 15th job) disappear as new jobs complete. The directories that are kept that I'd expect to be deleted are named as follows.

          /export/build/<slave node>/sub-build/<Jenkins project>/<CONFIGURATION>/build/999033
          /export/build/<slave node>/sub-build/<Jenkins project>/<CONFIGURATION>/build/999035
          where CONFIGURATION=${label}${target}${platform}_${type}

          The 33 and 35 in the 999033 and 999035 at the end of the path match the Build History build numbers.

          Do the above directories correspond to workspaces?

          My Discard Old Build settings are:
          Strategy Log Rotation (note: this is the only option given)
          Days to keep builds 7
          Max # of builds to keep 15

          I turned on some logging. Are the negative ones below a problem? Is there a way for me to determine if ANYTHING is being deleted?

          Feb 02, 2017 4:42:37 PM FINE hudson.tasks.LogRotator
          Running the log rotation for hudson.matrix.MatrixConfiguration@6ec64ed8[<Jenkins project>/label=e,platform=a,target=s,type=d] with numToKeep=-1 daysToKeep=-1 artifactNumToKeep=-1 artifactDaysToKeep=-1
          Feb 02, 2017 4:44:30 PM FINE hudson.tasks.LogRotator
          Running the log rotation for hudson.matrix.MatrixConfiguration@7b748f18[<Jenkins project>/label=e,platform=a,target=c,type=d] with numToKeep=-1 daysToKeep=-1 artifactNumToKeep=-1 artifactDaysToKeep=-1

          I am running 1.609. I don't see your fix for this discarder issue listed in the change log: https://jenkins.io/changelog/

          Thank you for your help.

          Jesse Glick added a comment -

          michaelneale

          does that imply this can be closed as a newer branch-api-plugin has a fix for this?

          No, it has a fix for the limited case that the Job is in fact a branch project, and the agent is online at the time.

          michaelpporter see JENKINS-40606.

          sepstein no those are build directories, not workspaces, so unrelated to this ticket.

          Scott Epstein added a comment -

          Thank you for your response Jesse.

          Joe Harte added a comment -

          Any update on this one? Is a fix in progress? The fact that PR branch job workspaces are not deleted when the PR job is deleted is a major drawback. Disk space on my slaves is being eaten up rapidly.

          Michael Porter added a comment -

          boon FYI, I ended up rolling my own. I have a webhook on branch delete which calls a PHP script and passes the repo and branch.

          It would be nice to have a natural solution.
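          For anyone wanting to roll their own the same way, here is a minimal sketch of such a branch-delete webhook receiver, in Python rather than PHP. Everything here is an assumption for illustration: the workspace root, the `<repo>_<branch>` folder-naming scheme, and passing repo/branch as query parameters.

```python
import os
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

WORKSPACE_ROOT = "/var/jenkins/workspace"  # assumption, adjust to taste

def workspace_path(repo, branch):
    """Build the folder to delete; reject path-traversal attempts.
    The naming scheme (slashes mangled to %2F) is an assumption."""
    name = "%s_%s" % (repo, branch.replace("/", "%2F"))
    if os.sep in name or ".." in name:
        raise ValueError("suspicious name: %r" % name)
    return os.path.join(WORKSPACE_ROOT, name)

class DeleteHook(BaseHTTPRequestHandler):
    """POST /?repo=myrepo&branch=feature/x removes the workspace."""
    def do_POST(self):
        qs = parse_qs(urlparse(self.path).query)
        try:
            path = workspace_path(qs["repo"][0], qs["branch"][0])
            shutil.rmtree(path, ignore_errors=True)
            self.send_response(200)
        except (KeyError, ValueError):
            self.send_response(400)
        self.end_headers()

# To run: HTTPServer(("", 8080), DeleteHook).serve_forever()
```

A real deployment would also need authentication on the hook, since it deletes directories on request.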

          Lucas Lacroix added a comment -

          I was discussing with others how we could resolve this issue with a maintenance job. We make heavy use of multibranch pipeline jobs, and this causes our servers to run out of space on a regular basis. This is what we came up with:

          1. assuming that we can ask Jenkins to generate the name of the workspace folder for each active job, we can create a list of "workspaces that may still be in use"
          2. a job on each worker could enumerate the workspace folders, and any not on the list generated by #1 would be deleted

          Jenkins appears to either use a deterministic algorithm for generating the workspace name or store the workspace name in the Job's data. The question is: is that workspace folder name accessible through APIs such that we can write this cleanup job? I haven't started looking at the Jenkins APIs to determine if the workspace folder name is exposed, but we're going to go down this road to see what we can do while we wait for the real solution (it's been 9 years already and we don't expect this to be fixed soon).
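          The comparison in step #2 can be sketched in a few lines. This is a sketch only: how the list of in-use workspace names would be obtained from Jenkins is deliberately left out, the names are illustrative, and (as noted elsewhere in this thread) custom workspaces and renamed jobs would defeat name-based matching.

```python
import os
import shutil

def find_stray_workspaces(workspace_root, expected_names):
    """Return workspace directories under `workspace_root` whose names
    are not in the set Jenkins still considers in use."""
    expected = set(expected_names)
    strays = []
    for entry in sorted(os.listdir(workspace_root)):
        path = os.path.join(workspace_root, entry)
        if os.path.isdir(path) and entry not in expected:
            strays.append(path)
    return strays

def sweep(workspace_root, expected_names, dry_run=True):
    """Delete stray workspace directories; with dry_run, only list them."""
    strays = find_stray_workspaces(workspace_root, expected_names)
    for path in strays:
        if not dry_run:
            shutil.rmtree(path, ignore_errors=True)
    return strays
```

Running with `dry_run=True` first and reviewing the output is advisable, since any mismatch between the "expected" list and reality deletes data.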

          This did get me thinking: if this were a feature in Jenkins, then the worker process could easily do this periodically based on some system-wide configuration. No need for the main Jenkins instance to keep a running list of "workspaces to be deleted" or to worry about workers being offline at the time the Job is deleted. Just let the worker do its own cleanup IF the system is configured for it.

          pjdarton added a comment -

          While Jenkins does have a deterministic algorithm for generating the workspace folder, it doesn't always get to use that algorithm: some jobs specify a custom workspace. Also, jobs can get renamed, which would mean that you'd get the wrong answer from the algorithm.
          However, I agree that making the slaves responsible for driving this clean-up process is the right way to do this: it's only ever a problem for slaves that still exist, and it'll scale a lot better if it's driven by the slaves instead of by the master.

          What I would suggest is this:

          1. The Jenkins server be enhanced so that a slave can ask the master what workspace folders that the master expects it to possess.
            • This would basically be a summary of all the "workspace" links that point at the requesting slave.
            • If computing this is particularly computationally intensive at scale then Jenkins could cache a map of slave to List<workspacePath> in memory.
          2. The Jenkins slave be enhanced so that it keeps a list of folders that it has used as the workspace for builds that it has run, and also any parent folders that it needed to create in order to do this.
            • This list would be added to whenever a job starts using a workspace folder.
            • It would have to be persisted to disk.
            • Folders would be removed from the list once they'd been deleted.
            • Access to this list would have to be thread-safe, and "watchable" by the clean-up process.
          3. The Jenkins slave be enhanced so that, periodically, it compares this list of known workspace folders with the list of folders that the Jenkins server knows about, and then goes about deleting any folders that are no longer on the list of known workspaces.
            • It would have to only remove a folder from the list once it'd been successfully deleted. It's quite possible for deletion to fail (especially on Windows) if processes have things "locked open", so we need to keep things on the list of things to be deleted until they've been deleted.
            • It would have to only delete folders that it had created itself.
            • It would have to avoid recursively deleting content that belonged to another build, e.g. if one old build had a workspace /foo/... which is no longer known to the server but a new (current) build is known to use /foo/bar/... then the /foo/bar/ folder (and contents) should not be removed when removing /foo/.
            • It's possible that something else might delete a folder (e.g. a user, a build process, or another cleanup plugin) and, if the folder doesn't exist anymore then it should immediately get removed from the list of stuff to be deleted.

          I'm less certain of exactly when would be a "good time" to trigger the clean-up process. Ideally, we would do the deletion in the background on the slave when the slave isn't otherwise busy, but that'd mean a busy slave would be most likely to run out of space. Maybe we should do it periodically in the background "regardless" but in the foreground (blocking the running of a new job) when space gets tight?
          Also, if we were to implement the deletion as a background job, the implementation would need to be able to halt the deletion the moment a build job started that used the same workspace as we were currently deleting (the method that adds a workspace to the list of workspaces would have to block until it was sure no further deletion was in progress).

          I also think there's a fair amount of crossover between the resource cleanup functionality in the Jenkins master and what we require here, e.g. once this has been implemented, the resource cleanup functionality in the Jenkins master could be amended to simply delegate the removal of resources to the slave instead of having to take responsibility for it itself.

          Overall, this doesn't sound like it's going to be trivial to implement in a non-disruptive way (garbage collection is rarely simple), but at least it's a fairly well-defined problem.
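          Steps 2 and 3 of the proposal above might look roughly like the following. This is a Python sketch under stated assumptions (a real implementation would live inside the agent code and would also need the thread-safety and build-blocking behavior described above); all file and class names here are hypothetical.

```python
import json
import os
import shutil

class WorkspaceLedger:
    """Persisted list of workspace folders this agent has created
    (step 2 above). State survives restarts via a JSON file."""

    def __init__(self, state_file):
        self.state_file = state_file
        try:
            with open(state_file) as f:
                self.known = set(json.load(f))
        except (OSError, ValueError):
            self.known = set()

    def _save(self):
        with open(self.state_file, "w") as f:
            json.dump(sorted(self.known), f)

    def record(self, path):
        """Called whenever a build starts using a workspace folder."""
        self.known.add(path)
        self._save()

    def sweep(self, still_wanted):
        """Step 3: delete folders the master no longer knows about.
        A folder stays on the list until deletion actually succeeds
        (deletion can fail, especially on Windows), and is never
        deleted if a wanted workspace lives beneath it."""
        wanted = set(still_wanted)
        # Deepest paths first, so /foo/bar is handled before /foo.
        for path in sorted(self.known - wanted, key=len, reverse=True):
            if any(w.startswith(path + os.sep) for w in wanted):
                continue  # would recursively delete a live workspace
            if not os.path.exists(path):
                self.known.discard(path)  # something else cleaned it up
                self._save()
                continue
            shutil.rmtree(path, ignore_errors=True)
            if not os.path.exists(path):  # only forget on success
                self.known.discard(path)
                self._save()
```

The deepest-first ordering plus the `startswith` guard covers the /foo vs. /foo/bar case from the list above; the retry-until-gone behavior covers locked-open files on Windows.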

          Michael Porter added a comment -

          I would like the ability to call a cleanup script. We use Jenkins Multibranch Pipelines to build PHP websites. For me, cleanup means removing a database and removing files from a custom workspace. At this time we call a PHP file on branch delete, which then runs a bash script that is saved during the project build.

          pjdarton added a comment -

          michaelpporter I'd suggest that you take a look at the Post build task plugin and the PostBuildScript plugin.  Either of those might satisfy your requirements.  We have much the same requirements where I work and we use these (and we make our build setup code tolerant to a messy starting environment).

          That said, I don't think that the ability to call a cleanup script is really a core requirement of this issue - this issue is all to do with managing the workspace folders created by jobs, not with dealing with other job-specific resources used by jobs.

          Michael Porter added a comment -

          Thank you for the input. The plugins you mentioned will not help, as we do not want to call the cleanup post-build; that is easy in a pipeline. We want a cleanup hook in Jenkins that fires when it detects a branch is removed. As you mentioned, there might be custom workspaces that Jenkins is not aware of. I imagine the solution for cleaning those assets could also clean up other items.

          Lucas Lacroix added a comment - - edited

          michaelpporter

          Have you considered building up and breaking down the database and other "external" dependencies as part of your build? We do this using Docker in our builds and the overhead of creating, initializing, and the subsequent teardown is negligible.

          I will say that, at least for a multibranch pipeline, having the ability to kick off a job on child job deletion/creation would be an interesting feature, but I agree anything like that is outside the scope of the issue here.

          Jon Roberts added a comment -

          As a workaround for this I had to write a cron job that scans the git branches, matches them against the workspace folders, then removes the stale ones.  An option that would just do this whenever the job is removed would be fantastic.
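          The matching step of such a cron job could be sketched like this. The `%2F` folder-naming scheme is an assumption based on the old workspace locator's behavior mentioned later in this thread, and `live_branches` requires git on the PATH; treat both as illustrative, not authoritative.

```python
import subprocess

def live_branches(repo_url):
    """List branch names on the remote (requires git on PATH)."""
    out = subprocess.check_output(
        ["git", "ls-remote", "--heads", repo_url], text=True)
    return [line.split("refs/heads/", 1)[1]
            for line in out.splitlines() if "refs/heads/" in line]

def branch_to_folder(branch):
    """Mimic the old locator's mangling of '/' into '%2F'.
    This is an assumption about one common scheme, not a guarantee."""
    return branch.replace("/", "%2F")

def folders_to_remove(workspace_folders, branches):
    """Workspace folders with no corresponding live branch."""
    expected = {branch_to_folder(b) for b in branches}
    return sorted(f for f in workspace_folders if f not in expected)
```

The `@tmp` and `@script` sibling folders that pipelines create would need the same treatment, e.g. by stripping an `@...` suffix before matching.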

          Jesse Glick added a comment -

          If anyone is interested in trying my fix for this (and other) issues, you can install this experimental build.

          Michael Neale added a comment -

          nice to see this happening!

          Also, what is up with the date on this ticket: created in 2008? It foretold Pipeline-as-Code? 

          Jesse Glick added a comment -

          michaelneale the summary was edited after the fact.

          Tyler Smith added a comment - - edited

          I tried your experimental build and it worked for the slave agents. When the (apparently scheduled) cleanup occurred, it removed the <name> and <name>@tmp folders from all the slave agents. However, I still have a <name>@script folder on my Jenkins master. Is that the responsibility of another component to delete?

          Edit:

          Nevermind. I recently updated the subversion plugin which enabled lightweight checkouts and no longer creates <name>@script folders, so it is a non-issue.

          Jesse Glick added a comment -

          I still have a <name>@script folder on my Jenkins master. Is that the responsibility of another component to delete?

          Yes, this is created by Pipeline builds under certain SCM configurations (“heavyweight checkouts”), and is not covered by the workspace infrastructure. Probably those should be cleaned up, too, but it would be a separate patch.

          Uma Shankar added a comment -

          jglick The changes to the Branch API show some strange behavior; please see the images below:

          Jenkins job:

          Workspace folder when building on the Jenkins master:

          Looks better:

          Looks the same as before (but why does one branch work while another falls back to the old behavior?)

          When building on a slave: (Looks good, but the path is too long, and the variable -Djenkins.branch.WorkspaceLocatorImpl.PATH_MAX=2 doesn't help now. I was able to get a short path before; it looks like the new plugin ignores the variable.)

          Jenkins Version  - 2.138.2

          Branch API 2.0.21-rc632.1afb188ed43f

          Jesse Glick added a comment -

          shankar4ever for compatibility, existing workspaces created under the old scheme continue to be used (but their names are now tracked). The new naming policy applies only when defining a workspace for a given job on a given node for the first time.

          Uma Shankar added a comment -

          ah, I see. I will clear up everything and try again.

          Uma Shankar added a comment - - edited

          jglick I cleared everything out of my workspaces (master and slave), but I still see the same issue with the folder name in the slave workspace (it is still adding %2F; I created a new branch to test, so there was no history for this branch).

          Also, the Jenkins master workspace has a file called `workspace.txt`, but the slave doesn't.

          The build on the master looks good.

          Jesse Glick added a comment -

          And you are using a multibranch project in each case? For now, the feature is limited to branch projects by default; you can run with -Djenkins.branch.WorkspaceLocatorImpl.MODE=ENABLED to apply it to all projects.

          Also check your system log for any warnings.

          Jesse Glick added a comment -

          rsandell just released it. Arguably it should have been versioned 2.1, but I have never been able to make sense of the version numbering schemes used in plugins historically maintained by stephenconnolly (since we certainly are not enforcing semver).

          Uma Shankar added a comment -

          All the projects are using Declarative Pipeline with multibranch. I am running Jenkins with -Djenkins.branch.WorkspaceLocatorImpl.PATH_MAX=2. I will add -Djenkins.branch.WorkspaceLocatorImpl.MODE=ENABLED and see if that helps.

          Jesse Glick added a comment -

          PATH_MAX is ignored by the new implementation except for purposes of identifying folders apparently created by the old implementation. In other words, it has no effect on a node without existing workspaces.

            jglick Jesse Glick
            bll6969 bll