
Launching two nodes with similar workspace deletes in use workspaces

      We run Jenkins with configuration as code in two environments:

      • Dev - For testing changes to our configuration and jobs
      • Prod - Used by all developers

      Some jobs are multibranch jobs, and most are normal pipeline jobs. 

      We recently started seeing the following message pop up in the multibranch job:

      ERROR: missing workspace /localdata/jenkins_workspace/workspace/<job_name> on <node_name>
      

      We have now tracked this down: it happens when a job is running in Prod while someone launches Jenkins locally in their Dev environment.

      After placing an immutable file in the workspace, we got the following output during Jenkins startup:

      WARNING	j.b.WorkspaceLocatorImpl$Collector#onOnline: could not delete workspace /localdata/jenkins_workspace/workspace/<job_name>
      

      So there appears to be no check for whether the workspaces it decides to delete or clean are actually in use by anyone else.

      Given that we are using two separate Jenkins nodes that both connect to the same workspace, I admit we might be out of scope here, but I wanted to ask whether this can be avoided. Does the workspace/workspaces.txt file have any purpose here?

      Having these two almost identical dev/prod environments allows us a good workflow of testing changes before pushing to production.

          [JENKINS-59970] Launching two nodes with similar workspace deletes in use workspaces

          I think this is "working as designed", because the workspaces.txt is the list of workspaces managed by the Branch API plugin in that workspace root, as well as the Items which own them, i.e. its purpose is associating ownership of that directory with a Jenkins Item. So having two different Jenkins instances, with different Item lists, trying to share the same workspace root on a node, will definitely hit this issue.

          It would be nice if this expectation was clearer. I suspect it's been a core expectation in Jenkins the whole time. But if you weren't using Branch API (which does cleanups on builder nodes when they connect), WorkspaceCleanupThread, or various other workspace-cleanup steps, it wouldn't have been an issue: nothing was trying to delete the workspaces, and Jenkins would refuse to use a workspace directory it didn't recognise.

          It'd be nicer if the workspaces.txt handling could recognise when it was being processed by a separate Jenkins instance, and fail-safe (i.e. reject the node or the workspace root or something noisy), rather than accepting the workspaces.txt contents, and deleting all the listed directories because that instance doesn't have any of those items present.
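
The difference between the current behaviour and the fail-safe behaviour described above can be sketched as follows. This is a simplified model, not Branch API's actual code: the `WorkspaceIndex` class, the `owner_id` field, and both function names are hypothetical, and the real workspaces.txt does not necessarily record which instance wrote it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class ForeignIndexError(Exception):
    """Raised when a workspace index was written by a different instance."""


@dataclass
class WorkspaceIndex:
    # Hypothetical model of workspaces.txt: the instance that wrote it,
    # plus a mapping of item name -> workspace directory name.
    owner_id: str
    entries: Dict[str, str] = field(default_factory=dict)


def stale_workspaces(index: WorkspaceIndex, known_items: Set[str]) -> List[str]:
    """Directories whose owning item is unknown to the connecting instance.

    An instance with a different item list sees every entry as stale.
    """
    return sorted(d for item, d in index.entries.items() if item not in known_items)


def stale_workspaces_fail_safe(
    index: WorkspaceIndex, instance_id: str, known_items: Set[str]
) -> List[str]:
    """Same selection, but refuse noisily if another instance wrote the index."""
    if index.owner_id != instance_id:
        raise ForeignIndexError(
            f"index owned by {index.owner_id!r}, refusing cleanup as {instance_id!r}"
        )
    return stale_workspaces(index, known_items)
```

In this model, a Dev instance with no matching items would mark every Prod workspace for deletion through `stale_workspaces`, while `stale_workspaces_fail_safe` raises before anything is selected.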

          If the workspaces.txt filename was instance-specific somehow, then a workspace root could be safely shared by multiple instances, because Branch API (AFAIK) won't delete or overwrite a directory that its workspaces.txt doesn't say it owns. Even if you had the same job name on two instances, which would calculate to the same workspace name, the first one to create the directory would record it in its workspaces-instanceA.txt (and hence own it). The other would see that the name it picked is already in use but not in its workspaces-instanceB.txt, pick a different name using a suffix, record that, and own it.
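
A minimal sketch of that idea, assuming one index per instance and a numeric suffix on collision. The function names, the `_2` suffix scheme, and the in-memory `existing_dirs` set standing in for the node's filesystem are all hypothetical; Branch API's real naming rules differ.

```python
from typing import Dict, Set


def index_filename(instance_id: str) -> str:
    """Hypothetical per-instance index name, e.g. workspaces-instanceA.txt."""
    return f"workspaces-{instance_id}.txt"


def claim_workspace(
    existing_dirs: Set[str], own_index: Dict[str, str], item: str, base_name: str
) -> str:
    """Return the directory this instance owns for `item`, claiming one if needed.

    existing_dirs: directory names already present in the shared workspace root.
    own_index:     this instance's item -> directory mapping (its own index file).
    """
    if item in own_index:
        # Already recorded in our own index, so we own it.
        return own_index[item]
    candidate = base_name
    suffix = 2
    # The name is in use but not ours: pick a different one using a suffix.
    while candidate in existing_dirs:
        candidate = f"{base_name}_{suffix}"
        suffix += 1
    existing_dirs.add(candidate)
    own_index[item] = candidate
    return candidate
```

With two instances claiming the same job name, the first call gets the plain name and the second gets a suffixed one, so neither index ever lists a directory the other instance created.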

          In the end, it's probably better for other reasons to just not try and share the same workspace root between two Jenkins instances, but if you need to share nodes, have separate workspace roots for each instance using the node.
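
One way to keep the roots separate is to derive each instance's workspace root from a shared base plus the instance name, and to check that no two roots overlap. The sketch below assumes a `<base>/<instance>/workspace` layout, which is an illustration only; the helper names are hypothetical and the base path is the one from the report.

```python
from pathlib import PurePosixPath


def workspace_root(base: str, instance_name: str) -> PurePosixPath:
    """Build a per-instance workspace root under a shared base directory."""
    return PurePosixPath(base) / instance_name / "workspace"


def roots_overlap(a, b) -> bool:
    """True if the two roots are the same path or one contains the other."""
    a, b = PurePosixPath(a), PurePosixPath(b)
    return a == b or a in b.parents or b in a.parents


prod_root = workspace_root("/localdata/jenkins_workspace", "prod")
dev_root = workspace_root("/localdata/jenkins_workspace", "dev")
```

Each instance would then configure the shared node with its own root, so a cleanup triggered by one instance never walks directories owned by the other.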

          Comment above added by Paul "TBBle" Hampson.

            Assignee: Unassigned
            Reporter: Martin Nordsletten (martinn_graphcore)