  1. Jenkins
  2. JENKINS-70868

Multiple pipeline jobs sharing same jobname@script directory for Jenkinsfile

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • workflow-cps-plugin
    • None

       

      When a push to Gerrit spawns many builds of a pipeline, they all share the same script directory to check out the Jenkinsfile from Git:

      Checking out git ssh://gerrit... into /path/to/job/somelonghash to read path/to/Jenkinsfile
      

      For each build spawned at the same time, the somelonghash is the same, so occasionally these jobs step on each other's Git checkouts.


          Mark Waite added a comment -

          I've not seen this for my multibranch Pipeline builds. My multibranch Pipelines don't use Gerrit, but they can run many branches at the same time.

          I think that you'll need to provide more detailed instructions so that others can see the failure in a fresh installation of Jenkins.

          Jim Searle added a comment -

          Yes, this is different from a multibranch pipeline. It's a single branch. With our Gerrit setup, pushing multiple commits to the Gerrit branch triggers Jenkins to start a job for each commit, and each of those jobs shares the same directory to check out the Jenkinsfile.

          Jim Searle added a comment -

          Reviewing the code, I see that the SCM checkout directory for the Jenkinsfile comes from:

          https://github.com/jenkinsci/workflow-cps-plugin/blob/master/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java#L199

                  return baseWorkspace.withSuffix(getFilePathSuffix() + "script").child(CHECKOUT_DIR_KEY.mac(scm.getKey()));
          

          The key is generated from:
          https://github.com/jenkinsci/workflow-cps-plugin/blob/master/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java#L73

              private static final HMACConfidentialKey CHECKOUT_DIR_KEY = new HMACConfidentialKey(CpsScmFlowDefinition.class, "filePathWithSuffix", 32);
          

          So every job triggered shares the same Jenkinsfile SCM checkout area, which can cause conflicts for concurrent jobs?

          But when it tries to use that directory it acquires a lease, which I assume is a lock so that only one job can use it at a time? So I am not sure why it continues.

                  try (WorkspaceList.Lease lease = computer.getWorkspaceList().acquire(dir)) {
          

          My Java programming knowledge is very weak, so I may not be understanding this correctly.
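
The directory naming discussed above can be illustrated with a minimal sketch. It is not the plugin's code: the secret, the `ScriptDirSketch` class, and the example SCM key strings are hypothetical, and HMAC-SHA256 is assumed because it matches the 64-character hexadecimal directory names in the paths reported in this issue. The point is only that the name is derived from the SCM key and a fixed secret, with nothing build-specific in the input, so concurrent builds of one job resolve to the same directory.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class ScriptDirSketch {
    // Hypothetical stand-in for the secret behind CHECKOUT_DIR_KEY.
    private static final byte[] SECRET = "example-secret".getBytes(StandardCharsets.UTF_8);

    /** Hex-encoded HMAC-SHA256 of the SCM key; depends only on the secret and the key. */
    static String mac(String scmKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
            StringBuilder hex = new StringBuilder();
            for (byte b : mac.doFinal(scmKey.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Mirrors the shape workspace@script/<mac>; no build number is part of the input. */
    static String scriptDir(String workspace, String scmKey) {
        return workspace + "@script/" + mac(scmKey);
    }
}
```

Calling `scriptDir` twice with the same workspace and SCM key returns the same path, which is the sharing described in this issue. A different SCM key gives a different directory.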

          Murali added a comment -

          I was going through the plugin code and suspect this is caused by a race condition when multiple concurrent builds are triggered. I could be off track as well.
           
          The line of code below is executed and control goes into the else block:
           
          https://github.com/jenkinsci/workflow-cps-plugin/blob/07ea433c90b4b535a3c9b2677982191b09647da0/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java#L136
           
          and setting the directory path as:
           
          https://github.com/jenkinsci/workflow-cps-plugin/blob/07ea433c90b4b535a3c9b2677982191b09647da0/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java#L143
           
          Subsequently, this directory is used for checking out the source code here:
           
          https://github.com/jenkinsci/workflow-cps-plugin/blob/07ea433c90b4b535a3c9b2677982191b09647da0/plugin/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsScmFlowDefinition.java#L155
           
          As Jim mentioned, the plugin code tries to acquire the directory before checking out the source code, but the method computer.getWorkspaceList().acquire(directory) only acquires the directory for the checkout and does not lock it.
           
          If the overloaded method below is used instead, it might address the issue that we have seen:
           


           
          public WorkspaceList.Lease acquire(@NonNull FilePath p, boolean quick) throws InterruptedException

          See acquire(FilePath).
          Parameters: quick - If true, indicates that the acquired workspace will be returned quickly. This makes other calls to allocate(FilePath) to wait for the release of this workspace.
          Throws: InterruptedException
           


           
          Such concurrency issues are hard to debug, but I strongly feel this is an issue in the plugin code only, unless I am missing something obvious.
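
To make the lease question concrete, here is a simplified model of a path-keyed lease. It is an illustrative sketch, not the Jenkins `WorkspaceList` implementation: the `LeaseSketch` class and its helper method are hypothetical. It models the blocking behavior that the stack trace in Hubert's comment suggests, where builds sit in `Object.wait` inside `WorkspaceList.acquire` until the current holder releases the directory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class LeaseSketch {
    private final Set<String> inUse = new HashSet<>();

    /** Blocks until no other caller holds a lease on the path, then takes it. */
    synchronized Lease acquire(String path) throws InterruptedException {
        while (inUse.contains(path)) {
            wait(); // woken by release()
        }
        inUse.add(path);
        return new Lease(path);
    }

    private synchronized void release(String path) {
        inUse.remove(path);
        notifyAll();
    }

    final class Lease implements AutoCloseable {
        private final String path;
        Lease(String path) { this.path = path; }
        @Override public void close() { release(path); }
    }

    /** Runs several simulated builds on one path; returns the peak number of simultaneous holders. */
    static int maxConcurrentHolders(int builds) {
        LeaseSketch leases = new LeaseSketch();
        AtomicInteger holders = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < builds; i++) {
            Thread t = new Thread(() -> {
                try (Lease lease = leases.acquire("/path/to/job/workspace@script/somelonghash")) {
                    max.accumulateAndGet(holders.incrementAndGet(), Math::max);
                    Thread.sleep(10); // stand-in for the Jenkinsfile checkout
                    holders.decrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return max.get();
    }
}
```

In this model the simulated checkouts on a shared path are serialized, so `maxConcurrentHolders` never reports more than one holder. If the real lease behaves the same way, builds should queue rather than overlap during the checkout itself. That would not rule out a conflict at some other step.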

          Jim Searle added a comment -

          Any feedback from Jenkins experts?

          Hubert added a comment -

          Hello, we are experiencing the issue as well.

          We are using Gerrit too. The pipeline code (.groovy) file is checked out on master, which causes workspace-locking issues that prevent multiple job executions at once.

           

          When examining thread dumps, we can see multiple threads that are waiting to acquire the workspace:

          Waiting to acquire /var/lib/jenkins/jobs/VERIFY/jobs/VERIFY/jobs/VERIFY_E2E_TESTS/workspace@script/0419e2442d94ef5f69637204e14a72fecf11342317af3b0f5723980143096367 : Executor #-1 for Built-In Node : executing VERIFY/VERIFY/VERIFY_E2E_TESTS #247669
          java.base@11.0.20.1/java.lang.Object.wait(Native Method)
          java.base@11.0.20.1/java.lang.Object.wait(Object.java:328)
          hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:261)
          hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:240)
          hudson.slaves.WorkspaceList.acquire(WorkspaceList.java:229)
          org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition.create(CpsScmFlowDefinition.java:155)
          org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition.create(CpsScmFlowDefinition.java:70)
          org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:310)
          hudson.model.ResourceController.execute(ResourceController.java:99)
          hudson.model.Executor.run(Executor.java:432)

           

          and the job stops at:

          Checking out git ssh://gerrit/main into /var/lib/jenkins/jobs/VERIFY/jobs/VERIFY/jobs/VERIFY_E2E_TESTS/workspace@script/0419e2442d94ef5f69637204e14a72fecf11342317af3b0f5723980143096367 to read src/com/testing/e2e/VerifyE2E.groovy 

           

            Assignee: Unassigned
            Reporter: Jim Searle (jimsearle)
            Votes: 0
            Watchers: 4