JENKINS-40879: Locked areas are executed multiple times in parallel

      My pipeline consists of several exclusive sections like the following, and no section should ever be executed by more than one build at a time:

      milestone()
      lock(resource: "unit-tests", inversePrecedence: true) {
          milestone()
          stage('Unit Tests') {
          }
      }
      
      milestone()
      lock(resource: "integration-tests", inversePrecedence: true) {
          milestone()
          stage('Integration Tests') {
          }
      }
      

      This worked perfectly fine with lockable-resources-plugin 1.10.
      However, with version 1.11 the locking no longer works properly.

      It sometimes happens that two builds run the integration tests in parallel,
      although this should not be possible given the above configuration.

      Whenever this happens I look at the http://<jenkins>/lockable-resources overview.
      It shows the resource as locked by only one build, even though two builds are concurrently executing the integration tests.
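
      The same state can also be dumped from the script console. This is only a rough sketch and assumes the manager's getResources() and LockableResource.isLocked() accessors:

      import org.jenkins.plugins.lockableresources.LockableResourcesManager

      // Print each configured lockable resource and whether the plugin
      // currently considers it locked.
      LockableResourcesManager.get().getResources().each { r ->
          println "${r.getName()}: locked=${r.isLocked()}"
      }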

      I am afraid it has to do with the recently implemented features.

      Since I consider this a bug and a regression, I changed the priority to critical.


          Antonio Muñiz added a comment -

          Is this always happening, or only under certain conditions? If so, can you describe those conditions?


          Daniel Wilmer added a comment - edited

          First, thank you for your quick response.

          It always happens when multiple builds of the same pipeline run in parallel.

          I have created a small example pipeline to reproduce the problem.
          The first two stages are short-running, the third is longer-running.
          If you start enough builds, two builds will eventually be waiting to enter the third, longer-running stage.
          Then it may happen that both builds enter that stage at the same time.

          I have tested this on different Jenkins installations. The problem is always reproducible.

          milestone()
          lock(resource: "test-stage-1", inversePrecedence: true) {
              milestone()
          
              stage("stage 1") {
                  node {
                      sh "for i in 1 2 3 4 ; do echo 'working...'; sleep 1 ; done"
                  }
              }
          }
          
          milestone()
          lock(resource: "test-stage-2", inversePrecedence: true) {
              milestone()
          
              stage("stage 2") {
                  node {
                      sh "for i in 1 2 3 4 ; do echo 'working...'; sleep 1 ; done"
                  }
              }
          }
          
          milestone()
          lock(resource: "test-stage-3", inversePrecedence: true) {
              milestone()
          
              stage("stage 3") {
                  node {
                      sh "for i in 1 2 3 4 5 6 7 8 9 10; do echo 'working...'; sleep 1 ; done"
                  }
              }
          }
          


          Staffan Forsell added a comment - edited

          I took dawi's example and reduced it to this:

          lock(resource: "test-stage-1") {
              stage("stage 1") {
                  node {
                      sh 'for i in 1 2 3 4 ; do echo "working...$i"; sleep 3 ; done'
                  }
              }
          }
          lock(resource: "test-stage-2") { 
              stage("stage 2") {
                  node {
                      sh 'for i in 1 2 3 4 ; do echo "working...$i"; sleep 3 ; done'
                  }
              }
          }
          lock(resource: "test-stage-3") {
              stage("stage 3") {
                  node {
                      sh 'for i in 1 2 3 4 5 6 7 8 9 10 ; do echo "working...$i"; sleep 3 ; done'
                  }
              }
          }
          

          Neither the milestones nor inversePrecedence were needed to reproduce. This fails for me pretty fast if I queue ~10 builds using "Build now".
          Using Jenkins 2.19.4 and lockable-resources 1.11.
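
          To queue a batch from the script console, something like this should work (a sketch only; 'lock-repro' is a placeholder for the job's full name, the job must allow concurrent builds, and the pause is there because Jenkins may coalesce identical items that are still waiting in the queue):

          import jenkins.model.Jenkins

          // Schedule ten builds of the reproduction pipeline, pausing briefly so
          // each one can leave the queue before the next is scheduled.
          def job = Jenkins.instance.getItemByFullName('lock-repro')
          10.times {
              job.scheduleBuild2(0)
              sleep 2000
          }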


          Daniel Wilmer added a comment - edited

          UPDATE: Maybe this information can help:

          private static final class Callback extends BodyExecutionCallback.TailCall {
          ...
          protected void finished(StepContext context) throws Exception {
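              // unlockNames releases every resource named in this.resourceNames
              // for this build (see the observation below about what that list
              // can end up containing).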
              LockableResourcesManager.get().unlockNames(this.resourceNames, context.get(Run.class), this.inversePrecedence);
              context.get(TaskListener.class).getLogger().println("Lock released on resource [" + resourceDescription + "]");
              LOGGER.finest("Lock released on [" + resourceDescription + "]");
          }
          ...
          

          I have seen that in the first runs "this.resourceNames" contains only a single resource name (e.g. [test-stage-2]).
          But in later executions it suddenly contains all three resources of this example (e.g. [test-stage-1, test-stage-2, test-stage-3]).
          Also, the two log statements report that the lock on "resourceDescription" was released, although "this.resourceNames" is what is actually released.

          I don't have enough background information to decide whether this is correct or not, but I changed the implementation as follows and it worked (for this use case):

          protected void finished(StepContext context) throws Exception {
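              // Release only this step's own resource description instead of the
              // accumulated this.resourceNames list (a static import of
              // java.util.Collections.singletonList is assumed here).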
              LockableResourcesManager.get().unlockNames(singletonList(resourceDescription), context.get(Run.class), this.inversePrecedence);
              context.get(TaskListener.class).getLogger().println("Lock released on resource [" + resourceDescription + "]");
              LOGGER.finest("Lock released on [" + resourceDescription + "]");
          }
          

          Maybe this can help in finding the right solution.

          Regards, Daniel


          Jason Pickens added a comment -

          I am also experiencing this bug. I am running 3 instances of the same pipeline job in parallel (build1, build2, build3), all in contention for resource 'A'. build1 acquires 'A' and the other two wait. When the first build releases resource 'A' (via a Java exception in the Groovy script), both build2 and build3 log that they have acquired the resource and execute the critical section in parallel. Chaos ensues.
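
          A minimal sketch of that shape (the resource name 'A' and the steps inside are placeholders), where the lock is released through the exception path rather than through normal completion:

          lock(resource: 'A') {
              node {
                  sh 'echo "inside the critical section"; sleep 10'
                  // Fail the build here so the lock is released via an
                  // exception instead of normal completion.
                  error 'simulated failure inside the critical section'
              }
          }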


          Phil McArdle added a comment -

          I believe I hit the same issue. I have multiple stages queued to run in parallel, with some of them waiting on the same locks.

          My experience was that when any lock was released, all of the locks seemed to be released.

          Lock acquired on resource 1 {
              Lock acquired on resource 2 {
              }
              Lock acquired on resource 1 {
              }
          }
          

          In my case, a second lock was being acquired on resource 1 long before the log or the pipeline steps view showed it being released.

          I've downgraded to 1.10 and it runs correctly. I'm not using inversePrecedence, quantity or label.


          Andrew Bayer added a comment -

          PR up at https://github.com/jenkinsci/lockable-resources-plugin/pull/45, including a first commit with a test that seems to reproduce the issue consistently.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Andrew Bayer
          Path:
          src/test/java/org/jenkins/plugins/lockableresources/LockStepTest.java
          http://jenkins-ci.org/commit/lockable-resources-plugin/19119d6cbb0af2057550d6886d1239b34d54b169
          Log:
          JENKINS-40879 Test to verify bug.

          This test seems to fail pretty consistently - not 100% sure it'll keep
          failing every time, but I haven't seen it pass yet except with a fix.


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Andrew Bayer
          Path:
          src/main/java/org/jenkins/plugins/lockableresources/LockableResourcesManager.java
          http://jenkins-ci.org/commit/lockable-resources-plugin/a7cc1aa694beed617aae302a7e04e770c81ec7aa
          Log:
          [FIXED JENKINS-40879] Only call proceed with resources to lock.

          Previously, we were calling LockStepExecution.proceed with the full
          list of resources - which was resulting in everything getting
          unlocked if there was a next context, i.e., if anything else was
          waiting on a resource to be unlocked. That was wrong, obviously.

          This changes to instead only call proceed with the list of resource
          names we actually need to lock for the next context, which fixes the
          test in the previous commit.


            Assignee: abayer (Andrew Bayer)
            Reporter: dawi (Daniel Wilmer)
            Votes: 4
            Watchers: 11
