Locking and Unlocking takes very long for no apparent reason.

      Take this example pipeline:

      node {
          // change workspace to shared directory
          exws(extWorkspace) {
              timeout(time: 30, unit: 'MINUTES') {
                  stage('build') {
                      checkout scm
                      nodeEnv {
                          sh 'ant build'
                      }
                      milestone 1
                  } // stage build
              } // timeout
          } // exws
      } // node
      lock(resource: 'deploy', inversePrecedence: true) {
          // do not deploy older versions of the artifacts (NON-UNIQUE-SNAPSHOTS)
          milestone 2
          node {
              exws(extWorkspace) {
                  stage('deploy') {
                      sh 'ant deploy'
                  } // stage deploy
              } // extws
          }  // node
      } // lock

       This is how the Build-Log looks (sometimes):

      12:04:42 BUILD SUCCESSFUL
      12:04:42 Total time: 3 minutes 37 seconds
      [Pipeline] }
      [Pipeline] // withEnv
      [Pipeline] milestone
      12:04:42 Trying to pass milestone 1
      [Pipeline] }
      [Pipeline] // stage
      [Pipeline] }
      [Pipeline] // timeout
      [Pipeline] }
      [Pipeline] // exws
      [Pipeline] }
      [Pipeline] // node
      

      here is the first problem. I see the job leaving the node, but not that it tries to lock a resource, that message will be printed after the lock is acquired (even though the timestamp implies something else)!

      [Pipeline] lock
      12:04:42 Trying to acquire lock on [deploy]
      12:13:36 Lock acquired on [deploy]
      [Pipeline] {
      [Pipeline] milestone
      12:13:36 Trying to pass milestone 2
      [Pipeline] node
       

      and the second problem is that it took nearly 10 minutes to get the lock while there was only one other non related job running. So the lock should be obtainable super fast and the overall workload of the system was low.

      Not only the jobs seem to have these performance drops, but also when going to <yourjenkins.url>/lockable-resources/ there are times when it takes minutes to see the currently locked resources.

          [JENKINS-45131] Locking Resources is slow

          I've managed to look at the Thread dump of such a Job today, it's stuck here:

          Thread #4
          	at DSL.lock(Native Method)
          	at ...
          

          Pipeline Steps show:

          • Step: Lock shared resource : Start - (19 Sekunden in block)
            Although the Builds Console didn't print that already the Steps Console showed:
            19:12:21 Trying to acquire lock on [deploy]
            19:12:21 Lock acquired on [deploy]
            
            • Step: Lock shared resource : Body : Start - (no timing in block)
              no console for this step
              • Step: The milestone step forces all builds to go through in order - (92 ms in self)
                19:21:55 Trying to pass milestone 2
                

          So it took 9 Minutes to go over the "Lock shared resource : Body : Start" ?
          The Build was in a leightweight executor at this point and no other build where running.

          Daniel Geißler added a comment - I've managed to look at the Thread dump of such a Job today, it's stuck here: Thread #4 at DSL.lock(Native Method) at ... Pipeline Steps show: Step: Lock shared resource : Start - (19 Sekunden in block) Although the Builds Console didn't print that already the Steps Console showed: 19:12:21 Trying to acquire lock on [deploy] 19:12:21 Lock acquired on [deploy] Step: Lock shared resource : Body : Start - (no timing in block) no console for this step Step: The milestone step forces all builds to go through in order - (92 ms in self) 19:21:55 Trying to pass milestone 2 So it took 9 Minutes to go over the "Lock shared resource : Body : Start" ? The Build was in a leightweight executor at this point and no other build where running.

          Maybe this is related to JENKINS-45821 ?

          Daniel Geißler added a comment - Maybe this is related to JENKINS-45821 ?

          Stefanos Kalantzis added a comment - - edited

          I can confirm this behavior.
          I wouldn't classify this as just "slow".

          In my case I tried to use 10 locks in parallel (with different resource name), and the whole Jenkins environment was unresponsive for about 1.5 hours. Any page I tried to load timed-out.
          (The build was running on a build node on another VM)

          • Jenkins version - 2.107.1
          • Lockable Resources version - 2.2

          Stefanos Kalantzis added a comment - - edited I can confirm this behavior. I wouldn't classify this as just "slow". In my case I tried to use 10 locks in parallel (with different resource name), and the whole Jenkins environment was unresponsive for about 1.5 hours. Any page I tried to load timed-out. (The build was running on a build node on another VM) Jenkins version - 2.107.1 Lockable Resources version - 2.2

          Is this still an issue, even with JENKINS-45821 fixed? If yes, can you provide a stack dump when the system is in such a state (maybe by private mail to me)?

          Tobias Gruetzmacher added a comment - Is this still an issue, even with JENKINS-45821 fixed? If yes, can you provide a stack dump when the system is in such a state (maybe by private mail to me)?

            Unassigned Unassigned
            dageissl Daniel Geißler
            Votes:
            7 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated: