Jenkins / JENKINS-27905

stage concurrency allows 1 thread to be blocked at a time, others fail and hang job

    • Type: Bug
    • Resolution: Not A Defect
    • Priority: Minor
    • Component/s: pipeline
    • Labels: None
    • Environment: Jenkins 1.606, Workflow 1.5

      Take the following script as an example:

      node {
          def sleeps = [ : ]
          for (int i = 0; i < 4; i++) {
              sleeps["num_${i}"] = { sleep60() }
          }
          parallel(sleeps)
      }
      
      def sleep60 () {
          ws {
              stage name: "Sleep for 60 secs", concurrency: 1
              sh "sleep 60"
          }
      }
      

      Because the stages are marked with concurrency: 1, I would expect the sleeps to run in series while all four "parallel" branches still complete: first three branches blocked, then two, then one, until all are done.

      Instead, the first sleep runs and the second blocks on it, but the other two mysteriously "fail". The job then hangs indefinitely, and the second sleep never seems to get to run either.

      Started by user anonymous
      Running: Allocate node : Start
      Running on master in /mydir/jenkins/workspace/Sandbox
      Running: Allocate node : Body : Start
      Running: Execute sub-workflows in parallel : Start
      [num_0] Running: Parallel branch: num_0
      [num_1] Running: Parallel branch: num_1
      [num_2] Running: Parallel branch: num_2
      [num_3] Running: Parallel branch: num_3
      [num_0] Running: Allocate workspace : Start
      [num_0] Running in /mydir/jenkins/workspace/Sandbox-2
      [num_0] Running: Allocate workspace : Body : Start
      [num_1] Running: Allocate workspace : Start
      [num_1] Running in /mydir/jenkins/workspace/Sandbox-3
      [num_1] Running: Allocate workspace : Body : Start
      [num_2] Running: Allocate workspace : Start
      [num_2] Running in /mydir/jenkins/workspace/Sandbox-4
      [num_2] Running: Allocate workspace : Body : Start
      [num_3] Running: Allocate workspace : Start
      [num_3] Running in /mydir/jenkins/workspace/Sandbox-5
      [num_3] Running: Allocate workspace : Body : Start
      [num_0] Running: Sleep for 60 secs
      [num_0] Entering stage Sleep for 60 secs
      [num_0] Proceeding
      [num_0] Running: Shell Script
      [num_0] [Sandbox-2] Running shell script
      [num_1] Running: Sleep for 60 secs
      [num_1] Entering stage Sleep for 60 secs
      [num_1] Waiting for builds [7]
      [num_2] Running: Sleep for 60 secs
      [num_2] Entering stage Sleep for 60 secs
      Running: Allocate workspace : Body : End
      [num_3] Running: Sleep for 60 secs
      [num_3] Entering stage Sleep for 60 secs
      Running: Allocate workspace : Body : End
      Running: Allocate workspace : End
      Running: Allocate workspace : End
      Running: Execute sub-workflows in parallel : Body : End
      Running: Execute sub-workflows in parallel : Body : End
      [num_0] + sleep 60
      Running: Allocate workspace : Body : End
      Running: Allocate workspace : End
      Running: Execute sub-workflows in parallel : Body : End
      

      I assume this is just a bug rather than a misunderstanding of concurrency on my part, but I'm happy to be corrected!


          Thomas Dalton added a comment (edited)

          Deleted my previous comment: the workaround I tried, adding an "exit" stage, still doesn't work.


          Thomas Dalton added a comment (edited)

          After some further reading:
          https://issues.jenkins-ci.org/browse/JENKINS-25570
          https://issues.jenkins-ci.org/browse/JENKINS-27127

          I understand that "mutex"-like behavior is not really what the concurrency field is for. It is closer to filtering builds: a limited number are allowed to wait, and the others are effectively thrown away and failed. (The secondary issue here is that the job then seems to hang and leave stale state.)
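          To make the "filtering" behaviour concrete, here is a small Java model of it. This is a sketch under an assumption and does not reproduce the plugin's code. It assumes, as the title of this issue suggests, that at most `concurrency` callers are inside the stage, that only one caller may wait, and that a newer arrival displaces (fails) whoever was already waiting. The class `StageGate` and its methods are hypothetical, and the model does not try to reproduce the hang seen in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a stage with concurrency: N, ASSUMING a
// "newest waiter wins" policy: only one caller may wait, and a newer
// arrival cancels whoever was waiting before it.
final class StageGate {
    enum Outcome { ENTERED, WAITING }

    private final int concurrency;
    private final List<String> insideStage = new ArrayList<>();
    private final List<String> canceledCallers = new ArrayList<>();
    private String waiting; // at most one waiter at a time (null if none)

    StageGate(int concurrency) {
        this.concurrency = concurrency;
    }

    // A caller reaches the stage: it either enters or becomes the single waiter.
    synchronized Outcome enter(String who) {
        if (insideStage.size() < concurrency) {
            insideStage.add(who);
            return Outcome.ENTERED;
        }
        if (waiting != null) {
            canceledCallers.add(waiting); // previous waiter is thrown away and fails
        }
        waiting = who;
        return Outcome.WAITING;
    }

    // A caller leaves the stage; the surviving waiter (if any) is admitted.
    synchronized void leave(String who) {
        insideStage.remove(who);
        if (waiting != null && insideStage.size() < concurrency) {
            insideStage.add(waiting);
            waiting = null;
        }
    }

    synchronized List<String> admitted() { return new ArrayList<>(insideStage); }

    synchronized List<String> canceled() { return new ArrayList<>(canceledCallers); }
}
```

          Feeding it the four branches from the report: num_0 enters and num_1 waits. num_2 then displaces num_1, and num_3 displaces num_2. When num_0 leaves, only num_3 is admitted. Under this assumed policy, two of the four branches fail instead of queueing, which is the opposite of the mutex-with-queue behaviour the report expected.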

          For the benefit of others: after a bit of experimentation I've achieved what I wanted, which is to run similar jobs in parallel while serializing the parts of their flow that they have in common. Hopefully the increment below is "atomic", or at least "atomic enough".

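          // qLock: a simple FIFO "lock". Each branch takes a numbered ticket
          // and waits until its ticket is at the head of the queue.
          class qLock implements Serializable {
              int id = 0
              List queue = []
          
              def increment() {
                  return ++id
              }
              def acquire_id() {
                  // take the next ticket and join the back of the queue
                  int ticket = this.increment()
                  this.queue.add(ticket)
                  return ticket
              }
              def is_ready(int id) {
                  // it is my turn when my ticket is first in the queue
                  return (this.queue.indexOf(id) == 0)
              }
              def release(int id) {
                  // only the current holder may remove itself from the head
                  if (this.is_ready(id)) {
                      this.queue.remove(0)
                  }
              }
          }
          
          node {
              def sleeps = [ : ]
              def ql = new qLock()
              for (int i = 0; i < 4; i++) {
                  sleeps["num_${i}"] = { sleep10(ql) }
              }
              parallel(sleeps)
          }
          
          def sleep10 (qLock ql) {
              ws {
                  // try to acquire the "lock"
                  def id = ql.acquire_id()
                  // wait (polling once a second) until it is my turn
                  while (!ql.is_ready(id)) {
                      sleep(1)
                  }
                  try {
                      stage name: "Sleeping for 10 secs"
                      sh "sleep 10"
                  } finally {
                      // release even if the shell step fails, so other branches are not blocked forever
                      ql.release(id)
                  }
                  stage name: "Sleep finished"
              }
          }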
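          The qLock above is essentially a ticket lock: each branch takes a ticket, and branches are served in ticket order. Whether `++id` is atomic depends on how the Workflow engine schedules parallel branches, and this thread does not settle that. For comparison, ordinary multithreaded JVM code needs explicit synchronization for the same idea. Below is a minimal Java sketch with the hypothetical names `TicketLock` and `TicketLockDemo`. It takes tickets inside `synchronized` methods and blocks with `wait`/`notifyAll` instead of polling once a second.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical ticket lock: a JVM analogue of qLock. Every method is
// synchronized, so taking a ticket is atomic even under real threads.
final class TicketLock {
    private long nextTicket = 0; // next ticket to hand out (like qLock.increment)
    private long nowServing = 0; // ticket allowed in (like the head of qLock.queue)

    // Take a ticket and block until it is this caller's turn.
    // Caveat: if a waiter is interrupted, its ticket is never served and
    // later tickets stall; a production lock would have to handle that.
    synchronized long acquire() throws InterruptedException {
        long ticket = nextTicket++;
        while (ticket != nowServing) {
            wait(); // releases the monitor until release() calls notifyAll()
        }
        return ticket;
    }

    // Hand the lock to the next ticket holder.
    synchronized void release() {
        nowServing++;
        notifyAll();
    }
}

final class TicketLockDemo {
    // Runs `branches` threads through the lock; returns tickets in the order served.
    static List<Long> run(int branches) {
        TicketLock lock = new TicketLock();
        List<Long> served = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < branches; i++) {
            Thread t = new Thread(() -> {
                try {
                    long ticket = lock.acquire();
                    try {
                        served.add(ticket); // critical section: one branch at a time
                        Thread.sleep(10);   // stand-in for sh "sleep 10"
                    } finally {
                        lock.release();     // always release, even if the work fails
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "num_" + i);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining branches", e);
            }
        }
        return new ArrayList<>(served);
    }
}
```

          Because tickets are handed out and served under the same monitor, `TicketLockDemo.run(4)` serves tickets 0, 1, 2, 3 in order, however the threads happen to be scheduled. The `finally` block plays the same role as the `try`/`finally` around the shell step in the pipeline version.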
          


          Jesse Glick added a comment

          stage is not supported inside parallel. Or rather, it is legal but pointless, since the list of stages is global to a build and does not take threads into account.


          Thomas Dalton added a comment

          A stage within parallel could be a common use case for CI. For example, a given workflow may need multiple builds as well as multiple independent tests, all of which could run in parallel on the same server, and each of these flows could easily benefit from having its own stages.

          You could work around this by defining a separate flow for each of the parallel builds and then triggering them, e.g.:

          parallel([build_1: {build job: "FooFlow"}, build_2: {build job: "BarFlow"}])
          

          FooFlow and BarFlow could then define their own stages, but this would require further setup on the Jenkins server and more jobs to configure, and it would detract from the usefulness of the workflow-plugin for doing this programmatically.


          Jesse Glick added a comment

          I am not saying there are no conceivable use cases, just that stage is designed and written to implement one particular logical operation (a kind of semaphore), and what you are asking about would require a different step with a different implementation.


            Assignee: Jesse Glick (jglick)
            Reporter: Thomas Dalton (tomjdalton)
            Votes: 0
            Watchers: 3