[JENKINS-27905] stage concurrency allows 1 thread to be blocked at a time, others fail and hang job

Type: Bug
Resolution: Not A Defect
Priority: Minor
Component/s: pipeline
Labels: None
Environment: Jenkins 1.606, Workflow 1.5

Take the following script as an example:

node {
    def sleeps = [ : ]
    for (int i = 0; i < 4; i++) {
        sleeps["num_${i}"] = { sleep60() }
    }
    parallel(sleeps)
}

def sleep60 () {
    ws {
        stage name: "Sleep for 60 secs", concurrency: 1
        sh "sleep 60"
    }
}

Because the stage is marked with concurrency: 1, I would expect the four sleeps to run one after another, with every "parallel" branch eventually completing: three branches blocked at first, then two, then one, until all have finished.

Instead, the first sleep runs and the second blocks on it, but the other two mysteriously "fail". The job then hangs indefinitely, and the second sleep never seems to run either.

Started by user anonymous
Running: Allocate node : Start
Running on master in /mydir/jenkins/workspace/Sandbox
Running: Allocate node : Body : Start
Running: Execute sub-workflows in parallel : Start
[num_0] Running: Parallel branch: num_0
[num_1] Running: Parallel branch: num_1
[num_2] Running: Parallel branch: num_2
[num_3] Running: Parallel branch: num_3
[num_0] Running: Allocate workspace : Start
[num_0] Running in /mydir/jenkins/workspace/Sandbox-2
[num_0] Running: Allocate workspace : Body : Start
[num_1] Running: Allocate workspace : Start
[num_1] Running in /mydir/jenkins/workspace/Sandbox-3
[num_1] Running: Allocate workspace : Body : Start
[num_2] Running: Allocate workspace : Start
[num_2] Running in /mydir/jenkins/workspace/Sandbox-4
[num_2] Running: Allocate workspace : Body : Start
[num_3] Running: Allocate workspace : Start
[num_3] Running in /mydir/jenkins/workspace/Sandbox-5
[num_3] Running: Allocate workspace : Body : Start
[num_0] Running: Sleep for 60 secs
[num_0] Entering stage Sleep for 60 secs
[num_0] Proceeding
[num_0] Running: Shell Script
[num_0] [Sandbox-2] Running shell script
[num_1] Running: Sleep for 60 secs
[num_1] Entering stage Sleep for 60 secs
[num_1] Waiting for builds [7]
[num_2] Running: Sleep for 60 secs
[num_2] Entering stage Sleep for 60 secs
Running: Allocate workspace : Body : End
[num_3] Running: Sleep for 60 secs
[num_3] Entering stage Sleep for 60 secs
Running: Allocate workspace : Body : End
Running: Allocate workspace : End
Running: Allocate workspace : End
Running: Execute sub-workflows in parallel : Body : End
Running: Execute sub-workflows in parallel : Body : End
[num_0] + sleep 60
Running: Allocate workspace : Body : End
Running: Allocate workspace : End
Running: Execute sub-workflows in parallel : Body : End

I assume this is just a bug rather than my misunderstanding of the use of concurrency but I'm happy to be corrected!

Thomas Dalton added a comment - 2015-04-12 21:28 - edited

Deleted previous comment because it still doesn't work even with an attempted workaround to add an "exit" stage.

Thomas Dalton added a comment - 2015-04-13 13:42 - edited

After some further reading, I understand this better:
https://issues.jenkins-ci.org/browse/JENKINS-25570
https://issues.jenkins-ci.org/browse/JENKINS-27127

"Mutex"-like behavior is not really what the concurrency field is for. It is more for filtering jobs: a certain number are allowed to queue up, and the others are effectively thrown away and failed. (The secondary issue here is that the job seems to hang and leave stale state.)
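
The filtering behavior described above can be pictured with a small plain-Groovy model. This is only a sketch, not the plugin's code: the class name `StageGate`, its fields, and the rule that the newest arrival replaces whichever build was already waiting are all assumptions made for illustration.

```groovy
// Simplified, hypothetical model of a throttled stage. Not the plugin's implementation.
class StageGate {
    final int concurrency
    final List<Integer> running = []    // builds currently inside the stage
    final List<Integer> cancelled = []  // builds that were thrown away
    Integer waiting = null              // assumption: at most one build waits

    StageGate(int concurrency) { this.concurrency = concurrency }

    // Returns 'proceed' if the build may enter now, otherwise 'wait'.
    String enter(int build) {
        if (running.size() < concurrency) {
            running << build
            return 'proceed'
        }
        if (waiting != null) {
            cancelled << waiting        // assumption: the previous waiter is discarded
        }
        waiting = build
        return 'wait'
    }

    // Called when a build leaves the stage; admits the waiter if there is room.
    void exit(int build) {
        running.removeAll { it == build }
        if (waiting != null && running.size() < concurrency) {
            running << waiting
            waiting = null
        }
    }
}

def gate = new StageGate(1)
assert gate.enter(1) == 'proceed'
assert gate.enter(2) == 'wait'
assert gate.enter(3) == 'wait'   // 3 takes the waiting slot from 2
assert gate.cancelled == [2]
gate.exit(1)
assert gate.running == [3]
```

Under this model nothing ever queues behind the single waiter, which is why the field behaves as a filter rather than as a mutex.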

For the benefit of others: after a bit of experimentation I've achieved what I wanted, which is to run parallel jobs that are similar and share some element, such that the common parts of their flow have to be serialized. Hopefully the increment below is "atomic", or at least "atomic enough".

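// A simple ticket queue: each caller takes a numbered ticket and may
// proceed only while its ticket is at the head of the queue.
class qLock implements Serializable {
    int id = 0          // last ticket number issued
    List queue = []     // outstanding tickets, oldest first

    def increment() {
        return ++id
    }
    // Issue the next ticket and append it to the queue.
    def acquire_id() {
        def ticket = this.increment()
        this.queue.add(ticket)
        return ticket
    }
    // True when the given ticket is at the head of the queue.
    def is_ready(int id) {
        return (this.queue.indexOf(id) == 0)
    }
    // Drop the ticket, but only if it is the one currently at the head.
    def release(int id) {
        if (this.is_ready(id)) {
            this.queue.remove(0)
        }
    }
}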

node {
    def sleeps = [ : ]
    def ql = new qLock()
    for (int i = 0; i < 4; i++) {
        sleeps["num_${i}"] = { sleep10(ql) }
    }
    parallel(sleeps)
}

def sleep10 (qLock ql) {
    ws {
        // try to acquire "lock"
        def id = ql.acquire_id()
        // wait until it is my turn
        while (!ql.is_ready(id)) {
            sleep(1)
        }
        try {
            stage name: "Sleeping for 10 secs"
            sh "sleep 10"
        } finally {
            // release even if the shell step fails, so waiting branches are not stuck
            ql.release(id)
        }
        stage name: "Sleep finished"
    }
}
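
On the "atomic enough" question: outside Jenkins, the same ticket-queue idea can be made explicitly thread-safe by marking the methods `synchronized`. The sketch below is plain Groovy run with ordinary threads rather than a Workflow script; `TicketLock` and its method names are hypothetical, and whether the extra synchronization is actually needed inside a Workflow script is not established in this issue.

```groovy
// Hypothetical thread-safe variant of the ticket queue; plain Groovy, no Jenkins steps.
class TicketLock implements Serializable {
    private int nextId = 0
    private final List<Integer> queue = []

    // Issue a ticket and enqueue it as one atomic operation.
    synchronized int acquireId() {
        int id = ++nextId
        queue.add(id)
        return id
    }

    // True when the ticket is at the head of the queue.
    synchronized boolean isReady(int id) {
        return !queue.isEmpty() && queue[0] == id
    }

    // Remove the ticket only if it is the current head.
    synchronized void release(int id) {
        if (isReady(id)) {
            queue.remove(0)   // index-based removal of the head
        }
    }
}

def lock = new TicketLock()
def order = Collections.synchronizedList([])
def threads = (0..<4).collect { n ->
    Thread.start {
        int id = lock.acquireId()
        while (!lock.isReady(id)) { Thread.sleep(10) }  // poll until it is our turn
        order << id                                      // the serialized section
        lock.release(id)
    }
}
threads*.join()
assert order == [1, 2, 3, 4]   // tickets are served in the order they were issued
```

Because issuing and enqueueing happen under the same monitor, the queue order always matches the ticket order, so the serialized sections run strictly one at a time.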

Jesse Glick added a comment - 2015-04-14 14:00

stage is not supported inside parallel. Or rather it is legal but pointless, since the list of stages is global to a build, and does not take into account threads.

Thomas Dalton added a comment - 2015-04-14 15:35

A stage within parallel could be a common use-case for CI. For example, a given workflow may need multiple builds as well as multiple independent tests, all of which could run in parallel on the same server - each of these flows could easily benefit from having stages.

You could work around this by defining flows for each of the parallel builds, and then kicking these off e.g.

parallel([build_1: {build job: "FooFlow"}, build_2: {build job: "BarFlow"}])

Here FooFlow and BarFlow could define their own stages, but this would require further setup on the Jenkins server and more jobs to configure, and it would detract from the usefulness of the workflow-plugin for doing this programmatically.

Jesse Glick added a comment - 2015-04-14 19:45

I am not saying there are no conceivable use cases, just that stage is designed and written to implement one particular logical operation (a kind of semaphore), and what you are asking about would require a different step with a different implementation.
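
A step with mutex semantics of the kind described here is available from the Lockable Resources plugin as `lock`, which is separate from the Workflow version in this report. Below is a minimal sketch of the original script rewritten around it, assuming that plugin is installed; the resource name `sleep-section` is hypothetical.

```groovy
node {
    def sleeps = [:]
    for (int i = 0; i < 4; i++) {
        sleeps["num_${i}"] = { sleep60() }
    }
    parallel(sleeps)
}

def sleep60() {
    ws {
        // Only one branch at a time may hold the hypothetical 'sleep-section' resource;
        // the other branches wait here until it is released.
        lock('sleep-section') {
            sh "sleep 60"
        }
    }
}
```

With this shape, the four branches would be expected to run their sleeps one after another, which is the behavior the report originally hoped to get from concurrency: 1.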

Assignee: Jesse Glick
Reporter: Thomas Dalton
Votes: 0
Watchers: 3
Created: 2015-04-11 22:56
Updated: 2016-08-26 21:54
Resolved: 2015-04-14 13:58

Comment of 2015-04-12 21:28 edited by Thomas Dalton - 2015-04-12 22:08
Comment of 2015-04-13 13:42 edited by Thomas Dalton - 2015-04-13 16:37
