Reproduction case:

      Create a concurrent matrix job with a user-defined axis whose build step is 'sleep 120'. Launch several of these jobs, enough to occupy all available executors and still leave builds in the queue. Some of these builds will abort, with a console message similar to:

      [...]
      9 completed with result SUCCESS
      24 completed with result SUCCESS
      23 completed with result SUCCESS
      2 completed with result SUCCESS
      10 appears to be cancelled
      10 completed with result ABORTED
      25 appears to be cancelled
      25 completed with result ABORTED
      18 appears to be cancelled
      18 completed with result ABORTED
      13 appears to be cancelled
      [...]

      For my test case, I have 26 slaves, 82 executors, and 25 sub-jobs. I can reproduce this reliably if I launch 5 or more top-level jobs at once.

          [JENKINS-13972] Concurrent matrix builds abort

          John Koleszar created issue -

          John Koleszar added a comment -

          I was able to reproduce this by hacking one of Jenkins' unit tests as well:

          diff --git a/test/src/test/groovy/hudson/matrix/MatrixProjectCustomWorkspaceTest.groovy b/test/src/test/groovy/hudson/matrix/MatrixProjectCustomWorkspaceTest.groovy
          index 1ddb195..8b6f324 100644
          --- a/test/src/test/groovy/hudson/matrix/MatrixProjectCustomWorkspaceTest.groovy
          +++ b/test/src/test/groovy/hudson/matrix/MatrixProjectCustomWorkspaceTest.groovy
          @@ -116,7 +116,7 @@ class MatrixProjectCustomWorkspaceTest extends HudsonTestCase {
                */
               def configureCustomWorkspaceConcurrentBuild(MatrixProject p) {
                   // needs sufficient parallel execution capability
          -        jenkins.numExecutors = 10
          +        jenkins.numExecutors = 4
                   jenkins.updateComputerList()
           
                   p.axes = new AxisList(new TextAxis("foo", "1", "2"))
          @@ -140,8 +140,10 @@ class MatrixProjectCustomWorkspaceTest extends HudsonTestCase {
                   // get one going
                   Thread.sleep(1000)
                   def f2 = p.scheduleBuild2(0)
          +        Thread.sleep(1000)
          +        def f3 = p.scheduleBuild2(0)
           
          -        def bs = [f1, f2]*.get().each { assertBuildStatusSuccess(it) }
          +        def bs = [f1, f2, f3]*.get().each { assertBuildStatusSuccess(it) }
                   return bs
               }
           }
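
Both reproductions above put more configuration builds in flight than there are executors. The following is a minimal sketch of that arithmetic only, assuming each top-level matrix build needs one regular executor per configuration and none for itself (an assumption; the thread does not say how the parent build is scheduled). The class and method names are hypothetical and are not part of Jenkins.

```java
// Hypothetical helper, not Jenkins code: estimates how many configuration
// builds must wait in the queue when several matrix builds start at once.
final class QueuePressure {

    /**
     * @param executors        total regular executors available
     * @param parents          number of top-level matrix builds launched together
     * @param configsPerParent configurations (sub-jobs) per matrix build
     * @return configuration builds that cannot start immediately
     */
    static int queued(int executors, int parents, int configsPerParent) {
        // Assumption: only configuration builds consume executors.
        int demand = parents * configsPerParent;
        return Math.max(0, demand - executors);
    }

    public static void main(String[] args) {
        // Reporter's setup: 82 executors, 5 top-level jobs, 25 sub-jobs each.
        System.out.println(queued(82, 5, 25)); // prints 43
        // Unmodified unit test: 10 executors, 2 builds, 2 axis values.
        System.out.println(queued(10, 2, 2));  // prints 0
        // Patched unit test: 4 executors, 3 builds, 2 axis values.
        System.out.println(queued(4, 3, 2));   // prints 2
    }
}
```

Under that assumption, the unmodified test never queues a configuration, while the patched test and the reporter's setup both do, which matches the condition described in the reproduction case.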
          
          

          Sven Appenrodt made changes -
          Priority Original: Minor [ 4 ] New: Major [ 3 ]

          Sven Appenrodt added a comment -

          This seems to be a side effect of the fix for issue 6747.
          The problem only appears when the matrix jobs are started in concurrent mode. Starting the job in serial mode does not abort the axis jobs.
          Next problem: the workaround given in issue 6747 no longer works, so there is no longer any way to patch the job to run concurrently.

          Trevor Baker added a comment -

          Arg, I just wanted to start using concurrent matrix builds and ran into this. I am heartened to see that the bug is open and hope we can see a resolution soon!


          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          changelog.html
          core/src/main/java/hudson/matrix/MatrixBuild.java
          core/src/main/java/hudson/matrix/MatrixConfiguration.java
          http://jenkins-ci.org/commit/jenkins/9c7ef619cc96dc0111220412e841199de71d5b8d
          Log:
          [FIXED JENKINS-13972]

          Fixed a problem in actually making concurrent builds work.

          Compare: https://github.com/jenkinsci/jenkins/compare/c2c31e2b933a...9c7ef619cc96
          SCM/JIRA link daemon made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          dogfood added a comment -

          Integrated in jenkins_main_trunk #1812
          [FIXED JENKINS-13972] (Revision 9c7ef619cc96dc0111220412e841199de71d5b8d)

          Result = UNSTABLE
          Kohsuke Kawaguchi : 9c7ef619cc96dc0111220412e841199de71d5b8d
          Files :

          • changelog.html
          • core/src/main/java/hudson/matrix/MatrixConfiguration.java
          • core/src/main/java/hudson/matrix/MatrixBuild.java


          aleksas added a comment -

          The matrix build was started on Debian (Debian-5.0.9); the downstream projects ran on amd64 and i386 Debian 5.0 slaves.

          no change for svn://************ since the previous build
          Triggering buildnode_x86.deb50
          Triggering buildnode_x86_64.deb50
          Configuration buildnode_x86.deb50 is still in the queue: Waiting for next available executor on build-lnx32-2.deb50
          buildnode_x86.deb50 completed with result SUCCESS
          appears to be cancelled
          buildnode_x86_64.deb50 completed with result ABORTED
          Notifying upstream build ************* #426 of job completion
          All downstream projects complete!
          Minimum result threshold not met for join project
          Notifying upstream projects of job completion
          Notifying upstream of completion: ********** #426
          Finished: ABORTED

          buildnode_x86_64.deb50 task log shows:

          Notifying upstream projects of job completion
          Finished: SUCCESS

          Jenkins master runs on Windows Server 2008
          Jenkins ver. 1.492

          slave java.version 1.6.0_0
          master java.version 1.7

          aleksas made changes -
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

            Assignee: Unassigned
            Reporter: John Koleszar (jkoleszar)
            Votes: 25
            Watchers: 30
