Reproduction case:

      Create a concurrent matrix job with a user-defined axis that runs 'sleep 120' as its build step. Launch several of these jobs, enough that all available executors are taken up and builds are still waiting in the queue. Some of these builds will abort, with a console message similar to:

      [...]
      9 completed with result SUCCESS
      24 completed with result SUCCESS
      23 completed with result SUCCESS
      2 completed with result SUCCESS
      10 appears to be cancelled
      10 completed with result ABORTED
      25 appears to be cancelled
      25 completed with result ABORTED
      18 appears to be cancelled
      18 completed with result ABORTED
      13 appears to be cancelled
      [...]

      For my test case, I have 26 slaves, 82 executors, 25 sub-jobs. I can reproduce reliably if I launch 5 or more top level jobs at once.
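A minimal sketch of how the "launch 5 or more top-level jobs at once" step could be scripted, assuming the instance accepts an unauthenticated POST to the job's `/build` endpoint. The base URL and job name in the usage comment are hypothetical, and a secured instance would also need credentials and possibly a CSRF crumb. Jenkins may also merge identical requests that are still waiting in the queue, so a parameterized job might be needed to get distinct builds.

```python
import concurrent.futures
import urllib.parse
import urllib.request


def build_trigger_url(base_url, job_name):
    """Return the URL that queues one build of job_name when POSTed to."""
    return f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job_name)}/build"


def post(url):
    """POST an empty body to url and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def launch_builds(base_url, job_name, count, send=post):
    """Queue `count` builds of the same job at (nearly) the same time."""
    url = build_trigger_url(base_url, job_name)
    # One worker per request so the POSTs are issued concurrently.
    with concurrent.futures.ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(send, [url] * count))


# Usage (hypothetical instance and job name):
#   launch_builds("http://localhost:8080", "matrix-sleep-120", 5)
```

The `send` parameter exists only so the launcher can be exercised without a running Jenkins instance.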

          [JENKINS-13972] Concurrent matrix builds abort

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Aleksas
          Path:
          core/src/main/java/hudson/model/Run.java
          http://jenkins-ci.org/commit/jenkins/3d850711bb1a31f11c4309bd798200fbc5410764
          Log:
          Update core/src/main/java/hudson/model/Run.java

          Handling NTFS symlinks introduced via Util.resolveSymlink.
          JENKINS-15587
          Also probably culprit for JENKINS-13972

          Jeremy Van Haren added a comment -

          We are seeing this behavior again as well. We aren't using NTFS symlinks, so we are not sure that the recent change will address the issue for us.

          Sarah Woodall added a comment - - edited

          I am seeing this issue for the first time today after upgrading to Jenkins 1.509.1 from the previous LTS version. I have a matrix job which runs four different flavours of build on each of three platforms (Mac, Linux and Windows). My job configuration has not changed (and in fact no code has been checked in at all since the last good build – I just started this build manually today to test Jenkins after the upgrade).
          Our master is on Windows, and there are two Windows executors on the same machine. There are two Mac executors on a Mac slave, and two Linux executors on a Linux slave. All of the builds in fact complete successfully, but the master reports that all four of the Windows builds "appear to be cancelled" and then that they "completed with result ABORTED". UPDATE: I have seen similar behaviour for other matrix jobs, including some that do not run on the slaves at all. I think it is a matrix job issue, not a master/slave issue.
          Changing the job configuration to make the builds run serially rather than in parallel appears to work round the problem.

          UPDATE: I believe this problem is Windows only. On my Windows installation, I had to configure all my matrix jobs to run serially, so as to work round this bug. I have now moved my Jenkins master to a Mac, and I have changed all my jobs again so that they do not run serially. So far, I have not seen the problem occur even once on the Mac. (On the Mac I have Jenkins 1.509.2 installed, but I don't think there is a fix for anything like this between 1.509.1 and 1.509.2, so it's more likely to be the change of platform that has caused the improvement.)


          Ilguiz Latypov added a comment - - edited

          I see a machine that aborts a job on its second slave. Both slaves start via SSH, and the machine runs a Centrify SSH server.

          The other two machines run a regular SSH server and do not exhibit aborts on their second slaves.

          We have Jenkins 1.492.


          Ilguiz Latypov added a comment - - edited

          I figured out that a node configuration of my matrix job had received a "job disabled" property. Sub-projects disabled via "Configuration Slicing/Job Disabled Build Slicer (bool)" in /slicing/jobdisabledbool/ will deny requests for new runs without indicating the reason.

          [USER@MASTER ~]$ diff -u /usr/local/jenkins/data/jobs/MATRIXPROJ/configurations/axis-MATRIX/HOSTNAME{X,Y}/config.xml
          --- /usr/local/jenkins/data/jobs/MATRIXPROJ/configurations/axis-MATRIX/HOSTNAMEX/config.xml  2013-06-06 21:43:25.823244000 -0400
          +++ /usr/local/jenkins/data/jobs/MATRIXPROJ/configurations/axis-MATRIX/HOSTNAMEY/config.xml  2013-06-07 16:22:46.529940000 -0400
          @@ -7,7 +7,7 @@
             </properties>
             <scm class="hudson.scm.NullSCM"/>
             <canRoam>true</canRoam>
          -  <disabled>true</disabled>
          +  <disabled>false</disabled>
             <blockBuildWhenDownstreamBuilding>false</blockBuildWhenDownstreamBuilding>
             <blockBuildWhenUpstreamBuilding>false</blockBuildWhenUpstreamBuilding>
             <triggers class="vector"/>
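The check done by hand with `diff` above could be automated along these lines. This is only a sketch, assuming the on-disk layout shown in the diff (`jobs/<project>/configurations/.../config.xml`) and a top-level `<disabled>` element; the function name is made up for illustration.

```python
import pathlib
import xml.etree.ElementTree as ET


def find_disabled_configurations(jobs_dir):
    """Return sorted paths of matrix sub-configuration config.xml files
    whose top-level <disabled> element is 'true'."""
    disabled = []
    pattern = "*/configurations/**/config.xml"
    for config in pathlib.Path(jobs_dir).glob(pattern):
        try:
            root = ET.parse(config).getroot()
        except ET.ParseError:
            continue  # skip files that fail to parse rather than abort the scan
        if (root.findtext("disabled") or "").strip().lower() == "true":
            disabled.append(str(config))
    return sorted(disabled)


# Usage, with the jobs directory from the diff above:
#   for path in find_disabled_configurations("/usr/local/jenkins/data/jobs"):
#       print(path)
```

Any path it reports is a sub-configuration that would refuse new runs in the way described above.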
          


          Shay Weiss added a comment -

          Hi all,

          I've been investigating abort issues in Jenkins and have found at least one bug related to this.
          Here is my report on the subject:
          https://docs.google.com/presentation/d/1ybtB-Bhkb4c3dhb5ZMArr4prtEZ-pjLqH9Vk7yhdZTg/

          There is also another issue that I am still investigating.

          Core developers - I'll be happy to contribute to the sources if you can give me pointers on how to modify my proposed fix so that it is 'commit worthy'.


          Tidhar Klein Orbach added a comment - - edited

          Hi

          Is the solution suggested above by Shay Weiss reasonable? Is it going to be included in an upcoming version?

          thanks


          Tidhar Klein Orbach added a comment - - edited

          I created a pull request with a fix. Can someone please review it?
          https://github.com/jenkinsci/matrix-project-plugin/pull/28

          thanks,
          Tidhar


          Oliver Gondža added a comment -

          Putting this back to fixed, as it was confirmed in the PR that this is no longer a problem. If someone spots a similar problem, please file a new issue.

          pjdarton added a comment -

          FYI someone did spot this again and raised JENKINS-46453

          (and then one of my colleagues found that bug report after encountering the same symptoms, hence my interest in it)


            Assignee: Unassigned
            Reporter: John Koleszar (jkoleszar)
            Votes: 25
            Watchers: 30
