JENKINS-56544

failFast option for parallel stages sets build status to ABORTED when failure is inside of a stage with an agent

    • pipeline-model-definition 1.3.7

      Symptom

      With a Pipeline using parallel stages with failFast true enabled, when a stage with an agent fails, the final build result shows as ABORTED instead of FAILURE.

      A similar bug was recently fixed in JENKINS-55459, which corrected the build status when using non-nested parallel stages with failFast true enabled, but that fix does not appear to catch the case where a nested stage inside one of the parallel stages fails.
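
      For reference, the attached Jenkinsfile is shaped roughly like the sketch below (a simplified illustration, not the attachment itself; the stage names and steps here are made up):

      pipeline {
          agent none
          stages {
              stage("parent") {
                  failFast true
                  parallel {
                      stage("branch-a") {
                          stages {
                              stage("nested") {
                                  agent any        // the failing stage runs on an agent
                                  steps {
                                      sh "exit 1"  // fails; the build should end up FAILURE
                                  }
                              }
                          }
                      }
                      stage("branch-b") {
                          steps {
                              sleep 60
                              echo "still running when branch-a fails"
                          }
                      }
                  }
              }
          }
      }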

      Evidence

      The fix for JENKINS-55459 was delivered in version 1.3.5 of the Pipeline: Declarative plugin; I am testing version 1.3.6.

      I started up a brand-new Jenkins LTS 2.150.3 instance with the 'recommended' plugins, including Pipeline: Declarative version 1.3.6.

      I ran the test case from JENKINS-55459 and the build status was correctly marked as FAILURE, so that fix works as intended.

      When I run the attached Jenkinsfile, the build still shows as ABORTED:

      ...
      ERROR: script returned exit code 1
      Finished: ABORTED
      

      Here is the full log: log

      I expected the build status to be marked as FAILURE, not ABORTED.

      Hypothesis

      I believe the fix from JENKINS-55459 works, but it does not account for failures inside stages that have their own agents within the parallel stages.

        1. Jenkinsfile (2 kB)
        2. log (2 kB)
        3. pipeline__Jenkins__and_Mozilla_Firefox.png (100 kB)
        4. test.png (48 kB)


          Devin Nusbaum added a comment -

          I spent some time trying to reproduce this today but was not able to. Here is what I came up with as a base, but I experimented with various modifications to nesting, post stages, and swapping out error for a failing sh step, and everything seemed to work fine.

          I do think there could be some issues with certain kinds of agents based on LabelScript.groovy, DockerPipelineFromDockerfileScript.groovy (1, 2, and 3), and DockerPipelineScript.groovy (1 and 2), but I don't think those places would matter for the Jenkinsfile you posted (unless maybe LabelScript is being used to run the builds on your machine and that try/catch block is being triggered).
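
          To be concrete about the kind of handling I mean, the problematic pattern would look something like the following. This is only a schematic sketch, not the actual LabelScript.groovy source; "body" is a stand-in for the stage's steps:

          // Schematic sketch only -- not the real LabelScript.groovy code.
          // If the wrapper around an agent-backed branch handles an
          // interruption like this, the failFast kill of that branch
          // overwrites the FAILURE result the failing branch already set:
          def body = { sleep 10 }                  // placeholder for the stage's steps
          node {
              try {
                  body()
              } catch (Exception e) {
                  currentBuild.result = 'ABORTED'  // explicit result set when the branch is killed
                  throw e
              }
          }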

          One thing that is strange to me is that in your logs we see that the post conditions for failure and aborted both ran:

          [Pipeline] { (Declarative: Post Actions)
          [Pipeline] echo
           **** Pipeline ALWAYS ****
          [Pipeline] echo
           **** Pipeline Aborted **** 
          [Pipeline] echo
           **** Pipeline FAILURE **** 
          

          There is probably something wrong in Failure.meetsCondition that is causing those two conditions to overlap in some cases. Maybe the new errorResult computation needs to be moved before this line so that line can check whether the errorResult matches Aborted specifically rather than just checking if error is null. Either way, I think this is just a symptom of the build result being aborted in the first place, so it might not matter in practice once we figure out the other issue.
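
          In other words (a hypothetical sketch, not the plugin's actual code), the overlap would come from a check shaped like the first method below, and the suggestion amounts to something like the second:

          // Hypothetical illustration of the overlap -- not the real
          // Failure.meetsCondition implementation.
          import hudson.model.Result

          // If the FAILURE condition only checks that an error was thrown,
          // a branch whose error maps to ABORTED satisfies it too:
          boolean failureMet(Throwable error) {
              return error != null
          }

          // Computing errorResult first lets the check exclude aborts:
          boolean failureMet(Throwable error, Result errorResult) {
              return error != null && errorResult != Result.ABORTED
          }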


          Devin Nusbaum added a comment -

          OK, I was able to reproduce the issue; here is a minimal reproduction case:

          pipeline {
              agent none
              stages {
                  stage("foo") {
                      failFast true
                      parallel {
                          stage("first") {
                              steps {
                                  error "First branch"
                              }
                          }
                          stage("second") {
                              agent any
                              steps {
                                  sleep 10
                                  echo "Second branch"
                              }
                          }
                      }
                  }
              }
          }
          

          Note that this reproduction does not have any nested stages. I think the root of the issue is the agent any in the branch that gets terminated early because of the explicit build result setting in LabelScript as noted in my previous comment.
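
          If it helps to observe the symptom directly, adding a post section at the pipeline level of the reproduction above (an illustrative addition, not part of the original) shows which conditions fire and what the final result is:

          post {
              always  { echo "Final result: ${currentBuild.currentResult}" }  // FAILURE expected
              failure { echo "failure condition ran" }
              aborted { echo "aborted condition ran" }                        // fires before the fix
          }

          With the bug present, I would expect both the failure and aborted echoes to run, matching the overlap in the log from the description.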


          Ray Kivisto added a comment -

          I'm not sure why you were not able to reproduce with my original test case. I just tried again now: I started a brand-new Jenkins LTS 2.164.1 instance (the latest LTS as of today), installed the "recommended plugins", and then ran the attached Jenkinsfile without any changes to it or to the Jenkins configuration (meaning the `agent any` blocks ran on the master). The end of the build log shows:

          Finished: ABORTED

           


          Ray Kivisto added a comment -

          I should also mention that your reduced test case also reproduces the issue; the build status shows as:

          Finished: ABORTED


          Devin Nusbaum added a comment -

          I'm not sure why you were not able to reproduce with my original testcase

          I probably didn't clean my plugin work directory or something, so I was running code that didn't match 1.3.6, and then I immediately started trying to simplify and got rid of the key piece that causes the issue. I think your reproduction should work fine; thanks for coming up with it!


          Devin Nusbaum added a comment -

          Filed https://github.com/jenkinsci/pipeline-model-definition-plugin/pull/322, which I think should fix the problem, and updated the ticket title/description with what seems like the crux of the issue: the fix in JENKINS-55459 didn't work for any stage with an agent.


          Andrew Bayer added a comment -

          Merged, releasing as 1.3.7 right now.


          Ray Kivisto added a comment -

          Verified fixed in Pipeline: Declarative version 1.3.7, thanks!


          Liam Newman added a comment -

          Bulk closing resolved issues.


            Assignee: Devin Nusbaum (dnusbaum)
            Reporter: Ray Kivisto (rkivisto)
            Votes: 0
            Watchers: 6