JENKINS-29652

Rerunning failed Delivery Pipeline stage doesn't enable/disable a manual trigger when using the Join plugin

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: join-plugin
    • Labels: None
    • Environment: Jenkins 1.609.1
      Delivery Pipeline plugin 0.9.4
      Join plugin 1.15

If jobA triggers two other jobs (jobB and jobC), this shows as a split in a delivery pipeline view. If I want jobD to run after jobB and jobC have completed (a merge in the flow), I'd need to configure the Join plugin on jobA and say "run jobD once all downstream jobs have completed".

This works fine, but if, say, jobC fails and I rerun that stage in the delivery pipeline view, jobD doesn't get enabled. After testing, it seems that whether jobD runs is determined by the Join plugin's initial assessment; no matter which of the downstream jobs (jobB, jobC) I rerun, jobD stays in its original state.


Patrik Boström added a comment -

This is an issue for the Join plugin.

Patrik Boström added a comment -

Looks like the issue is resolved in Join plugin 1.16.

Jean-Frédéric added a comment -

I can confirm this is not fixed with Join plugin 1.16, whether one retries a failed job using Naginator or the Delivery Pipeline trigger.

In jobC, the logs say:
[Join] Pending does not contain jobC
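To make the symptom concrete, here is a minimal, hypothetical Java sketch (not the actual join-plugin source) of bookkeeping that would produce exactly this log line: if the pending set shrinks on any downstream completion, success or failure alike, a later rebuild of the same project finds itself already removed and the join can never fire.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, NOT the real join-plugin code: illustrates how
// removing a project from the pending set on *any* completion would
// reproduce the reported behaviour.
class JoinStateSketch {
    private final Set<String> pending = new HashSet<>();
    private boolean overallSuccess = true;

    JoinStateSketch(Set<String> downstreamProjects) {
        pending.addAll(downstreamProjects);
    }

    // Called whenever a downstream build completes, regardless of result.
    void onDownstreamFinished(String project, boolean success) {
        if (!pending.remove(project)) {
            // A rebuild of a project that already completed lands here,
            // matching the observed "[Join] Pending does not contain jobC".
            System.out.println("[Join] Pending does not contain " + project);
            return;
        }
        overallSuccess &= success;
        if (pending.isEmpty() && overallSuccess) {
            triggerJoinProjects(); // would start jobD
        }
    }

    private void triggerJoinProjects() { /* trigger the join job(s) */ }
}
```

With jobB succeeding and jobC failing, the pending set empties without the join firing, and a successful rebuild of jobC only produces the log message above.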

Jean-Frédéric added a comment -

Reopening per the above comment.

          Chris Engel added a comment -

I can confirm I'm seeing the same issue with Join plugin 1.16.

          mdonohue added a comment -

What do you mean by a "retry" build? If you mean manually triggering a build of a project that is also named as a join project, this will result in a lot of ambiguity.

Using kolos's setup from the bug description: what if jobA is triggered again after jobC failed the first time, and that second jobA run causes jobC to succeed? Should that also trigger the previous execution of jobA to continue with the join, in addition to the current execution?

          Chris Engel added a comment -

From my perspective: when I view the jobs in the pipeline view and see that jobC has failed, I perform a rebuild of jobC from the pipeline. My expectation is that if it succeeds on the rebuild, it triggers jobD (through the join in jobA). However, if I were to execute jobA a second time, I would consider that a new execution of the build pipeline and wouldn't expect the previous join to continue.

As it stands, our workaround is to re-execute jobA, but that means jobB (which had already passed for this build) has to rerun, which is not the best use of our resources.

          mdonohue added a comment -

I'm having a hard time reducing your expectation to reasonable semantics. By what logic does the join plugin avoid the double-build situation I described?

Chris Engel added a comment - edited

I'm not fully understanding what you are asking. Maybe more detail about our delivery pipeline will help. It looks like this:

JobAA > JobA | JobB/C | JobD

JobAA - This is a build job initiated by a Gerrit review trigger; each run of this job creates a unique build output that needs testing.
JobA - This is just a prep job for JobB and JobC, used to define the parallel test execution and the join.
JobB/C - These jobs take the output from JobAA, load it onto various hardware platforms, and run a verification suite; these tests require dedicated HW resources and can run for long periods of time.
JobD - This 'publish' job is configured to run when JobB and JobC complete with a 'Stable' status, and it reports verification success back to Gerrit.

Because of some instability in our hardware test suite (JobB/C), it is common for those jobs to fail for reasons unrelated to the build being tested. If we simply rerun the failed job, it may pass the second time, at which point we want JobD to run to report the successful test back to Gerrit.

Unfortunately, it appears that when JobB or JobC fails, it is still taken off the list of pending jobs for the join, which means a subsequent rebuild of the failing job shows the error quoted above: '[Join] Pending does not contain jobC'. If JobA is set up to trigger JobD only when the build is stable, why does a failing build affect the pending join at all?

As mentioned earlier, our workaround is to rebuild JobA, which causes JobB and JobC to both run and, assuming success, triggers JobD. However, since JobB/C require limited test HW resources, we want to avoid this so we don't delay other instances of this build pipeline that may also be running.

One other note: the join is configured using the 'Trigger parameterized build on other projects' option.
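For illustration, here is a minimal sketch of the behaviour Chris is asking for, reusing the hypothetical JoinStateSketch names from the earlier example (this is an illustration of the idea, not the patch attached to the ticket): a failed project simply stays in the pending set, so a later successful rebuild can still complete the join.

```java
// Hypothetical variant of the earlier JoinStateSketch, not the attached
// patch: failed downstream builds stay in the pending set so that a
// successful rebuild can still satisfy the join.
void onDownstreamFinished(String project, boolean success) {
    if (!success) {
        return; // leave the project pending; a rebuild may still succeed
    }
    pending.remove(project);
    if (pending.isEmpty()) {
        triggerJoinProjects(); // e.g. start JobD
    }
}
```

Note that this sidesteps rather than answers mdonohue's double-build question; scoping the pending set to the specific upstream build that created it, rather than to the project name, would be needed for that.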

Jean-Frédéric added a comment -

I encounter the same issue as Chris with my build pipeline, multiplied: jobA triggers 3 parallel builds that join in jobE, which builds an artifact and triggers 6 parallel testing jobs, which join in one deploy job, jobL, which triggers 6 other testing jobs, which join in a switchover job. When something goes wrong in one of the joins, we can restart jobA, jobE, or jobL, but that is a waste.

Regarding the ambiguity question: I do realise it is an issue. I would have thought, though, that jobs know which particular build they triggered:

jobA#1 → jobB#1 ✖
jobA#2 → jobB#2 ✖ Restart jobB#3 → jobC

Here, jobA#1 does not get notified that jobB#3 succeeded, as it is not its upstream parent. Or am I missing something?
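Jenkins core does record this relationship via Cause.UpstreamCause, which stores the upstream project and build number that triggered a run. Below is a small sketch along the lines Jean-Frédéric suggests; the helper class is hypothetical, but Run.getCause and Cause.UpstreamCause are standard Jenkins core APIs.

```java
import hudson.model.Cause;
import hudson.model.Run;

// Hypothetical helper; Run.getCause and Cause.UpstreamCause are standard
// Jenkins core APIs. A manually rebuilt jobB#3 carries no (or a different)
// upstream cause, so it would not be attributed to jobA#1.
class UpstreamAttributionSketch {
    static String describeTrigger(Run<?, ?> build) {
        Cause.UpstreamCause cause = build.getCause(Cause.UpstreamCause.class);
        if (cause == null) {
            return build.getFullDisplayName() + " has no upstream cause";
        }
        return build.getFullDisplayName() + " was triggered by "
                + cause.getUpstreamProject() + " #" + cause.getUpstreamBuild();
    }
}
```

Whether a manual rebuild should inherit the original upstream attribution is exactly the semantic question mdonohue raises.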

Marcus Sjölin added a comment -

I have the same issue as above. I now run my jobs sequentially, which works, but it is not a very good setup; I'd rather fan out and fan in to save time.

Brian Villanueva added a comment -

I am also experiencing this issue. It is really frustrating to have to start the whole pipeline over again when a job fails for a reason unrelated to a code change.

Chris Engel added a comment - edited

I made a quick hack to resolve the issue. It appears to work for me, but I haven't verified that it works in all cases. With this patch I'm able to retrigger a failed build, and on success the expected join job is run. Added as an attachment.

Harry Soehalim added a comment -

I'm running version 1.19 and still seeing this issue, FYI.

          mdonohue added a comment -

When you say "rerun", is that distinguishable from just clicking "Build Now" on the job that failed?

          Chris Engel added a comment -

Yes, I'm referring to doing a 'Rebuild' on the failed job, not a 'Build Now'.

Harry Soehalim added a comment -

Do we have any update on this issue? Thanks.

          Neil Rhine added a comment -

I added another patch to the ticket that handles retries in more cases. This one keeps the join pending through both failures and successes, so the joined build can eventually run once all upstream jobs have succeeded.

          Suresh Kumar added a comment -

I have the same issue and figured out a way to implement the same requirement using the Promoted Builds plugin: in JobA, configure a promotion that triggers JobD when the downstream projects JobB and JobC have completed.

          mdonohue added a comment -

I'm looking at the two patches attached here - it seems the test cases that break as a result of these changes have just been commented out. Can you explain why they are no longer needed? Also, can you add a test case to exercise the new path this creates?

Assignee: Unassigned
Reporter: kolos
Votes: 2
Watchers: 11