We have 'build' and 'test' jobs. The build job builds several variants of the software and then triggers several downstream jobs to test all the variants in parallel. The test jobs have a Naginator step attached that checks the console output for specific regular expressions indicating an environment problem (as opposed to an actual test failure). If such an expression matches, a retry is triggered, up to three times. If the build and all triggered tests pass, the build gets promoted.
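For readers unfamiliar with the setup, the retry decision described above can be sketched roughly as follows. This is only an illustration of the behaviour as I understand it, not Naginator's actual code: the function name `should_retry`, the constant names, and the two example patterns are made up for this sketch (our real patterns are not listed here).

```python
import re

# Hypothetical patterns that indicate an environment problem.
# These are placeholders, not the expressions used in our jobs.
ENV_PROBLEM_PATTERNS = [r"Connection refused", r"No space left on device"]

# The report states that a retry is triggered up to three times.
MAX_RETRIES = 3


def should_retry(console_output: str, retries_so_far: int) -> bool:
    """Return True if a failed test run should be rescheduled.

    A retry is scheduled only when the retry budget is not used up
    and the console output matches one of the environment patterns.
    """
    if retries_so_far >= MAX_RETRIES:
        return False
    return any(re.search(p, console_output) for p in ENV_PROBLEM_PATTERNS)
```

Note that this decision looks only at the failed run itself; nothing in it considers whether newer runs exist, which is what the sequence below is about.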
Now I found this sequence of test runs:
- Test #4793 (Aug 22, 2017 3:22:20 PM), triggered by build #6133; it eventually fails
- Test #4794 (Aug 22, 2017 3:31:50 PM), triggered by build #6134 (which was itself triggered by a code change)
- Test #4795, "Started by Naginator after the failure of build #4793"; it eventually failed too
Huh? It seems to me that Naginator is triggering a retry of a test even though a separately triggered test run, which succeeded, was already started before the retry.
If my interpretation is correct, it means:
- At least an unneeded test run
- In the worst case, the old build (#6133) might now trigger a promotion, which would overwrite the one from the newer build (#6134)
That didn't happen here because the retry failed too, so I can't demonstrate it.
It seems to me that Naginator should NOT trigger a retry if any upstream job is running or queued at the time of the failure.
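To make the proposal concrete, here is a minimal sketch of the kind of guard I have in mind. It is not based on Naginator's code or API; `UpstreamBuild`, `retry_allowed`, and the state strings are hypothetical names. One assumption of this sketch: the build that triggered the failed test is ignored, since it may still be waiting for its own downstream tests.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UpstreamBuild:
    """Hypothetical snapshot of an upstream build at the time of the failure."""
    number: int
    state: str  # "queued", "running" or "finished"


def retry_allowed(triggering_build: int, upstream_builds: Iterable[UpstreamBuild]) -> bool:
    """Return False if any other upstream build is queued or running."""
    for build in upstream_builds:
        if build.number == triggering_build:
            # Assumption: skip the build that triggered the failed test.
            continue
        if build.state in ("queued", "running"):
            return False
    return True
```

With the build numbers from this report, and assuming (not verified) that build #6134 was still running when test #4793 failed, `retry_allowed(6133, [UpstreamBuild(6133, "running"), UpstreamBuild(6134, "running")])` would return `False`, so test #4795 would not have been scheduled.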