-
New Feature
-
Resolution: Unresolved
-
Minor
-
None
-
Jenkins 2.71 (directly started)
Naginator 1.17.2
Running on Linux using JDK 1.8.0_141
We have 'build' and 'test' jobs. The build job builds several variants of the SW and then triggers several downstream jobs to test all the variants in parallel. The test jobs have a Naginator step attached, which checks the console output for specific regular expressions that indicate an environment problem (as opposed to an actual test failure). If such an expression matches, a retry is triggered, up to three times. If the build and all triggered tests pass, the build gets promoted.
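To make the retry rule concrete, here is a minimal sketch of the decision as we understand it. It is not Naginator's code; the patterns and the names `ENV_PROBLEM_PATTERNS`, `MAX_RETRIES` and `should_retry` are made up for illustration.

```python
import re

# Hypothetical patterns; the real expressions used in our jobs are not listed here.
ENV_PROBLEM_PATTERNS = [
    re.compile(r"Connection refused"),
    re.compile(r"No space left on device"),
]
MAX_RETRIES = 3  # "up to three times", as described above


def should_retry(console_output: str, retries_so_far: int) -> bool:
    """Retry only if the log shows an environment problem and retries remain."""
    if retries_so_far >= MAX_RETRIES:
        return False
    return any(p.search(console_output) for p in ENV_PROBLEM_PATTERNS)
```

Note that this decision looks only at the failed run itself; it knows nothing about other test runs triggered in the meantime.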
Now I found this sequence of test jobs:
- Test #4793 runs (Aug 22, 2017 3:22:20 PM), triggered by build #6133, and finally fails
- Test #4794 (Aug 22, 2017 3:31:50 PM), triggered by build #6134 (which got triggered by a code change), PASSES
- Test #4795, "Started by Naginator after the failure of build #4793", finally failed too
Huh? It seems to me that Naginator triggers a retry of a test even though a separately triggered test run has already succeeded in the meantime.
If my interpretation is correct, it means:
- At least an unneeded test run
- In the worst case the old build (#6133) might now trigger a promotion, which will overwrite the one from the newer build (#6134)
This didn't happen here because the retry failed too, so I can't show it.
It seems to me that Naginator should NOT trigger a retry if any upstream job is running or queued at the time of the failure.
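The proposed rule could look roughly like the sketch below. It is only a model of the idea, not based on Naginator's or Jenkins' actual APIs; `UpstreamBuild`, its `state` values and `retry_allowed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class UpstreamBuild:
    number: int
    state: str  # assumed values: "queued", "running" or "finished"


def retry_allowed(upstream_builds: Iterable[UpstreamBuild]) -> bool:
    """Suppress the retry while any upstream build is queued or running."""
    return not any(b.state in ("queued", "running") for b in upstream_builds)
```

For example, `retry_allowed([UpstreamBuild(6133, "finished"), UpstreamBuild(6134, "running")])` evaluates to `False`, so no retry would be scheduled. A stricter variant might also suppress the retry when a newer upstream build has already finished and triggered its own test run.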
Update to the report:
Unfortunately, today we saw the issue I speculated about above. Naginator's behavior led to a "downdating" of the promoted version to an older one. The newer version had already been promoted by a successful new build/test pair; after that, a retest triggered by Naginator made an older build qualify for promotion too, which overwrote the more recent promoted version.
For the moment we're trying to fix this by modifying the script that checks whether a build with all tests passing is valid and needed for promotion. (We have other reasons not to promote something that qualified, e.g. changes that "we" (== our System Component) want to test but that are not a visible change at system level.)
Still, I would like to see this addressed in Naginator itself, even though I'm not sure whether the behavior would need to be configurable.
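The kind of guard we have in mind is sketched below. It assumes that a higher build number always means newer code, which holds for our build job but is an assumption; `should_promote` and its parameters are illustrative names, not our actual script.

```python
from typing import Optional


def should_promote(candidate_build: int,
                   all_tests_passed: bool,
                   promoted_build: Optional[int]) -> bool:
    """Promote only a fully passing build that is newer than the promoted one."""
    if not all_tests_passed:
        return False
    # Nothing promoted yet, or the candidate is strictly newer.
    return promoted_build is None or candidate_build > promoted_build
```

With this check, a late retest that makes build #6133 qualify would be rejected if #6134 is already promoted: `should_promote(6133, True, 6134)` is `False`.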