Improvement
Resolution: Unresolved
Minor
None
We use the estimated build time mostly to determine 'by when will we have a result?'. Durations of successful builds vary somewhat (±5-10%, with less variation the longer the build runs), but that is still good enough for an estimate (disregarding jobs where builds differ significantly due to e.g. parameters).
Unfortunately, this commit changed the computation of build duration estimates:
https://github.com/jenkinsci/jenkins/commit/04d85eb476bdc57eb7ac3c2bb34e91be5b55c487
Before: Use the average of the last up to three successful builds. Use fewer if fewer than three successful builds exist. If there are no successful builds, don't attempt to provide an estimate.
Now: Only look at the last six builds. Use the average of the last up to three successful builds of those. If there are fewer than three successful builds, also use the latest failed (but not aborted) builds to get three builds to base the estimate on. If three or fewer completed builds exist, use those.
So, if there's only one successful build among the last six builds, it will also consider the latest two failed builds for the estimate. If there are no successful builds, the time the previous builds ran before failing is used as the estimate.
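To make the difference concrete, here is a minimal sketch of the two rules as described in this report. It is not the Jenkins implementation: the function names, the `(result, hours)` tuples and the sample history are assumptions made for illustration, and builds are listed newest first.

```python
from statistics import mean

SUCCESS, FAILURE, ABORTED = "SUCCESS", "FAILURE", "ABORTED"


def estimate_before(builds):
    """Old rule: average the last (up to) three successful builds."""
    durations = [hours for result, hours in builds if result == SUCCESS][:3]
    # No successful builds: do not attempt an estimate.
    return mean(durations) if durations else None


def estimate_now(builds):
    """New rule as described in this report (hypothetical re-statement)."""
    window = builds[:6]  # only look at the last six builds
    successes = [hours for result, hours in window if result == SUCCESS][:3]
    # Aborted builds are never used; failed builds only fill the gaps.
    failures = [hours for result, hours in window if result == FAILURE]
    candidates = successes + failures[: 3 - len(successes)]
    return mean(candidates) if candidates else None


# Hypothetical history, newest first: one success among recent failures.
history = [(FAILURE, 2.0), (ABORTED, 1.0), (FAILURE, 3.0), (SUCCESS, 10.0)]

print(estimate_before(history))  # 10.0
print(estimate_now(history))     # 5.0
```

With this history the old rule reports the 10 hours a successful build actually takes, while the new rule averages in the two failures and reports 5 hours.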
Not going far back in the job history is a good idea to ensure computation is quick. Filling up the list of candidate builds with failed builds, OTOH, is not.
Taking into account failing builds makes the estimate completely meaningless. It doesn't tell you anything, because the data it uses is all over the place (and it uses too few data points for more sophisticated estimates).
For a job that often fails after ~2-3 hours but takes 10 hours to complete successfully, this will often reduce the estimate to just ~5 hours. Even just considering one failed and two successful builds will result in an estimate of 7.5 hours, which is way off the actual duration to get a successful build.
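The numbers above can be checked with a few lines of arithmetic. This sketch assumes a failure takes 2.5 hours (the midpoint of "~2-3 hours"); the variable names are made up for the example.

```python
from statistics import mean

SUCCESS_HOURS = 10.0
FAILURE_HOURS = 2.5  # assumption: midpoint of "~2-3 hours"

# One successful build padded with two failed builds.
one_success_two_failures = mean([SUCCESS_HOURS, FAILURE_HOURS, FAILURE_HOURS])

# Two successful builds padded with one failed build.
two_successes_one_failure = mean([SUCCESS_HOURS, SUCCESS_HOURS, FAILURE_HOURS])

print(one_success_two_failures)   # 5.0
print(two_successes_one_failure)  # 7.5
```

In both cases the estimate falls between the two clusters of durations and matches neither the time to fail nor the time to succeed.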
Build failures should be considered exceptional conditions. For non-fatal issues, builds can be marked unstable. So failures should not be used in estimating build durations, as some unreliable component used in a build will completely distort build durations.
Surprisingly, the change only considers completed builds and excludes aborted builds from the estimate. This respects the status of aborted builds as exceptional and unfit for build estimates (unless caused by the Build Timeout Plugin…). But failed builds are no more useful to build an estimate on!
I'd rather have it use the average of the last N (e.g. 3 or 5) successful builds, and if there are fewer than M (e.g. 1, or 3), just give no estimate at all. For jobs that also have wildly varying durations for successful builds, it might actually be useful to not provide an estimated duration – this would make it explicit that Jenkins has no confidence in providing an estimate.
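A rough sketch of that proposal, assuming the same newest-first list of `(result, hours)` tuples; `estimate_successful_only`, `n` and `m` are hypothetical names that mirror N and M above and are not part of any Jenkins API.

```python
from statistics import mean


def estimate_successful_only(builds, n=3, m=1):
    """Average the last n successful builds; give no estimate below m.

    builds: list of (result, hours) tuples, newest first (assumed shape).
    """
    durations = [hours for result, hours in builds if result == "SUCCESS"][:n]
    # Fewer than m successful builds (and never zero): no estimate at all.
    if len(durations) < max(m, 1):
        return None
    return mean(durations)


# Hypothetical history, newest first.
history = [
    ("FAILURE", 2.5),
    ("SUCCESS", 10.0),
    ("FAILURE", 3.0),
    ("SUCCESS", 11.0),
]

print(estimate_successful_only(history, n=3, m=1))  # 10.5
print(estimate_successful_only(history, n=3, m=3))  # None
```

Returning `None` rather than a padded average is the point: a caller can show "no estimate" instead of a number nobody should rely on.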
- is related to: JENKINS-63174 Jenkins stage view should only calculate the time for build success (Open)
- relates to: JENKINS-49425 Average stage duration should not use failed builds on calculation (Open)
> which seems to have the implied requirement that the estimated time uses successful builds (or rather builds with somewhat homogeneous durations), otherwise this feature wouldn't make a lot of sense.
I don't see this. Of course it would also make sense if the 'usual' outcome of a build is a failure.
> Are there situations the new estimates are actually more helpful to users?
One situation I sometimes run into myself is setting up a new job that fails several times before I first manage to get it to succeed. In that case an estimate of how long the build will take is of some use.
Also, builds often fail several times in a row until they are fixed. In that case it's IMO also useful to know how long the build will take.
> Estimates that are deliberately far off any actual build durations OTOH I have no need for.
They are not deliberately far off. Quite the contrary, they try to produce the best estimate given the available data.
As I said, if you want to provide an algorithm that produces more accurate estimates (which is inherently difficult, as usage patterns vary so much), I would be fine with including it.