Jenkins / JENKINS-21099

Don't give useless build time estimates by considering failed builds' durations

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • core
    • None

      We use the estimated build time mostly to answer 'by when will we have a result?'. Durations of successful builds vary somewhat (±5-10%; the longer the build, the smaller the relative variation), but that is still good enough for an estimate (disregarding jobs where durations differ significantly between builds, e.g. due to parameters).

      Unfortunately, this commit changed computation of build duration estimates:
      https://github.com/jenkinsci/jenkins/commit/04d85eb476bdc57eb7ac3c2bb34e91be5b55c487

      Before: Use the average of up to the last three successful builds; if fewer than three successful builds exist, use those. If there are no successful builds, don't attempt to provide an estimate.

      Now: Only look at the last six builds. Of those, use the average of up to the last three successful builds. If there are fewer than three successful builds, fill up with the latest failed (but not aborted) builds to get three builds to base the estimate on. If three or fewer completed builds exist, use those.

      So, if there's only one successful build among the last six, it will also consider the latest two failed builds for the estimate. If there are no successful builds at all, the durations of recent failed builds alone become the estimate.
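The before/after difference can be sketched in Python (a sketch with invented helper names, a newest-first build list, and durations in hours — not the actual core code):

```python
from collections import namedtuple

# Minimal build record; lists below are ordered newest-first (an assumption
# made for this sketch, not the actual core data structure).
Build = namedtuple("Build", "result duration")

def estimate_old(builds):
    """Pre-change sketch: average up to the last three successful builds;
    if there are no successful builds, give no estimate at all."""
    durations = [b.duration for b in builds if b.result == "SUCCESS"][:3]
    if not durations:
        return None
    return sum(durations) / len(durations)

def estimate_new(builds):
    """Post-change sketch: look at the last six builds only, then fill the
    candidate list up to three with the latest non-aborted failures."""
    recent = [b for b in builds[:6] if b.result != "ABORTED"]
    successes = [b for b in recent if b.result == "SUCCESS"][:3]
    failures = [b for b in recent if b.result == "FAILURE"]
    candidates = (successes + failures)[:3]
    if not candidates:
        return None
    return sum(b.duration for b in candidates) / len(candidates)
```

With a single 10-hour success among five short failures, `estimate_old` returns 10 hours while `estimate_new` averages in the two latest failures and lands far below that.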

      Not going far back in the job history is a good idea to keep the computation quick. Filling up the list of candidate builds with failed builds, OTOH, is not.


      Taking into account failing builds makes the estimate completely meaningless. It doesn't tell you anything, because the data it uses is all over the place (and it uses too few data points for more sophisticated estimates).

      For a job that often fails after ~2-3 hours but takes 10 hours to complete successfully, this will often reduce the estimate to just ~5 hours. Even considering just one failed and two successful builds results in an estimate of 7.5 hours, which is way off the actual duration of a successful build.
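The arithmetic behind these numbers, as a quick sanity check (durations in hours):

```python
# Two successful 10-hour builds plus one ~2.5-hour failure already drag the
# average well below the real duration of a passing run.
print((10.0 + 10.0 + 2.5) / 3)   # 7.5
# One success and two failures is worse still.
print((10.0 + 2.5 + 2.5) / 3)    # 5.0
```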


      Build failures should be considered exceptional conditions. For non-fatal issues, builds can be marked unstable. So failures should not be used when estimating build durations; otherwise an unreliable component used in a build will completely distort the estimates.

      Surprisingly, the change only considers completed builds and excludes aborted builds from the estimate. This respects the status of aborted builds as exceptional and unfit for build estimates (unless caused by the Build Timeout Plugin…). But failed builds are no more suitable to base an estimate on!

      I'd rather have it use the average of the last N (e.g. 3 or 5) successful builds, and if there are fewer than M (e.g. 1, or 3), just give no estimate at all. For jobs that also have wildly varying durations for successful builds, it might actually be useful to not provide an estimated duration – this would make it explicit that Jenkins has no confidence in providing an estimate.
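A sketch of this proposed N/M policy (parameter names and defaults are illustrative, not anything in core):

```python
def estimate_proposed(successful_durations, n=3, m=1):
    """Average the last n successful builds (newest-first list, successes
    only); if fewer than m successful builds exist, refuse to give an
    estimate and return None instead."""
    recent = successful_durations[:n]
    if len(recent) < m:
        return None
    return sum(recent) / len(recent)
```

Returning `None` rather than a number is the point: downstream UI can show the indeterminate progress indicator instead of a confident-looking but meaningless figure.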


          Oliver Gondža added a comment -

          Well, given that the last 5 builds failed, it is likely that the next build will fail as well. And if the times for a failed and a successful build differ dramatically, estimating using successful builds is very likely to be incorrect.

          To me it makes sense to take the probability that the next build will succeed into account, as well as to exclude builds interrupted by the user.

          I believe build times form clusters (as you say, generally two: one for successful and one for failed builds). Taking build times from both clusters and calculating the arithmetic average is doomed to be wrong. I would prefer reading a constant number of builds, finding the largest cluster and (since we have to deliver a single value) returning the average of those build times.
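This clustering idea could be sketched in one dimension like so (the relative-gap threshold is an invented parameter, purely illustrative):

```python
def largest_cluster_average(durations, gap=0.5):
    """Sketch of the clustering proposal: sort the durations, start a new
    cluster whenever the gap to the previous value exceeds `gap` (50%)
    relative to it, then average the biggest cluster. The threshold is an
    assumption for illustration, not anything in Jenkins core."""
    if not durations:
        return None
    values = sorted(durations)
    clusters = [[values[0]]]
    for v in values[1:]:
        if v - clusters[-1][-1] > gap * clusters[-1][-1]:
            clusters.append([v])    # big jump: start a new cluster
        else:
            clusters[-1].append(v)  # close enough: extend current cluster
    biggest = max(clusters, key=len)
    return sum(biggest) / len(biggest)
```

For two short failures and three ~10-hour successes this picks the successful cluster, so it behaves like the pre-change algorithm when successes dominate, but still yields an estimate for a job that fails every time.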

          Daniel Beck added a comment -

          Well, given that last 5 builds failed it is likely that the next build will fail as well

          If the build was manually started, the expectation is that the user fixed whatever was wrong.

          Taking build times from both cluster and calculating arithmetical average is doomed to be wrong

          Right. That's why I'd just exclude the 'failing' cluster.

          It is not clear to me what the use case of the current behavior is. The previous behavior provided the duration of recent successful builds, which should most of the time be (close to) the longest possible build duration. Failures are likely to happen more quickly. Users could rely on the estimate to be close to the 'worst case' for waiting time and therefore being a safe estimate ("When will I have the artifacts?", "When will I be able to reboot Jenkins?", etc.).

          The current behavior answers no question anyone could have, and it's not even obvious to users what the estimate is based on. Is it the latest three successful builds? No. Is it the latest three completed builds? No. It will, in most cases, result in a duration estimate that is too short, which can be really irritating.

          To me it makes sense to take the probability the next build will succeed into account as well as exclude builds interrupted by user.

          Except that this isn't how it works. Don't get me wrong: just taking the latest three builds regardless of success would be just as useless, but it'd at least be obvious how estimation works, which it simply isn't right now.

          Also, it doesn't consider that the project may have been reconfigured since the latest failure. Or that there were changes in SCM. And if the build was manually started, there's a good chance the issue was fixed.

          Jenkins just doesn't have enough information to do a sophisticated estimate considering all these factors. In fact, the whole change was done to reduce the data required for an estimate. Any assumptions on the probability of failures are flawed.


          Therefore Jenkins should simply adopt a behavior similar to the previous one: Estimate duration assuming the build is successful. And if there aren't enough successful builds to do that, it should just not provide an estimate at all. Forcing users to figure it out themselves (using build trend and possibly parameters and job config) if Jenkins cannot provide a useful estimate isn't a bad thing. At least it wouldn't be misleading, like the current behavior.


          kutzi added a comment -

          I've already commented to some degree at https://github.com/jenkinsci/jenkins/commit/04d85eb476bdc57eb7ac3c2bb34e91be5b55c487#commitcomment-4893865

          kutzi added a comment -

          Extending on that:

          • I agree that failed builds form (in most, but not all, cases) an exceptional case
          • I do not agree that using failed builds is 'completely meaningless' - as Oliver already pointed out, the next build may be likely to fail again, so the time until it will complete is of some value
          • these are only estimates for the build duration, and nowhere - AFAIK - is it specified that this applies only to successful (and unstable) builds and not to failed builds. It's just the time until the current build will finish

          So in conclusion: I would be fine with applying a lesser weight to failed builds, so that in case there are both failed and successful builds, the successful ones get more weight in the estimate.
          I don't think that leaving out failing builds completely is a good idea. There may be jobs which are failing all the time, and not giving any estimate at all for them when we could do better is not feasible IMHO.

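This weighting suggestion might be sketched like so (the specific weights and the six-build window are illustrative assumptions, not a concrete proposal from the thread):

```python
from collections import namedtuple

Build = namedtuple("Build", "result duration")  # newest-first lists assumed

def weighted_estimate(builds, success_weight=3.0, failure_weight=1.0):
    """Sketch: weighted average over the last six builds, with successful
    builds weighted more heavily than failed ones and aborted builds
    skipped entirely. Weights are purely illustrative."""
    total = weight_sum = 0.0
    for b in builds[:6]:
        if b.result == "SUCCESS":
            w = success_weight
        elif b.result == "FAILURE":
            w = failure_weight
        else:
            continue  # skip aborted builds
        total += w * b.duration
        weight_sum += w
    return total / weight_sum if weight_sum else None
```

With one 10-hour success and two short failures this lands between the plain average and the successes-only estimate, which is exactly the compromise being proposed.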

          Daniel Beck added a comment -

          these are only estimates for the build duration and at no place - AFAIK - is specified that this only applies only to successful (and unstable) builds and not to failed builds. It's just the time until the current build will finish

          The only place I could find was Executor#isLikelyStuck() which seems to have the implied requirement that the estimated time uses successful builds (or rather, builds with somewhat homogeneous durations); otherwise this feature wouldn't make a lot of sense. Jenkins telling me my build is "likely stuck" because I fixed it so it doesn't fail anymore isn't helpful.


          Are there situations in which the new estimates are actually more helpful to users? I understand that the estimates are in some ways 'more accurate', as they don't exclude so many builds – but I don't see when they'd be 'more helpful' than the previous ones.

          Maybe it's because we copy jobs whenever we're creating release branches, but I'm used to seeing the indeterminate progress indicator and it doesn't bother me. Estimates that are deliberately far off any actual build durations OTOH I have no need for.

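The stuck-detection concern can be illustrated with a simplified check (the shape and the cutoff factor are assumptions for illustration, not the actual Executor#isLikelyStuck code):

```python
def is_likely_stuck(elapsed, estimated, factor=10):
    """Sketch: flag a build running far past its estimate. A build with no
    estimate is never flagged; the factor-of-10 cutoff is this sketch's
    assumption about the heuristic."""
    if estimated is None:
        return False
    return elapsed > factor * estimated
```

If short failed builds drag the estimate down to ~2 hours, a perfectly healthy 25-hour run of a long job trips the threshold, while with a successes-based (or absent) estimate it would not.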

          kutzi added a comment - edited

          > which seems to have the implied requirement that the estimated time uses successful builds (or rather builds with somewhat homogeneous durations), otherwise this feature wouldn't make a lot of sense.

          I don't see this. Of course it would also make sense if the 'usual' outcome of a build is a failure.

          > Are there situations the new estimates are actually more helpful to users?

          One situation I sometimes find myself in is when I try to set up a new job which fails several times until I first manage to get it successful. In that case an estimate of how long the build will take is of some use.

          Also, builds often tend to fail several times in a row until they are fixed. In that case it's IMO also useful to know how long the build will take.

          > Estimates that are deliberately far off any actual build durations OTOH I have no need for.

          They are not deliberately far off. Quite the contrary: they try to produce the best estimate given the available data.

          As said, if you want to provide an algorithm which would create more accurate estimates - which is inherently difficult, as usage patterns vary so much - I would be fine with including it.


          kutzi added a comment -

          BTW: https://github.com/jenkinsci/jenkins/blob/master/core/src/main/java/hudson/model/Run.java#L2228
          "Returns the estimated duration for this run if it is currently running."

          No word whether this only applies to successful builds!


          Jo Shields added a comment -

          Using failed builds is meaningless. I don't create jobs designed to fail, I create jobs designed to succeed. Failures are always outliers, by definition, even if the last six builds have been failures. It is entirely useless to me and my users to be given a 20 minute estimate on a 2-hour build.


          Kenneth Baltrinic added a comment - edited

          I am generally in agreement with the perspective that failures are outliers and meaningless. I believe the algorithm should exclude failed builds. However, I wonder if this is something that could be made configurable (i.e. include failed builds: yes/no; include unstable builds: yes/no; average the past N qualifying builds). Alternatively, could a plugin be created to override the default behavior, so that people can choose an algorithm by installing the desired plugin?


          Daniel Beck added a comment -

          https://jenkins.ci.cloudbees.com/job/core/job/acceptance-test-harness-stable/buildTimeTrend

          Builds #649, #650 have been running for just over an hour, and Jenkins says they are half done.

          tfennelly is rightfully confused, because all successful (well, unstable) runs take 6-7 hours.

          15:58 Tom Fennelly: not much more I can say until the tests run and we see the results
          15:59 Daniel Beck: six hours from now?
          15:58 Tom Fennelly: well the build on https://github.com/jenkinsci/acceptance-test-harness/pull/88 says less than 2 hours
          15:59 Tom Fennelly: estimated remaining is 1 hour 8 mins


          Sverre Moe added a comment -

          A very old issue. Is there anyone who could do anything to implement this? We would like failed builds to be excluded from the average calculation as well.

          Perhaps a way to configure this would be preferable, so that it would work as before but allow others to add exclusion criteria.


          Adam Brousseau added a comment -

          +1 for this.
          Also related, perhaps needing a separate issue: for pipeline builds, I would like to see estimates take into account what stage the build is in. We use a "queue" stage at the start of a build for the time spent waiting for a machine. Queue time can range from seconds to hours, and that really throws off the estimates for the remainder of the build, even though the remaining stages generally have consistent execution times.

          Hariharan added a comment -

          +1

          This would help us estimate and automatically set up proper timeouts for all jobs based on successful runs.


          brendan waters added a comment -

          This would also be useful in the calculation of the progress bar color (blue/red). Often a failed build will cause the next build to show up as "stuck" with a red progress bar.

          Dmitriy added a comment -

          I want to share a screenshot to make the issue more illustrative. The mean times in the header are much smaller than the times of the successful builds below, because of several failed runs.


            Assignee: Unassigned
            Reporter: danielbeck (Daniel Beck)
            Votes: 20
            Watchers: 19