JENKINS-21099 · Jenkins

Don't give useless build time estimates by considering failed builds' durations

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component: core

      We use the estimated build time mostly to determine 'by when will we have a result?'. Durations of successful builds vary somewhat (±5-10%; the longer the build, the smaller the variation), but that's still good enough for an estimate (disregarding jobs where there are significant differences between builds due to e.g. parameters).

      Unfortunately, this commit changed the computation of build duration estimates:
      https://github.com/jenkinsci/jenkins/commit/04d85eb476bdc57eb7ac3c2bb34e91be5b55c487

      Before: Use the average of up to the last three successful builds; use fewer if three successful builds do not exist. If there are no successful builds, don't attempt to provide an estimate.

      Now: Only look at the last six builds. Use the average of up to the last three successful builds among those. If there are fewer than three successful builds, also use the latest failed (but not aborted) builds to bring the total up to three. If three or fewer completed builds exist, use all of them.

      So, if there's only one successful build among the last six builds, the estimate will also include the latest two failed builds. If there are no successful builds at all, the estimate is based entirely on the durations of previous failed builds.
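The difference between the two strategies can be sketched roughly like this (illustrative Python, not Jenkins core code; builds are modeled as (duration_hours, result) pairs, newest first):

```python
def estimate_before(builds):
    """Old behavior: average of up to the last three SUCCESS builds;
    None (no estimate) if there are no successful builds at all."""
    durations = [d for d, r in builds if r == "SUCCESS"][:3]
    return sum(durations) / len(durations) if durations else None

def estimate_now(builds):
    """New behavior: look at the last six builds only; prefer up to three
    SUCCESS builds, then pad with the latest FAILURE builds (ABORTED is
    always excluded) until three candidates are found."""
    window = [b for b in builds[:6] if b[1] != "ABORTED"]
    successes = [d for d, r in window if r == "SUCCESS"][:3]
    failures = [d for d, r in window if r == "FAILURE"]
    candidates = (successes + failures)[:3]
    return sum(candidates) / len(candidates) if candidates else None
```

With only one success among the last six builds, estimate_now pads with two failure durations and lands well below the duration of an actual successful build, while estimate_before keeps looking back for successes.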

      Not going far back in the job history is a good idea to keep the computation quick. Filling up the list of candidate builds with failed builds, OTOH, is not.


      Taking into account failing builds makes the estimate completely meaningless. It doesn't tell you anything, because the data it uses is all over the place (and it uses too few data points for more sophisticated estimates).

      For a job that often fails after ~2-3 hours but takes 10 hours to complete successfully, this will often reduce the estimate to just ~5 hours. Even considering just one failed and two successful builds results in an estimate of 7.5 hours, which is way off the actual duration of a successful build.
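The arithmetic is easy to check, using the numbers from the paragraph above (~2.5 h failures, 10 h successes):

```python
fail_h, success_h = 2.5, 10.0

# one success padded with two recent failures (common case under the new rule):
assert (success_h + fail_h + fail_h) / 3 == 5.0
# even two successes and one failure fall well short of the real 10 h:
assert (success_h + success_h + fail_h) / 3 == 7.5
```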


      Build failures should be considered exceptional conditions. For non-fatal issues, builds can be marked unstable. So failures should not be used in estimating build durations, as some unreliable component used in a build will completely distort the estimates.

      Surprisingly, the change only considers completed builds and excludes aborted builds from the estimate. This respects the status of aborted builds as exceptional and unfit for estimates (unless caused by the Build Timeout Plugin…). But failed builds are no more useful to base an estimate on!

      I'd rather have it use the average of the last N (e.g. 3 or 5) successful builds, and if there are fewer than M (e.g. 1, or 3), just give no estimate at all. For jobs whose successful builds also have wildly varying durations, it might actually be useful not to provide an estimated duration – this would make it explicit that Jenkins has no confidence in the estimate.
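The proposal amounts to something like this (a sketch; the function name and the N/M knobs are made up for illustration, not Jenkins settings):

```python
def estimate_proposed(builds, n=3, m=1):
    """Average the last up-to-n successful builds ((duration, result)
    pairs, newest first); return None (no estimate) if fewer than m
    successful builds exist."""
    durations = [d for d, r in builds if r == "SUCCESS"][:n]
    return sum(durations) / len(durations) if len(durations) >= m else None
```

An all-failing job then gets None, i.e. the indeterminate progress indicator, rather than a misleading number.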


          Daniel Beck created issue -
          Daniel Beck made changes -
          Description edited; Summary Original: Don't give useless build time estimates New: Don't give useless build time estimates by considering failed builds' durations
          Daniel Beck made changes -
          Description edited (current text above)

          Oliver Gondža added a comment -

          Well, given that the last 5 builds failed, it is likely that the next build will fail as well. And if the times for a failed and a successful build differ dramatically, estimating using successful builds is very likely to be incorrect.

          To me it makes sense to take the probability that the next build will succeed into account, as well as to exclude builds interrupted by the user.

          I believe build times form clusters (as you say, generally two: one for successful and one for failed builds). Taking build times from both clusters and calculating an arithmetic average is doomed to be wrong. I would prefer reading a constant number of builds, finding the largest cluster, and (since we have to deliver a single value) returning the average of those build times.
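The clustering idea could be sketched like this (illustrative only; the tolerance value and the greedy grouping are assumptions, not a worked-out design):

```python
def cluster_estimate(durations, tolerance=0.25):
    """Group durations that lie within a relative tolerance of a cluster's
    running average, pick the largest cluster, return its average."""
    clusters = []
    for d in sorted(durations):
        for c in clusters:
            avg = sum(c) / len(c)
            if abs(d - avg) <= tolerance * avg:
                c.append(d)  # join the first cluster d fits into
                break
        else:
            clusters.append([d])  # start a new cluster
    largest = max(clusters, key=len)
    return sum(largest) / len(largest)
```

With durations like [2.4, 2.5, 2.6, 9.8, 10.0] the failure cluster wins 3-to-2 and the estimate lands near 2.5 h – which also illustrates that a single returned value is necessarily 'wrong' for the minority cluster.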

          Daniel Beck added a comment -

          > Well, given that last 5 builds failed it is likely that the next build will fail as well

          If the build was manually started, the expectation is that the user fixed whatever was wrong.

          > Taking build times from both cluster and calculating arithmetical average is doomed to be wrong

          Right. That's why I'd just exclude the 'failing' cluster.

          It is not clear to me what the use case of the current behavior is. The previous behavior provided the duration of recent successful builds, which should most of the time be (close to) the longest possible build duration. Failures are likely to happen more quickly. Users could rely on the estimate to be close to the 'worst case' for waiting time and therefore being a safe estimate ("When will I have the artifacts?", "When will I be able to reboot Jenkins?", etc.).

          The current behavior answers no question anyone could have, and it's not even obvious to users what the estimate is based on. Is it the latest three successful builds? No. Is it the latest three completed builds? No. It will, in most cases, result in a too short duration estimate, which can be really irritating.

          > To me it makes sense to take the probability the next build will succeed into account as well as exclude builds interrupted by user.

          Except that this isn't how it works. Don't get me wrong: just taking the latest three builds regardless of success would be just as useless, but it'd at least be obvious how estimation works. Which it simply isn't right now.

          Also, it doesn't consider that the project may have been reconfigured since the latest failure. Or that there were changes in SCM. And if the build was manually started, there's a good chance the issue was fixed.

          Jenkins just doesn't have enough information to do a sophisticated estimate considering all these factors. In fact, the whole change was done to reduce the data required for an estimate. Any assumptions on the probability of failures are flawed.


          Therefore Jenkins should simply adopt a behavior similar to the previous one: estimate the duration assuming the build is successful, and if there aren't enough successful builds to do that, just don't provide an estimate at all. Forcing users to figure it out themselves (using the build trend and possibly parameters and job config) when Jenkins cannot provide a useful estimate isn't a bad thing. At least it wouldn't be misleading, like the current behavior.


          kutzi added a comment -

          I've already commented to some degree at https://github.com/jenkinsci/jenkins/commit/04d85eb476bdc57eb7ac3c2bb34e91be5b55c487#commitcomment-4893865

          kutzi added a comment -

          Extending on that:

          • I agree that failed builds are (in most but not all cases) an exceptional case
          • I do not agree that using failed builds is 'completely meaningless' - as Oliver already pointed out, the next build may be likely to fail again, so the time until it will complete is of some value
          • these are only estimates for the build duration, and nowhere - AFAIK - is it specified that they apply only to successful (and unstable) builds and not to failed builds. It's just the time until the current build will finish

          So in conclusion: I would be fine with applying a lower weight to failed builds, so that when there are both failed and successful builds, the successful ones get more weight in the estimate.
          I don't think that leaving out failing builds completely is a good idea. There may be jobs which are failing all the time, and giving no estimate at all for them when we could do better is not feasible IMHO.

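The lower-weight idea could look like this (a sketch; the weight values are made up for illustration, not anything Jenkins implements):

```python
def weighted_estimate(builds, success_weight=1.0, failure_weight=0.25):
    """Weighted average over (duration, result) pairs; failed builds still
    count, but with less weight than successful ones."""
    total = weights = 0.0
    for duration, result in builds:
        w = success_weight if result == "SUCCESS" else failure_weight
        total += duration * w
        weights += w
    return total / weights if weights else None
```

With two 10 h successes and one 2.5 h failure this yields roughly 9.2 h instead of the plain average of 7.5 h, while an all-failing job still gets an estimate based on its failure durations.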
          kutzi made changes -
          Priority Original: Major [ 3 ] New: Minor [ 4 ]

          Daniel Beck added a comment -

          > these are only estimates for the build duration and at no place - AFAIK - is specified that this only applies only to successful (and unstable) builds and not to failed builds. It's just the time until the current build will finish

          The only place I could find was Executor#isLikelyStuck() which seems to have the implied requirement that the estimated time uses successful builds (or rather builds with somewhat homogeneous durations), otherwise this feature wouldn't make a lot of sense. Jenkins telling me my build is "likely stuck" because I fixed it so it doesn't fail anymore isn't helpful.
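The dependency can be illustrated with a toy version of such a check (the 10x multiplier and the 24 h fallback here are assumptions for illustration, not a copy of Executor#isLikelyStuck):

```python
def is_likely_stuck(elapsed_h, estimated_h, factor=10, fallback_h=24):
    """Flag a build as probably hung: far past its estimate, or past a
    fixed cutoff when no estimate exists."""
    if estimated_h is None:
        return elapsed_h > fallback_h
    return elapsed_h > factor * estimated_h
```

A check like this only behaves sensibly when the estimate reflects builds with reasonably homogeneous durations; feed it an estimate dragged around by failure durations and the flag loses its meaning.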


          Are there situations the new estimates are actually more helpful to users? I understand that the estimates are in some ways 'more accurate' as they don't exclude so many builds – but I don't see when they'd be 'more helpful' than the previous ones.

          Maybe it's because we copy jobs whenever we're creating release branches, but I'm used to seeing the indeterminate progress indicator and it doesn't bother me. Estimates that are deliberately far off any actual build durations OTOH I have no need for.


          kutzi added a comment - edited

          > which seems to have the implied requirement that the estimated time uses successful builds (or rather builds with somewhat homogeneous durations), otherwise this feature wouldn't make a lot of sense.

          I don't see this. Of course it would also make sense if the 'usual' outcome of a build is a failure.

          > Are there situations the new estimates are actually more helpful to users?

          One situation I sometimes see myself in is when I try to set up a new job which fails several times until I first manage to get it to succeed. In that case an estimate of how long the build will take is of some use.

          Also, builds often tend to fail several times in a row until they are fixed. In that case it's IMO also useful to know how long the build will take.

          > Estimates that are deliberately far off any actual build durations OTOH I have no need for.

          They are not deliberately far off. Quite the contrary: they try to produce the best estimate given the available data.

          As said, if you want to provide an algorithm which would create more accurate estimates - which is inherently difficult as the usage patterns vary so much - I would be fine with including it.


            Assignee: Unassigned
            Reporter: Daniel Beck
            Votes: 20
            Watchers: 19