JENKINS-21605

Logging all UpstreamCause's floods Jenkins in large setups



      Description

      This bug is the same as https://issues.jenkins-ci.org/browse/JENKINS-15747 but it happens for us again with Jenkins version 1.548 (while it was working in the meantime).

      In cases where there are multiple paths through the job dependencies, logging all upstream causes generates a huge build log.
      The example job linked with this ticket has a log which is nearly 110 MB in size because of that.

      In the previous ticket the issue has been addressed by "not only cap total number of transitive upstream causes but also avoid redundantly storing information about upstream causes listed elsewhere".
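      For illustration, a small hypothetical calculation (not Jenkins code, and not the exact graph of the job linked from this ticket) of why multiple paths through the job dependencies blow up the number of transitive upstream causes when nothing is capped or deduplicated:

      ```java
      // Hypothetical illustration: every job in a layered graph depends on the two
      // jobs of the previous layer, so the number of distinct upstream paths that
      // reach the leaf doubles with each layer. Without deduplication, every path
      // contributes its own chain of "originally caused by" entries to the log.
      public class PathExplosion {
          public static void main(String[] args) {
              int layers = 20;     // depth of the hypothetical job graph
              long paths = 1;
              for (int i = 1; i < layers; i++) {
                  paths *= 2;      // two parents per job -> paths double per layer
              }
              System.out.println("upstream paths reaching the leaf: " + paths);
              // prints 524288; logging a full cause chain per path floods the build log
          }
      }
      ```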


            Activity

            Dirk Thomas added a comment - edited

            I will describe two concrete cases to have a baseline for the further discussion.

            Case (A) ( https://gist.github.com/dirk-thomas/9bbd47397e48ef3ceef8 ):

            A job "leaf" has only a single upstream dependency on "before_leaf".
            And "before leaf" has many (in this example 40: "01" to "40") upstream dependencies.
            Each upstream dependency "N" has "N-1" as its upstream dependency.

            The "before_leaf" job will list all 40 upstream causes.
            Each upstream cause on its own is limited to a recursive depth of 10 (according to `MAX_DEPTH`).

            The "leaf" job has a single upstream cause ("before_leaf").
            The `Set<String> traversed` in the `UpstreamCause` prevents listing repeated upstream causes of the single upstream cause.
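            For illustration, a minimal sketch (hypothetical classes and method names, not the actual `UpstreamCause` code) of how such a `traversed` set can suppress repeated expansion of upstream causes that were already listed once:

            ```java
            import java.util.*;

            // Hypothetical model of a cause chain: each node has an id and upstream parents.
            class Node {
                final String id;
                final List<Node> upstream = new ArrayList<>();
                Node(String id) { this.id = id; }
            }

            public class TraversedSetSketch {
                // Expand upstream causes, but expand each node only once per traversal,
                // analogous to the `Set<String> traversed` described above.
                static void print(Node node, Set<String> traversed, String indent) {
                    System.out.println(indent + "caused by " + node.id);
                    for (Node up : node.upstream) {
                        if (traversed.add(up.id)) {       // only expand nodes not seen yet
                            print(up, traversed, indent + "  ");
                        } else {
                            System.out.println(indent + "  (already listed: " + up.id + ")");
                        }
                    }
                }

                public static void main(String[] args) {
                    Node a = new Node("01"), b = new Node("02"), leaf = new Node("leaf");
                    b.upstream.add(a);
                    leaf.upstream.add(b);
                    leaf.upstream.add(a);                 // a second path to "01"
                    print(leaf, new HashSet<>(), "");
                }
            }
            ```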

            Case (B) ( https://gist.github.com/dirk-thomas/37febb42abeb8631f946 ):

            A job "leaf" has only a single upstream dependency on "before_leaf".
            And "before leaf" has several (in this example 5: "a15" to "e15") upstream dependencies.
            Each upstream dependency "xN" has "xN-1" as its upstream dependency.

            Recursive upstream causes are usually "terminated" by a `DeeplyNestedUpstreamCause` when `MAX_DEPTH` is reached.
            `MAX_LEAF` prevents adding a `DeeplyNestedUpstreamCause` at the end of the recursion once the number of different causes has reached 25 (`MAX_LEAF`).
            This can be seen in the "leaf" of case (B).
            (I don't understand why skipping the `DeeplyNestedUpstreamCause` when aborting the recursion makes a big difference though - it does not affect the log size significantly, and it contains valuable information (namely that the recursion has been aborted).)
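            A minimal sketch of how the two thresholds interact as described above (the constant names mirror `MAX_DEPTH` and `MAX_LEAF`, but everything else is hypothetical and not the real `UpstreamCause` implementation):

            ```java
            import java.util.*;

            public class ThresholdSketch {
                static final int MAX_DEPTH = 10;  // recursion depth per upstream chain
                static final int MAX_LEAF = 25;   // once this many causes are listed, the terminator is skipped

                // Expand one upstream chain: cut it off at MAX_DEPTH and append a terminator
                // marker (the DeeplyNestedUpstreamCause analogue) only while fewer than
                // MAX_LEAF causes have been listed so far.
                static void expand(int remainingDepth, int[] causeCount, List<String> out) {
                    if (remainingDepth == 0) {
                        if (causeCount[0] < MAX_LEAF) {
                            out.add("(deeper causes omitted)");
                        }
                        return;
                    }
                    causeCount[0]++;
                    out.add("originally caused by upstream build #" + causeCount[0]);
                    expand(remainingDepth - 1, causeCount, out);
                }

                public static void main(String[] args) {
                    List<String> out = new ArrayList<>();
                    int[] causeCount = {0};
                    for (int chain = 0; chain < 5; chain++) {  // five sibling chains, as in case (B)
                        expand(MAX_DEPTH, causeCount, out);
                    }
                    System.out.println(out.size() + " log lines for " + causeCount[0] + " causes");
                }
            }
            ```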

            Based on these I identified two problems.

            Problem (A): limitation of applying the thresholds inside the `UpstreamCause` itself:

            The "before_leaf" job of case (A) has 40 upstream causes.
            While each one applies some logic to limit its own output, a single `UpstreamCause` instance does not know about its siblings.
            Therefore it cannot reduce the level of detail shown when there are many siblings.
            This is not "fixable" in the `UpstreamCause` class itself.
            This would require some changes in the code handling the upstream causes to pass in additional information, e.g. the number of siblings (which arguably an `UpstreamCause` should not need to know about).
            (The problem is the same for the "before_leaf" job of case (B).)
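            To quantify that limitation, a small hypothetical calculation (not Jenkins code): as long as each cause only trims itself, the total output still grows linearly with the number of sibling causes, which no single instance can see:

            ```java
            // Hypothetical illustration: a fixed per-cause depth cap does not bound the
            // total number of log lines once the number of sibling causes grows.
            public class SiblingBlindTrimming {
                public static void main(String[] args) {
                    int perCauseDepthCap = 10;  // what one instance can enforce on itself
                    for (int siblings : new int[] {1, 40, 400}) {
                        System.out.println(siblings + " sibling causes -> ~"
                                + siblings * perCauseDepthCap + " log lines");
                    }
                }
            }
            ```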

            Problem (B): the depth threshold is independent of the number of upstream causes:

            The "leaf" job of case (B) has only a single upstream cause.
            But this upstream cause outputs every upstream cause up to the recursion limit.
            This results in N x 10 upstream causes where N is the number of upstream causes of the single upstream cause of the job.
            A "combined" limit would probably make much more sense in this case.
            E.g. limit each recursion not to 10 levels but to potentially fewer as the number of sibling upstream causes on the first level increases.
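            A hypothetical sketch of such a combined limit (illustrative only, not a patch against `UpstreamCause`): the per-chain depth budget shrinks as the number of first-level siblings grows, so the total output stays roughly bounded.

            ```java
            public class CombinedLimitSketch {
                static final int MAX_DEPTH = 10;  // mirrors the existing per-chain cap

                // Hypothetical policy: divide the depth budget by the number of first-level siblings.
                static int depthBudget(int firstLevelSiblings) {
                    return Math.max(1, MAX_DEPTH / Math.max(1, firstLevelSiblings));
                }

                public static void main(String[] args) {
                    for (int siblings : new int[] {1, 2, 5, 40}) {
                        System.out.println(siblings + " first-level upstream causes -> depth "
                                + depthBudget(siblings) + " each, ~"
                                + siblings * depthBudget(siblings) + " chain entries total");
                    }
                }
            }
            ```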

            (I am unable to provide a Java unit test since I lack experience programming in Java, but the Groovy examples should be very specific and hopefully easy to transfer into a unit test by an experienced Jenkins/Java programmer.)

            Jesse Glick added a comment -

            Thank you for the analysis. I do not personally expect to have time to work on a fix, but perhaps someone else will.

            Dirk Thomas added a comment -

            Can someone provide some insight into how problem (A) could be addressed?

            If we could sketch the desired solution together I might try to work on a patch for it.
            But currently the algorithmic approach is not clear to me - especially how to perform the limiting without affecting too many external classes (outside of `UpstreamCause`).

            Dirk Thomas added a comment -

            Since the previous URL is about to become unavailable I will add a current one which demonstrates the ridiculous amount of redundant information being logged: http://build.ros.org/job/Ibin_uT64__desktop_full__ubuntu_trusty_amd64__binary/1/

            Dirk Thomas added a comment -

            Another case with the console output containing 338,653 lines with nothing more than `Started by upstream project` and `originally caused by` lines: http://build.ros.org/job/Mrel_sync-packages-to-testing_bionic_amd64/529/


              People

              Assignee: Unassigned
              Reporter: Dirk Thomas
              Votes: 5
              Watchers: 6
