I will describe two concrete cases to serve as a baseline for further discussion.
Case (A) ( https://gist.github.com/dirk-thomas/9bbd47397e48ef3ceef8 ):
A job "leaf" has only a single upstream dependency on "before_leaf".
And "before_leaf" has many upstream dependencies (40 in this example: "01" to "40").
Each upstream dependency "N" has "N-1" as its upstream dependency.
The "before_leaf" job will list all 40 upstream causes.
Each upstream cause on its own is limited to a recursion depth of 10 (`MAX_DEPTH`).
The "leaf" job has a single upstream cause ("before_leaf").
The `Set<String> traversed` in the `UpstreamCause` prevents listing repeated upstream causes of the single upstream cause.
Case (B) ( https://gist.github.com/dirk-thomas/37febb42abeb8631f946 ):
A job "leaf" has only a single upstream dependency on "before_leaf".
And "before_leaf" has several upstream dependencies (5 in this example: "a15" to "e15").
Each upstream dependency "xN" has "xN-1" as its upstream dependency.
Recursive upstream causes are usually "terminated" by a `DeeplyNestedUpstreamCause` when `MAX_DEPTH` is reached.
`MAX_LEAF` prevents adding a `DeeplyNestedUpstreamCause` at the end of the recursion once the number of different causes has reached 25 (the value of `MAX_LEAF`).
This can be seen in the "leaf" job of case (B).
(I don't understand why skipping the `DeeplyNestedUpstreamCause` when aborting the recursion makes a big difference, though: it does not affect the log size significantly, and it contains valuable information, namely that the recursion has been aborted.)
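To make the interplay of `MAX_DEPTH`, `MAX_LEAF` and the `traversed` set easier to follow, here is a simplified model of the trimming as I understand it from the behaviour described above. It is only a sketch and not the Jenkins source: `CauseNode`, `TrimModel`, `trim`, `count`, `chain` and the `DEEPLY_NESTED` marker are hypothetical names, and a cause is identified by a plain string.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an upstream cause; this is not the Jenkins class.
class CauseNode {
    final String id;                                  // e.g. job name plus build number
    final List<CauseNode> upstream = new ArrayList<>();

    CauseNode(String id) {
        this.id = id;
    }
}

class TrimModel {
    static final int MAX_DEPTH = 10;                  // recursion limit per upstream cause
    static final int MAX_LEAF = 25;                   // once this many causes were seen, no marker is added
    static final String DEEPLY_NESTED = "(deeply nested)";

    // Returns a trimmed copy of c. The traversed set is shared by all
    // upstream causes that are trimmed for the same build.
    static CauseNode trim(CauseNode c, int depth, Set<String> traversed) {
        CauseNode copy = new CauseNode(c.id);
        if (depth > 0) {
            // Expand a cause only the first time it is seen.
            if (traversed.add(c.id)) {
                for (CauseNode up : c.upstream) {
                    copy.upstream.add(trim(up, depth - 1, traversed));
                }
            }
        } else if (traversed.size() < MAX_LEAF) {
            // Depth exhausted: mark the cut, unless too many causes were already seen.
            copy.upstream.add(new CauseNode(DEEPLY_NESTED));
        }
        return copy;
    }

    // Counts the nodes of a (trimmed) cause tree, markers included.
    static int count(CauseNode c) {
        int n = 1;
        for (CauseNode up : c.upstream) {
            n += count(up);
        }
        return n;
    }

    // Builds a chain such as a15 -> a14 -> ... -> a01 and returns its newest element.
    static CauseNode chain(String prefix, int length) {
        CauseNode head = null;
        for (int i = 1; i <= length; i++) {
            CauseNode next = new CauseNode(String.format("%s%02d", prefix, i));
            if (head != null) {
                next.upstream.add(head);
            }
            head = next;
        }
        return head;
    }
}
```

Trimming five chains of length 15 (as in case (B)) with one shared `traversed` set keeps the marker for the first two chains only in this model. From the third chain on, `traversed` already holds 25 or more entries by the time the depth limit is hit, so the cut is silent.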
Based on these I identified two problems.
Problem (A): limitation of applying the thresholds inside the `UpstreamCause`:
The "before_leaf" job of case (A) has 40 upstream causes.
While each on its own applies some logic to limit the information, each separate `UpstreamCause` instance does not know about its siblings.
Therefore it cannot adjust the level of information shown when there are many siblings.
This is not "fixable" in the `UpstreamCause` class itself.
It would require changes in the code that handles the upstream causes, so that it passes in information such as the number of siblings (which arguably an `UpstreamCause` should not need to know about).
(The problem is the same for the "before_leaf" job of case (B).)
Problem (B): the depth threshold is independent of the number of upstream causes:
The "leaf" job of case (B) has only a single upstream cause.
But this upstream cause outputs every upstream cause up to the recursion limit.
This results in N x 10 upstream causes where N is the number of upstream causes of the single upstream cause of the job.
A "combined" limit would probably make much more sense in this case.
E.g. limit each recursion not to 10 levels but to potentially fewer as the number of sibling upstream causes on the first level increases.
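One possible shape of such a combined limit, purely as an illustration: divide a fixed budget of causes among the siblings and clamp the result. The class `CombinedDepthLimit`, the method `depthFor` and the constants `TOTAL_BUDGET` and `MIN_DEPTH` are assumptions of mine and do not exist in Jenkins; only the value 10 for `MAX_DEPTH` is taken from the current behaviour.

```java
class CombinedDepthLimit {
    static final int MAX_DEPTH = 10;     // current per-cause recursion limit
    static final int TOTAL_BUDGET = 25;  // hypothetical cap on causes shown per build
    static final int MIN_DEPTH = 1;      // always show at least the direct upstream cause

    // Returns the recursion depth allowed for each of the given number of
    // sibling upstream causes: the budget is shared evenly, then clamped
    // to the range [MIN_DEPTH, MAX_DEPTH].
    static int depthFor(int siblings) {
        int n = Math.max(1, siblings);   // guard against zero or negative input
        return Math.max(MIN_DEPTH, Math.min(MAX_DEPTH, TOTAL_BUDGET / n));
    }
}
```

With these assumed numbers a single upstream cause keeps the full depth of 10, five siblings as in case (B) get a depth of 5 each, and 40 siblings as in case (A) get a depth of 1 each.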
(I am unable to provide a Java unit test since I lack experience programming in Java, but the Groovy examples should be very specific and hopefully easy to transfer into a unit test by an experienced Jenkins/Java programmer.)
Showing a shallow list of upstream causes from every upstream job is not a real problem in and of itself, because the set of upstream projects is of a fixed size. The problem is when you start showing a somewhat deeper graph, and the upstream projects themselves have interdependencies, because then you can get into an exponentially large set of causes. Duplicated graph nodes should be pruned, and/or the total graph size bounded.
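A rough sketch of what pruning duplicates and bounding the total size could look like, together with a small graph that shows the blow-up. This is not Jenkins code: `BoundedCauseGraph`, `expandedSize`, `collect` and `lattice` are made-up names, and the graph is reduced to a map from a job name to the names of its upstream jobs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BoundedCauseGraph {
    // Size of the cause tree when every path is expanded, i.e. without pruning.
    static long expandedSize(Map<String, List<String>> upstream, String job) {
        long n = 1;
        for (String up : upstream.getOrDefault(job, Collections.<String>emptyList())) {
            n += expandedSize(upstream, up);
        }
        return n;
    }

    // Breadth-first walk that reports each job once and stops after maxNodes jobs.
    static List<String> collect(Map<String, List<String>> upstream, String start, int maxNodes) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty() && result.size() < maxNodes) {
            String job = queue.remove();
            result.add(job);
            for (String up : upstream.getOrDefault(job, Collections.<String>emptyList())) {
                if (seen.add(up)) {      // prune jobs that were already queued
                    queue.add(up);
                }
            }
        }
        return result;
    }

    // Builds the given number of layers with two jobs each; "leaf" and every
    // job of a layer depend on both jobs of the next layer upstream.
    static Map<String, List<String>> lattice(int layers) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("leaf", Arrays.asList("x1", "y1"));
        for (int i = 1; i < layers; i++) {
            List<String> next = Arrays.asList("x" + (i + 1), "y" + (i + 1));
            graph.put("x" + i, next);
            graph.put("y" + i, next);
        }
        return graph;
    }
}
```

For `lattice(10)` there are only 21 distinct jobs, but expanding every path yields 2047 entries, and each additional layer roughly doubles that number. `collect` reports the 21 jobs once each, or fewer if `maxNodes` is smaller.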