Jenkins / JENKINS-52926

"Git Build Data" should not appear more than once in the side menu for pipeline builds

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: git-plugin
    • Labels: None

      Currently, for pipeline builds, every time the pipeline runs on a new node and merges the branch you're building with master, you get an extra "Git Build Data" entry in the sidebar, even though you're building exactly the same thing.

      The individual screens show different revisions being built:


      But actually, these are the same PR branch, merged with the same master branch. In this situation, I would expect a single entry, and I would expect the linked page to list the branches which were merged, not to show the commit hash of the merge result. The merge result is literally impossible to access after the build has completed anyway, so there is no point having this information at all.

       


          trejkaz created issue -
          Mark Waite made changes -
          Assignee Original: Mark Waite [ markewaite ]

          Mark Waite added a comment - edited

          Can you provide detailed steps to duplicate the problem?

          The sample you've provided almost looks like multiple checkout operations are being performed (and I doubt that is what you're actually doing in your build).

          Are the SHA-1 hashes in the git build data from the preceding builds? Since you indicate that the list continues to grow, it seems likely that they are somehow recording the history of the SHA-1's built by that job.

          Is the job a Pipeline job or a Freestyle job?

          If it is a Pipeline job, is it using a Pipeline shared library?


          trejkaz added a comment -

          The job is a pipeline job.

          Multiple checkouts do get performed, but I think that's a necessity? Otherwise, I don't understand how code would get from the node coordinating the build to each of the individual nodes doing the build.

          The hash built is different every time a separate node is spun off, because the merge gets performed again and git hashes are essentially random numbers, rather than actual hashes of the content. The multiple hashes reflect the random number each merge generated when the build for that node was performed.

          What I'm essentially saying is that using the hash is not a good idea for deduplicating what code is being built, because when a merge has occurred, the number is not usable in any way. If you replaced that information with enough information to actually reproduce the build, and then deduplicated on that, I think all the items would deduplicate properly.
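
          For context on why the same merge yields a different ID on each node: a Git commit ID is the SHA-1 of the whole commit object, which records the author and committer timestamps alongside the tree and the parent IDs. The sketch below is a minimal illustration of that calculation, not code from the git plugin. The tree ID, parent IDs, identity string, and message are made-up placeholder values.

```python
import hashlib

def commit_id(tree, parents, author, committer, message):
    """Return the SHA-1 Git would assign to a commit object with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    # Git hashes the header "commit <size>\0" followed by the object body.
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

# Placeholder values (assumptions for illustration only).
TREE = "a" * 40                    # same merged content on every node
PARENTS = ["b" * 40, "c" * 40]     # same master head and PR head
MESSAGE = "Merge PR branch into master\n"
WHO = "Jenkins <jenkins@example.com>"

first = commit_id(TREE, PARENTS, f"{WHO} 1533000000 +0000", f"{WHO} 1533000000 +0000", MESSAGE)
again = commit_id(TREE, PARENTS, f"{WHO} 1533000000 +0000", f"{WHO} 1533000000 +0000", MESSAGE)
later = commit_id(TREE, PARENTS, f"{WHO} 1533000060 +0000", f"{WHO} 1533000060 +0000", MESSAGE)

print(first == again)  # True: identical fields give an identical ID
print(first == later)  # False: only the timestamp changed
```

          Under these assumptions the tree and parent IDs stay identical across nodes, so they would be stable keys, while the merge commit ID changes whenever the merge is repeated at a different time.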


          Mark Waite added a comment -

          Thanks for the further explanation.

          You might perform the merge once, stash the result, then unstash it on each of the other platforms, rather than performing the merge on every platform.

          As another alternative, you might perform the checkout once on every platform, perform the merge on one, generate a patch of the merge result, stash the patch, and unstash the patch on each of the targets.

          You said:

          using the hash is not a good idea for deduplicating what code is being built

          The SHA-1 hash is the most natural choice to decide if something has changed. Performing the same merge on multiple agents is not a use case that the plugin is ready to handle.


          trejkaz added a comment -

          Even if we only ran the merge on one agent, the fact of the matter is that the git info page will still give us no useful information about what it actually built against, since it will still be showing a hash which doesn't correspond to any actual commit in the repository.

          You claim that the SHA-1 hash is the most natural choice, but it isn't even a hash. You can't reproduce the same value even if you have the same content, so every time you run the same build, you get a new random value for it. Clearly the set of actual hashes which went into the merge is a much more suitable choice, both as an identifier, and to record the information about what actually got built.
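
          The proposal above, identifying a build by the set of commits that went into the merge rather than by the merge result, could be sketched roughly as follows. This is a hypothetical illustration only: `Checkout` and `dedupe_by_sources` are invented names and do not correspond to anything in the git plugin.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Checkout:
    node: str            # agent that performed the checkout
    merge_commit: str    # ID of the throwaway merge commit created on that agent
    merged_heads: tuple  # IDs of the branch heads that were merged

def dedupe_by_sources(checkouts):
    """Keep one entry per distinct set of merged heads, in first-seen order."""
    seen = {}
    for checkout in checkouts:
        # The key ignores merge_commit, which differs on every node.
        key = frozenset(checkout.merged_heads)
        seen.setdefault(key, checkout)
    return list(seen.values())

# Placeholder IDs (assumptions for illustration only).
MASTER, PR = "b" * 40, "c" * 40
checkouts = [
    Checkout("linux", "1" * 40, (MASTER, PR)),
    Checkout("macos", "2" * 40, (MASTER, PR)),
    Checkout("windows", "3" * 40, (MASTER, PR)),
]

entries = dedupe_by_sources(checkouts)
print(len(entries))  # 1
```

          Using a `frozenset` treats the merged heads as unordered, matching the "set of actual hashes" wording; a tuple key would be needed instead if the merge direction should matter.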


          trejkaz added a comment -

          On top of that, is stashing the entire repository really going to be performant enough to use during a build? We're not working with a toy project here... there are literally gigabytes of stuff in there.


          trejkaz added a comment -

          Ah, there are even more problems with stash, so it's definitely unusable in its current state. For instance, if you stash from an Ubuntu slave and then unstash from a macOS slave, you wind up with the build in a weird state where some of the up-to-date checks don't work, because Gradle is storing absolute paths.

          It sounds like this has been separately reported to Gradle as well, but they didn't seem to understand the problem.


          Kevin Bruer added a comment -

          Also seeing this issue; fewer duplicates in our setup, but just as confusing to users.


          Mark Waite added a comment -

          kbruer are you also performing the merge yourself inside the pipeline step as trejkaz is, or are you using the multibranch pipeline facilities that perform the merge step for you?

          Multiple Git Build Data entries also appear when using a pipeline shared library. Each pipeline shared library will cause another entry to be added.


            Assignee: Unassigned
            Reporter: trejkaz
            Votes: 6
            Watchers: 10
