Improvement
Resolution: Unresolved
Minor
None
Currently, for pipeline builds, every time the pipeline runs on a new node and merges the branch you're building with master, you get an extra "Git Build Data" entry in the sidebar, even though you're building exactly the same thing.
The individual screens show different revisions being built:
But actually, these are the same PR branch, merged with the same master branch. In this situation, I would expect a single entry, and I would expect the linked page to list the branches which were merged, not to show the commit hash of the merge result. The merge result is literally impossible to access after the build has completed anyway, so there is no point having this information at all.
[JENKINS-52926] "Git Build Data" should not appear more than once in the side menu for pipeline builds
The job is a pipeline job.
Multiple checkouts do get performed, but I think that's a necessity? Otherwise, I don't understand how code would get from the node coordinating the build to each of the individual nodes doing the build.
The hash built is different every time a separate node is spun off, because the merge gets performed again and git hashes are essentially random numbers, rather than actual hashes of the content. The multiple hashes reflect the random number each merge generated when the build for that node was performed.
What I'm essentially saying is that the hash is not a good way to deduplicate what code is being built, because once a merge has occurred, that number is not usable in any way. If you replaced that information with enough information to actually reproduce the build, and then deduplicated on that, I think all the items would deduplicate properly.
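For context, this is standard git behaviour rather than anything specific to the plugin: a commit ID is the SHA-1 of the commit object, and that object contains the tree, the parent IDs, and author and committer lines that carry timestamps. Re-running the same merge on another agent a few seconds later therefore yields a different ID, even though the tree and both parents are identical. The plain Groovy sketch below mirrors that object layout; the tree and parent IDs, the identity, the message and the timestamps are made-up placeholders.

```groovy
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Build the raw commit object the way git does ("commit <size>\0" + body)
// and return its SHA-1 as a 40-character hex string.
String commitId(String tree, List<String> parents, String ident, long epochSeconds, String message) {
    StringBuilder body = new StringBuilder()
    body.append("tree ").append(tree).append('\n')
    parents.each { body.append("parent ").append(it).append('\n') }
    body.append("author ").append(ident).append(' ').append(epochSeconds).append(" +0000\n")
    body.append("committer ").append(ident).append(' ').append(epochSeconds).append(" +0000\n")
    body.append('\n').append(message).append('\n')

    byte[] bodyBytes = body.toString().getBytes(StandardCharsets.UTF_8)
    byte[] header = ("commit " + bodyBytes.length + "\u0000").getBytes(StandardCharsets.UTF_8)
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1")
    sha1.update(header)
    sha1.update(bodyBytes)
    return sha1.digest().encodeHex().toString()
}

// Placeholder inputs: one merge-result tree, with the PR head and master head as parents.
String tree = 'aa' * 20
List<String> parents = ['11' * 20, '22' * 20]
String ident = 'Jenkins <jenkins@example.com>'

String agentA = commitId(tree, parents, ident, 1700000000L, 'Merge master into PR')
String agentB = commitId(tree, parents, ident, 1700000042L, 'Merge master into PR')  // same merge, 42 s later
String agentARerun = commitId(tree, parents, ident, 1700000000L, 'Merge master into PR')

println(agentA == agentB)       // false: only the timestamp differs
println(agentA == agentARerun)  // true: identical object, identical ID
```

Pinning the identity and dates (for example through the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables) would in principle make the merge ID reproducible. The parent IDs, however, already identify the merged inputs without any such pinning.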
Thanks for the further explanation.
You might perform the merge once, stash the result, then unstash it on each of the other platforms, rather than performing the merge on every platform.
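A minimal Scripted Pipeline sketch of the stash-based suggestion follows. It assumes agents labelled `linux` and `macos`, a product repository at the placeholder URL `https://git.example.com/product.git`, a PR branch called `feature-branch` and a hypothetical `./build.sh`. Only the first `checkout` performs the merge, so the product repository's Git Build Data should be recorded once. The other agents receive the merged files through `unstash` and never run git.

```groovy
// Hypothetical placeholders: agent labels, repository URL, branch names, ./build.sh.
def productRepo = [[url: 'https://git.example.com/product.git']]

node('linux') {
    // The only checkout that merges, so the only one recording Git Build Data for the product repo.
    checkout([$class: 'GitSCM',
              branches: [[name: 'origin/feature-branch']],
              userRemoteConfigs: productRepo,
              extensions: [[$class: 'PreBuildMerge',
                            options: [mergeRemote: 'origin', mergeTarget: 'master']]]])
    // Default excludes leave out .git, so only the merged working tree is stashed.
    stash name: 'merged-source', includes: '**'
}

def builds = [:]
def labels = ['linux', 'macos']
for (int i = 0; i < labels.size(); i++) {
    def label = labels[i]  // declared inside the loop so each closure keeps its own label
    builds[label] = {
        node(label) {
            deleteDir()               // start from an empty workspace
            unstash 'merged-source'   // merged sources; no git operations on this agent
            sh './build.sh'
        }
    }
}
parallel builds
```

`stash` copies the files through the Jenkins controller, so the size of the workspace matters for this approach.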
As another alternative, you might perform the checkout once on every platform, perform the merge on one, generate a patch of the merge result, stash the patch, and unstash the patch on each of the targets.
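The patch-based alternative could look roughly like the following sketch, under the same placeholder assumptions (labels, URL, branch names, `./build.sh`). It also assumes the default `GitSCM` refspec, so that `origin/master` is available after the checkout without a separate fetch. The merge is done with plain `git` in an `sh` step, and the PR head SHA-1 is captured on the first agent so that every agent checks out exactly the same commit. Each `checkout` therefore records a SHA-1 that does exist in the repository. `git diff --binary` keeps binary changes intact for `git apply`.

```groovy
// Hypothetical placeholders: agent labels, repository URL, branch names, ./build.sh.
def productRepo = [[url: 'https://git.example.com/product.git']]
def prSha = null

node('linux') {
    checkout([$class: 'GitSCM', branches: [[name: 'origin/feature-branch']], userRemoteConfigs: productRepo])
    prSha = sh(returnStdout: true, script: 'git rev-parse HEAD').trim()
    // Merge the target locally, then capture the merge result as a patch against the PR head.
    sh '''
        base=$(git rev-parse HEAD)
        git -c user.name=jenkins -c user.email=jenkins@example.com merge --no-edit origin/master
        git diff --binary "$base" HEAD > merge.patch
    '''
    stash name: 'merge-patch', includes: 'merge.patch'
}

def builds = [:]
def labels = ['linux', 'macos']
for (int i = 0; i < labels.size(); i++) {
    def label = labels[i]  // declared inside the loop so each closure keeps its own label
    builds[label] = {
        node(label) {
            // Same PR commit everywhere, pinned by SHA-1 rather than by branch name.
            checkout([$class: 'GitSCM', branches: [[name: prSha]], userRemoteConfigs: productRepo])
            unstash 'merge-patch'
            // An empty patch means the PR already contains the target; git apply rejects empty input.
            sh 'if [ -s merge.patch ]; then git apply merge.patch; fi'
            sh './build.sh'
        }
    }
}
parallel builds
```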
You said:
using the hash is not a good idea for deduplicating what code is being built
The SHA-1 hash is the most natural choice to decide if something has changed. Performing the same merge on multiple agents is not a use case that the plugin is ready to handle.
Even if we only ran the merge on one agent, the fact of the matter is that the git info page will still give us no useful information about what it actually built against, since it will still be showing a hash which doesn't correspond to any actual commit in the repository.
You claim that the SHA-1 hash is the most natural choice, but it isn't even a hash. You can't reproduce the same value even if you have the same content, so every time you run the same build, you get a new random value for it. Clearly the set of actual hashes which went into the merge is a much more suitable choice, both as an identifier, and to record the information about what actually got built.
On top of that, is stashing the entire repository really going to be performant enough to use during a build? We're not working with a toy project here... there are literally gigabytes of stuff in there.
Ah, there are even more problems with stash, so it's definitely unusable in its current state. For instance, if you stash from an Ubuntu slave and then unstash from a macOS slave, you wind up with the build in a weird state where some of the up-to-date checks don't work, because Gradle is storing absolute paths.
It sounds like this has been separately reported to Gradle as well, but they didn't seem to understand the problem.
Also seeing this issue; fewer duplicates in our setup, but just as confusing to users.
kbruer, are you also performing the merge yourself inside the pipeline step as trejkaz is, or are you using the multibranch pipeline facilities that perform the merge step for you?
Multiple Git Build Data entries also appear when using a pipeline shared library. Each pipeline shared library will cause another entry to be added.
We are facing a similar issue with a pipeline job that uses a Jenkins shared library.
For every build we get Git Build Data related to the Jenkins shared library. Is there any solution for this?
If you're seeing duplicate git build data in each build, you may be able to resolve it with the script from https://plugins.jenkins.io/git/#remove-git-plugin-buildsbybranch-builddata-script .
If all that you're seeing is one entry for the Pipeline shared library build data and one entry for the primary repository build data, then there is no solution for that. The Pipeline job is the combination of the pipeline shared library and the primary repository. I think it would be a mistake to hide the Pipeline shared library information, since it can have a significant impact on the build.
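To tell the two situations apart, a read-only Script Console sketch like the one below lists the Git Build Data actions on the most recent builds of one job, together with the remote URLs and the last built SHA-1 that each entry records. The job name `my-folder/my-pipeline` is a placeholder. The sketch only reads data and is not the cleanup script linked above.

```groovy
// Read-only Script Console sketch; 'my-folder/my-pipeline' is a placeholder job name.
import hudson.model.Job
import hudson.plugins.git.util.BuildData
import jenkins.model.Jenkins

def job = Jenkins.get().getItemByFullName('my-folder/my-pipeline', Job)
assert job != null : 'job not found; check the full name'

// Five most recent builds, newest first.
job.builds.limit(5).each { build ->
    def entries = build.getActions(BuildData)
    println "${build.fullDisplayName}: ${entries.size()} Git Build Data entries"
    entries.each { data ->
        // Remote URLs distinguish the shared library repository from the primary repository.
        println "    remotes=${data.remoteUrls} lastBuilt=${data.lastBuiltRevision?.sha1String}"
    }
}
```

One entry pointing at the shared library and one at the primary repository is the expected case. Several entries with the same primary repository URL indicate the duplication discussed in this issue.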
Thanks markewaite for your answer.
As you said:
"If you're seeing duplicate git build data in each build, you may be able to resolve it with the script from https://plugins.jenkins.io/git/#remove-git-plugin-buildsbybranch-builddata-script ."
Where is this script supposed to go, and how do I use it? Can you please guide me further on this?
Thanks for asking the clarifying question. Your question highlights the weakness in that section of the git plugin documentation. Here is the text that should precede that block of code in the documentation. Can you let me know if that makes it clearer?
The git plugin has an issue (JENKINS-19022) that sometimes causes excessive memory and disk use in the build history of a job. The problem occurs because in some cases the git plugin copies the git build data from previous builds to the most recent build, even though the git build data from the previous build is not used in the most recent build. The issue can be especially challenging when a job retains a very large number of historical builds or when a job builds a wide range of commits during its history.
Multiple attempts to resolve the core issue without breaking compatibility have been unsuccessful. A workaround is provided below that will remove the git build data from the build records. The workaround is a system groovy script that needs to be run from the Jenkins Administrator's Script Console (as in https://jenkins.example.com/script ). Administrator permission is required to run system groovy scripts.
I've submitted that text to the git plugin documentation as https://github.com/jenkinsci/git-plugin/pull/1103
Hi,
I'm also interested in adding this feature.
In my pipeline, which lives in a test repository, I run the build on N nodes in parallel. On each node I check out the product repository (not the one with the Jenkinsfile) using the checkout step with the PreBuildMerge extension, and then run tests. As a result I have N Build Data entries, each with a different SHA.
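A rough reproduction of that setup, as a Scripted Pipeline sketch, is shown below. The agent labels, repository URL, branch names and `./run-tests.sh` are hypothetical. The relevant part is that every parallel branch runs its own `checkout` with `PreBuildMerge`, so every agent creates its own merge commit and records its own Git Build Data entry.

```groovy
// Hypothetical placeholders: agent labels, repository URL, branch names, ./run-tests.sh.
def builds = [:]
def labels = ['node-1', 'node-2', 'node-3']
for (int i = 0; i < labels.size(); i++) {
    def label = labels[i]  // declared inside the loop so each closure keeps its own label
    builds[label] = {
        node(label) {
            // Each agent merges independently, producing a merge commit with a different SHA-1.
            checkout([$class: 'GitSCM',
                      branches: [[name: 'origin/feature-branch']],
                      userRemoteConfigs: [[url: 'https://git.example.com/product.git']],
                      extensions: [[$class: 'PreBuildMerge',
                                    options: [mergeRemote: 'origin', mergeTarget: 'main']]]])
            sh './run-tests.sh'
        }
    }
}
parallel builds
```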
It's problematic because:
- There are N Build Data entries.
- I cannot easily check which revision has been tested, because the SHAs do not exist in the product repository.
I know that I can check out the repository once, stash it, and unstash it on each node, but that is not a user-friendly solution.
I can imagine that there are projects which check out their repository and then push the merged commits. In that case having the SHA of the merge commit in Build Data makes sense, but in my case this information is useless.
So my proposal is to add a new option to the PreBuildMerge extension, something like buildData:
[$class: 'PreBuildMerge', options: [buildData: 'source', mergeRemote: 'origin', mergeTarget: 'main']]
If buildData is set to:
- 'merged' (default) - Build Data contains the SHA of the merge commit (current implementation, backward compatible)
- 'source' - Build Data contains the SHA of the PR branch
If I understand how the git plugin works, in the second case there should be only one entry in my setup, because the plugin would detect that the entries are duplicates.
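The intended effect of the proposal can be modelled in a few lines of plain Groovy. This is a toy model, not plugin code. `buildData` is the proposed option, which does not exist yet, and the rule that entries with the same remote URL and recorded SHA-1 collapse into one is an assumption taken from the expectation above, not a description of the plugin's actual deduplication logic.

```groovy
// Toy model of the proposed PreBuildMerge 'buildData' option (hypothetical, not plugin code).
class AgentCheckout {
    String remoteUrl
    String sourceSha   // PR branch head that every agent checks out
    String mergedSha   // merge commit created locally on this agent
}

// SHA-1 a Build Data entry would record under each proposed mode.
String recordedSha(AgentCheckout checkout, String buildDataMode) {
    switch (buildDataMode) {
        case 'merged': return checkout.mergedSha  // current behaviour
        case 'source': return checkout.sourceSha  // proposed alternative
        default: throw new IllegalArgumentException("unknown buildData mode: " + buildDataMode)
    }
}

// Assumption: entries with the same remote URL and recorded SHA-1 collapse into one sidebar item.
int sidebarEntries(List<AgentCheckout> checkouts, String buildDataMode) {
    return checkouts.collect { [it.remoteUrl, recordedSha(it, buildDataMode)] }.unique().size()
}

// Four agents: same PR head, but each local merge commit has its own SHA-1.
List<AgentCheckout> checkouts = (1..4).collect { n ->
    new AgentCheckout(remoteUrl: 'https://git.example.com/product.git',
                      sourceSha: 'ab' * 20,
                      mergedSha: String.valueOf(n).padLeft(40, '0'))
}

println sidebarEntries(checkouts, 'merged')  // 4
println sidebarEntries(checkouts, 'source')  // 1
```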
What do you think about such solution?
jgalda, I think your proposal sounds very interesting. It would preserve compatibility and still allow the reduction of BuildData for users that do not want the duplication.
In my use case, I actually check out multiple repositories, but I'd like their git build data to be grouped under a single Git Build Data item instead of several entries.
The multiple entries prevent other sidebar items, such as the pipeline console, from fitting on the screen.
(Screenshot: sidebar items that can only be reached by scrolling down)
Can you provide detailed steps to duplicate the problem?
The sample you've provided almost looks like multiple checkout operations are being performed (and I doubt that is what you're actually doing in your build).
Are the SHA-1 hashes in the git build data from the preceding builds? Since you indicate that the list continues to grow, it seems likely that they are somehow recording the history of the SHA-1's built by that job.
Is the job a Pipeline job or a Freestyle job?
If it is a Pipeline job, is it using a Pipeline shared library?