Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Component/s: git-plugin
- Environment: Ubuntu 12.04 (but OS does not matter)
  Jenkins 1.509.2 LTS (but Jenkins does not matter)
  GIT plugin 1.4 (but affects all GIT versions)
Description
Hello everyone.
Months ago, we noticed a bug/issue with the GIT plug-in. Previously it was only a minor nuisance, but now it causes each build that we start to use up ~3MB of main memory and ~5MB of disk space in the build.xml.
The issue is due to the following behaviour of the GIT plug-in:
For every build that has the GIT SCM defined, it retrieves the list of branches in the remote repository. For each branch, it retrieves the last build in Jenkins that was run against this branch.
This information is then stored in the Build object in the form of the "BuildData" field. This means that the full list of all branches, plus their last builds, is stored in each and every build, thus using up main memory and disk space in the "build.xml" file allocated for the build.
It uses this information to populate a page for the build with the association of branches to builds:
http://<SERVER>/job/<JOBNAME>/<BUILD-ID>/git/?
For normal repositories, this data is relatively small, as only a limited number of unmerged branches exist. Unfortunately, we use GIT in an automated manner, where thousands of tags and branches are spawned without merging back into the mainline.
This means that each build saves several hundred to several thousand pointless key-value pairs for GIT branches and Jenkins builds that serve no purpose whatsoever.
In our case, this means – as outlined above – we waste 3MB of RAM per build and 5 MB of disk space. With 10k builds per day, you can imagine that this is quite a predicament.
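As a rough back-of-the-envelope check of the figures above, the sketch below multiplies the per-build cost by the daily build count. It assumes decimal units (1 GB = 1000 MB) and, for the RAM figure, that all of a day's builds stay loaded at once, which may not hold in practice. The variable names are illustrative only.

```python
# Figures quoted in this report.
BUILDS_PER_DAY = 10_000
RAM_PER_BUILD_MB = 3
DISK_PER_BUILD_MB = 5

# Assumption: decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000

# Assumption for RAM: every build of the day remains loaded in memory.
daily_ram_gb = BUILDS_PER_DAY * RAM_PER_BUILD_MB / MB_PER_GB
daily_disk_gb = BUILDS_PER_DAY * DISK_PER_BUILD_MB / MB_PER_GB

print(f"RAM for one day's builds, if all stay loaded: {daily_ram_gb:.0f} GB")
print(f"build.xml disk space added per day: {daily_disk_gb:.0f} GB")
```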
As a workaround, we've written a Jenkins job that removes the tags contained in "<hudson.plugins.git.util.BuildData>" in the "build.xml". This cuts down its size from 5MB to 16kB (~0.016MB). This of course also greatly boosts the speed of deserializing the builds from disk.
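The following is a minimal sketch of such a cleanup, not the actual job described above. It assumes the "build.xml" content can be parsed by Python's standard `xml.etree.ElementTree` and that emptying every `hudson.plugins.git.util.BuildData` element is acceptable. The function name `strip_build_data` and the layout of the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element name taken from the report; everything nested inside it is dropped.
BUILD_DATA_TAG = "hudson.plugins.git.util.BuildData"


def strip_build_data(xml_text: str) -> str:
    """Return the XML with all children of every BuildData element removed."""
    root = ET.fromstring(xml_text)
    # Materialize the matches first so the tree is not mutated while iterating.
    for build_data in list(root.iter(BUILD_DATA_TAG)):
        for child in list(build_data):
            build_data.remove(child)
        build_data.text = None  # drop leftover indentation whitespace
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified build.xml used only for illustration.
SAMPLE = """<build>
  <actions>
    <hudson.plugins.git.util.BuildData>
      <buildsByBranchName>
        <entry><string>origin/feature-1</string></entry>
        <entry><string>origin/feature-2</string></entry>
      </buildsByBranchName>
    </hudson.plugins.git.util.BuildData>
  </actions>
  <number>42</number>
</build>"""

cleaned = strip_build_data(SAMPLE)
print(cleaned)
```

In a real cleanup job the result would be written back to the build directory, ideally after taking a backup, since the per-build git page loses its branch list once the data is removed.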
Our request would be: Either remove the collection/deserialization of this (from our POV) pointless data, or make its generation optional via a configuration option.
Best regards,
Martin Schröder
Intel Mobile Communications GmbH
Issue Links
is blocking
- JENKINS-41074 UX Issue with Polling in Multibranch Pipeline (Open)
is duplicated by
- JENKINS-30873 Jenkins' Git plugin reparses all previous build.xml on restart (Open)
- JENKINS-56838 pipeline job hangs forever at checkout GitSCM (Closed)
- JENKINS-47789 Git BuildData entry leaks in from Gerrit event triggered builds in build.xml (Closed)
is related to
- JENKINS-29482 Prune stale branches prevents git plugin change history display (Closed)
- JENKINS-32218 buildsByBranchName is not set anymore (Closed)
- JENKINS-56838 pipeline job hangs forever at checkout GitSCM (Closed)
relates to
- JENKINS-18588 Git polling builds same branch multiple times when 'Execute concurrent builds if necessary' turned on (Open)
Jason Jardina, yes that's part of the problem. The current build data solution is stored as a map once per build. The script there will delete all build data to conserve memory and reduce the bloat. The ultimate issue is not that a single map takes that much space, but that every build keeps a map of the history up to that point, so the total scales as N^2. If we have 10k builds, we have roughly N^2 things being stored in maps (I know it's slightly less, since it's more like n*(n-1)/2).
I firmly believe that the git plugin should be modified to store this data per job in an XmlFile in the job root. This way, we can maintain this history (as you and many others obviously require), while avoiding both the cost and complexity of storing the build data repeatedly and of trying to rebuild the data from previous jobs.
This task shouldn't be too difficult, but it does require someone investing time, and unfortunately I don't have time to work on this at $DAYJOB right now, so it's not something I can commit to doing in a timely manner.
Now, one could argue that the git plugin shouldn't be saving data about builds which have been deleted, but that's neither here nor there, as clearly people desire this behavior and it's how the plugin has behaved for many years now.
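To make the scaling argument in this comment concrete, here is a small sketch comparing the two storage layouts. It assumes a simplified model in which build k stores one entry for each of the k-1 builds before it, and in which a per-job record holds one entry per build. The function names are hypothetical and are not part of the git plugin.

```python
def entries_stored_per_build(num_builds: int) -> int:
    """Total map entries when build k keeps the history of the k-1 builds before it."""
    return sum(k - 1 for k in range(1, num_builds + 1))


def entries_stored_per_job(num_builds: int) -> int:
    """Total entries when a single per-job record holds each build exactly once."""
    return num_builds


n = 10_000
per_build_total = entries_stored_per_build(n)
per_job_total = entries_stored_per_job(n)

# The sum 0 + 1 + ... + (n - 1) equals n*(n-1)/2, as noted in the comment.
assert per_build_total == n * (n - 1) // 2

print(per_build_total)  # 49995000
print(per_job_total)    # 10000
```

Under this model the per-build layout stores about 5,000 times as many entries as the per-job layout at 10k builds, and the ratio keeps growing with the number of builds.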