
GIT Plugin (any version) heavily bloats memory use and size of build.xml with "BuildData" fields

      Hello everyone.

      Months ago, we noticed an issue with the GIT plug-in. At first it was only a minor nuisance, but now each build that we start uses up ~3MB of main memory and ~5MB of disk space in its build.xml.

      The issue is due to the following behaviour of the GIT plug-in:
      For every build that has the GIT SCM defined, it retrieves the list of branches in the remote repository. For each branch, it retrieves the last build in Jenkins that was run against this branch.

      This information is then stored in the Build object in the form of the "BuildData" field. This means that the full list of all branches, plus their last builds, is stored in each and every build, using up both main memory and disk space in the "build.xml" file allocated for the build.

      It uses this information to populate a page for the build with the association of branches to builds:
      http://<SERVER>/job/<JOBNAME>/<BUILD-ID>/git/?

      For normal repositories, this data is relatively small, as only a limited number of unmerged branches exist. Unfortunately, we use GIT in an automated manner, where thousands of tags and branches are spawned without merging back into the mainline.

      This means that each build saves several hundred to several thousand key-value pairs of GIT branches and Jenkins builds that serve no purpose for us.

      In our case, this means – as outlined above – we waste 3MB of RAM per build and 5 MB of disk space. With 10k builds per day, you can imagine that this is quite a predicament.
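To put the figures above into perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it takes the 3MB, 5MB and 10k numbers from this report at face value, uses decimal units (1 GB = 1000 MB), and the RAM figure only applies to builds that Jenkins actually keeps loaded in memory.

```python
# Figures quoted in the report; the constant names are illustrative only.
BUILDS_PER_DAY = 10_000
RAM_PER_BUILD_MB = 3
DISK_PER_BUILD_MB = 5


def daily_overhead_gb(per_build_mb, builds_per_day=BUILDS_PER_DAY):
    """Return the overhead added per day, in decimal gigabytes."""
    return per_build_mb * builds_per_day / 1000


ram_gb = daily_overhead_gb(RAM_PER_BUILD_MB)
disk_gb = daily_overhead_gb(DISK_PER_BUILD_MB)

print(f"RAM overhead for one day of builds:  {ram_gb:.0f} GB")
print(f"Disk overhead for one day of builds: {disk_gb:.0f} GB")
```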

      As a workaround, we've written a Jenkins job that removes the tags contained in "<hudson.plugins.git.util.BuildData>" in the "build.xml". This cuts its size down from 5MB to 16kB (~0.016MB). This of course also greatly speeds up deserializing the builds from disk.
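The cleanup job itself is not shown in this report. As a minimal sketch of the idea, the following Python function empties every "hudson.plugins.git.util.BuildData" element in a build.xml document. The function name and the sample document are made up for illustration. It assumes the file is XML 1.0, which the standard-library parser requires; if your build.xml declares version 1.1 it would need preprocessing first. Note that a later comment in this thread reports that removing this data can cause polling to re-trigger old builds, so treat this as a sketch rather than a recommendation.

```python
import xml.etree.ElementTree as ET

BUILD_DATA_TAG = "hudson.plugins.git.util.BuildData"


def strip_build_data(xml_text: str) -> str:
    """Remove all children of every BuildData element, keeping the element itself."""
    root = ET.fromstring(xml_text)
    # Materialize the matches first so the tree is not modified while iterating.
    for element in list(root.iter(BUILD_DATA_TAG)):
        for child in list(element):
            element.remove(child)
        element.text = None
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened build.xml used only to exercise the function.
SAMPLE = """<build>
  <actions>
    <hudson.plugins.git.util.BuildData>
      <buildsByBranchName>
        <entry><string>origin/feature-1</string></entry>
        <entry><string>origin/feature-2</string></entry>
      </buildsByBranchName>
    </hudson.plugins.git.util.BuildData>
  </actions>
  <number>42</number>
</build>"""

stripped = strip_build_data(SAMPLE)
print(stripped)
```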

      Our request would be: either remove the collection and serialization of this (from our POV) pointless data, or make its generation optional via a configuration option.

      Best regards,
      Martin Schröder
      Intel Mobile Communications GmbH

          [JENKINS-19022] GIT Plugin (any version) heavily bloats memory use and size of build.xml with "BuildData" fields

          Jim D added a comment -

          markewaite, I'm glad I found the comments in this issue, thank you!  When the final 4.0.0 is released with the changes for this issue, will it make sure to handle both BuildDetails and BuildData for previous build runs?  We updated to 4.0.0-rc some time back, and I just came across this thread recently trying to resolve a BuildData issue, and rolled back to 3.9.3.  At this point, Git plugin is working and creating BuildData for each build run, but all those old builds done with 4.0.0-rc don't have any kind of "Git Build Data" in the Jenkins UI anymore, and no BuildData or BuildDetails when retrieved programmatically from WorkflowRun.getAllActions.  I think it must be that 3.9.3 isn't able to read the new BuildDetails object info.  Will 4.0.0 be able to read both, from both 3.9.X builds and 4.0.0 builds?  Thanks again.

           


          Mark Waite added a comment -

          jkd this issue won't be fixed in 4.0.0. The incompatibilities from the BuildDetails change were too great for the community. The accidental release of git plugin 4.0.0-rc to the production update centers showed incompatibilities that I had missed in my testing and that others had missed in their testing.

          BuildData will be the same bloated memory user in 4.0.0 that it is in 3.x.


          Jim D added a comment -

          Thanks for the update!


          zy zhang added a comment -

          Hi, you can use the Groovy script below to delete the git BuildData (built revisions) from the current build.

          import jenkins.model.Jenkins
          import hudson.plugins.git.util.BuildData

          // JOB_NAME and BUILD_NUMBER must be defined before this runs
          // (assumption: they are supplied by the caller, e.g. as script bindings)
          def jenkinsInstance = Jenkins.get()
          def job = jenkinsInstance.getItemByFullName(JOB_NAME)
          def build = job.getBuild(BUILD_NUMBER)

          // A build can carry more than one BuildData action
          def gitActions = build.getActions(BuildData.class)
          if (gitActions != null) {
              for (action in gitActions) {
                  build.actions.remove(action)
                  //build.actions.add(action)
                  build.save()
              }
          }


          Baptiste Mathus added a comment -

          markewaite I think the "BuildData" structure has been heavily refactored, hasn't it? Should this be closed maybe? Thanks

          Mark Waite added a comment - - edited

          Unfortunately batmat, the three attempts (two by ndeloof and one by jekeller ) were unable to significantly refactor BuildData in a compatible fashion. The most recent attempt by jekeller passed multiple months of my testing but showed compatibility issues in the accidental release of git plugin 4.0.0-rc.

          The changes were reverted before the release of git plugin 4.0.0.

          The git plugin documentation now includes a system groovy script that removes BuildData. See https://plugins.jenkins.io/git/#remove-git-plugin-buildsbybranch-builddata-script


          Jacob Keller added a comment -

          batmat the refactor was reverted because it had unexpected side effects.

          My solution involved doing a search/lookup mechanism against all old builds and "rebuilding" the build data every job. This works but slows down significantly once you have a lot of jobs.

          I believe a better solution exists using a plugin-specific XML file, so we basically just stop storing the build data per-build and start storing it per-job as a separate file. I've thought about it on-and-off for a while but never got around to trying to implement it.


          Jason Jardina added a comment - - edited

          markewaite I ran that script you listed and it kicked off more than 50 old builds that had already been built previously.  I use regex to scan my repositories by naming convention using git polling.  A build is kicked off when the commit hash has changed on a branch whose name matches the regex.  I am glad I ran that on my older code server and not my currently shipping code.  That script is dangerous.  It may solve your problems, but it definitely does not solve mine.  I have to have the build history in order for Jenkins to know what it has built previously, so it doesn't get stuck in a build loop.  That script is like sticking a loaded gun to Jenkins's head and pulling the trigger.  Before you tell everyone to run that script and delete their build data, you should warn them they may see unexpected results, exactly like I saw when we updated to git plugin 4.0.0-rc that was accidentally released in the wild last year.

          The best solution I found is to keep only the last 10-20 builds in the Jenkins build history by using the Discard Old Builds (log rotation) settings.  That lets me keep my current git history, without the history file size getting out of hand and slowing builds/reboots.


          Jacob Keller added a comment -

          jjardina, yes, that's part of the problem. The current build data is stored as a map once per build. The script there will delete all build data to conserve memory and reduce the bloat. The ultimate issue is not that a single map takes that much space, but that every build keeps a map of the history up to that point, so the total scales with N^2. If we have 10k builds, we have roughly N^2 (I know it's slightly less, since it's more like n*(n-1)/2) things being stored in maps.
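The quadratic growth described above can be checked with a small calculation. This is a simplified model, not a measurement of the plugin: it assumes the worst case where every build touches a new branch, so that build number k stores entries for the k-1 builds before it. The function name is made up for illustration.

```python
def total_map_entries(num_builds: int) -> int:
    """Total entries stored across all builds when build k keeps k-1 entries."""
    return sum(range(num_builds))


n = 10_000
total = total_map_entries(n)

# The sum 0 + 1 + ... + (n-1) equals n*(n-1)/2, the figure quoted in the comment.
print(f"{n} builds store {total} map entries in total")
print(f"closed form n*(n-1)/2 gives {n * (n - 1) // 2}")
```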

          I firmly believe that the git plugin should be modified to store this data per job in an XmlFile in the job root. This way, we can maintain this history (as you and many others obviously require), while avoiding both the cost of storing the build data repeatedly and the cost of trying to rebuild the data from previous jobs.
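To illustrate the difference between the two storage layouts, here is a toy model in Python. It is not the plugin's data structure: the function names, the (build number, branch name) tuples and the worst-case input of one new branch per build are all assumptions made for the sketch. In this model each per-build snapshot also includes the entry for the build itself.

```python
from typing import Dict, List, Tuple

BuildRecord = Tuple[int, str]  # (build number, branch name)


def store_per_build(builds: List[BuildRecord]) -> List[Dict[str, int]]:
    """Current layout: every build keeps its own copy of the branch-to-build map."""
    snapshots = []
    current: Dict[str, int] = {}
    for number, branch in builds:
        current[branch] = number
        snapshots.append(dict(current))  # a full copy is saved with each build
    return snapshots


def store_per_job(builds: List[BuildRecord]) -> Dict[str, int]:
    """Proposed layout: one shared map per job, updated in place."""
    per_job: Dict[str, int] = {}
    for number, branch in builds:
        per_job[branch] = number
    return per_job


# Worst case: each of 100 builds runs against a branch never seen before.
builds = [(n, f"branch-{n}") for n in range(1, 101)]
per_build_entries = sum(len(snapshot) for snapshot in store_per_build(builds))
per_job_entries = len(store_per_job(builds))

print(f"per-build layout: {per_build_entries} entries")
print(f"per-job layout:   {per_job_entries} entries")
```

Both layouts end with the same final map, so the history needed for polling is preserved; only the number of stored copies differs.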

          This task shouldn't be too difficult, but it does require someone investing time, and unfortunately I don't have time to work on this at $DAYJOB right now, so it's not something I can commit to doing in a timely manner.

          Now, one could argue that the git plugin shouldn't be saving data about builds which have been deleted, but that's neither here nor there, as clearly people desire this behavior and it's how the plugin has behaved for many years now.


          Brittany added a comment - - edited

          Hi! I ran into this issue in my work and recently fixed it for our job runs in a way I didn't find noted anywhere. I'm not sure this will solve it for anyone else, but just in case, here's what I found.

          In our Jenkins configuration we were using:

          `url: 'https://github.com/my-awesome-org/my-even-better-repo'` (here's hoping this isn't an actual repo)

          But when I changed it to:

          `url: 'https://github.com/my-awesome-org/my-even-better-repo.git'`,

          (note the `.git` extension), the warning was gone (and it also changed the Jenkins console output from "The recommended git tool is: NONE" to "The recommended git tool is: git"). It also drastically reduced the output on the Build Data page.

          Hope this helps someone else!
           


            Unassigned Unassigned
            mhschroe Martin Schröder