JENKINS-69415

[pipeline-groovy-lib] /libs folder of every build clogs up file system

      Hello, we have a lot of jobs running on our Jenkins, and each job (mostly multibranch pipelines) uses multiple libraries. The libraries are downloaded from GitHub and are all configured to use the same default version. We now face the issue that every time a build is started, each library seems to be re-downloaded from GitHub into the individual build directories, e.g.:

      /var/jenkins_home/jobs/<name>/branches/main/builds/46/libs/...
      /var/jenkins_home/jobs/<name>/branches/main/builds/47/libs/...
      ...
      

      As some libraries come with hundreds of files, and we have hundreds of jobs that do hundreds of builds, this results in literally millions of duplicated files on our Jenkins, clogging up the whole file system. And that happens even when the versions of the libraries in question do not change between builds: the same library is downloaded all over again for every build.

      Is there a way to prevent this massive duplication of files between builds, e.g. a way to clone the library into a central folder on Jenkins instead of the individual build directories? Thanks!


          Jesse Glick added a comment -

          For purposes of resuming after a restart, and for Replay, it is necessary to somehow have a copy of the exact version of the library used in that particular build.

          Some filesystems can automatically deduplicate.

          Currently the LibraryRetriever interface forces libraries to be downloaded to disk. It would be nice on supported SCMs like GitHub to retrieve source files on demand via HTTP(S) API (“lightweight checkout”), so that the build record would only need to keep the commit hash or similar.
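
          To sketch the idea (nothing like this exists in the plugin today; the class name and URL pattern below are made up for illustration): the build record would keep only the pinned commit hash, and sources would be fetched over HTTPS only when a step actually needs them, e.g. from GitHub's raw-content endpoint.

{code:java}
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Hypothetical sketch of "lightweight checkout": the build record keeps
 * only the pinned commit hash, and individual library sources are fetched
 * over HTTPS when actually needed, instead of cloning the whole library
 * into the build directory.
 */
public class OnDemandLibrarySource {

    private final HttpClient client = HttpClient.newHttpClient();
    private final String owner;
    private final String repo;
    private final String commit; // exact revision recorded in the build

    public OnDemandLibrarySource(String owner, String repo, String commit) {
        this.owner = owner;
        this.repo = repo;
        this.commit = commit;
    }

    /** Fetch one source file, e.g. "vars/deploy.groovy", at the pinned commit. */
    public String fetch(String path) throws IOException, InterruptedException {
        URI uri = URI.create("https://raw.githubusercontent.com/"
                + owner + "/" + repo + "/" + commit + "/" + path);
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
{code}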

          JENKINS-38992 introduced caching but I guess still makes copies.


          Kristian Kraljic added a comment - edited

          Hey Jesse, thanks for your reply!

          Mhm, I see, that makes sense to me. Since you mentioned "Replay", I guess "Rebuild" would also be affected, right? In our case we essentially never use "Replay", so losing the ability to "Replay" wouldn't be a problem… it is just "Rebuild" that we use from time to time, when a build got stuck.

          While I have not checked out your implementation of the `LibraryRetriever`, wouldn't it be feasible to add a "central storage" option to the plugin? With this option, when a repository at a certain branch is cloned, it would be put into a central location instead of into the build folder. The folder name could simply be the commit hash of HEAD (or any unique hash generated from the cloned folder). The next time the same repository is to be cloned, deduplication would happen automatically: after checkout, once the hash is determined, the folder would already exist, so the fresh clone could be deleted again immediately (see the sketch below). The Git repository would still need to be available when building. This option could be used in combination with the caching introduced with JENKINS-38992 for added resilience. Whether to use global storage or local storage (the current behaviour) could simply be determined by the presence of a "lib_x.txt" file in the build directory: the hash within the file determines the folder to look for in the central storage location; if the file is not present, the current retriever logic would be used.
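
          To make the idea concrete, here is a minimal sketch of the dedup step, assuming the clone has already happened and the HEAD commit hash is known (the store location, class name, and the lib_x.txt marker format are all just illustrative):

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical content-addressed central store for library checkouts. */
public class CentralLibraryStore {

    private final Path storeRoot; // e.g. /var/jenkins_home/library-store

    public CentralLibraryStore(Path storeRoot) {
        this.storeRoot = storeRoot;
    }

    /**
     * Move a freshly cloned library into the store, keyed by its commit
     * hash. If that version is already stored, the fresh clone is simply
     * deleted and the existing copy reused. A marker file in the build
     * directory records which version this build used.
     */
    public Path deduplicate(Path freshClone, String commitHash,
                            Path buildDir, String libraryName) throws IOException {
        Path target = storeRoot.resolve(commitHash);
        if (Files.exists(target)) {
            deleteRecursively(freshClone); // already stored: drop the duplicate
        } else {
            Files.createDirectories(storeRoot);
            Files.move(freshClone, target); // first occurrence: keep this copy
        }
        // e.g. builds/47/lib_mylib.txt containing just the commit hash
        Files.writeString(buildDir.resolve("lib_" + libraryName + ".txt"), commitHash);
        return target;
    }

    private static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
{code}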

          The only issue I see with this approach is cleaning up library versions from the central storage location once no builds referencing their hash remain. I don't know whether one could use some Jenkins hook to delete the central copy when the last build referencing it is deleted (a sketch follows below). However, compared to the massive duplication that is currently happening, a few orphaned files would not be the end of the world, in my humble opinion.
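
          Something like a RunListener might work for this. Again just a sketch: RunListener and onDeleted are existing Jenkins extension points, but the marker-file convention is the hypothetical one from above, and the reference scan below is naive (it only checks builds of the same job):

{code:java}
import hudson.Extension;
import hudson.model.Run;
import hudson.model.listeners.RunListener;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch: when a build is deleted, remove central library copies that no
 * other build of the same job still references via its lib_*.txt marker.
 */
@Extension
public class LibraryCleanupListener extends RunListener<Run<?, ?>> {

    @Override
    public void onDeleted(Run<?, ?> run) {
        File[] markers = run.getRootDir().listFiles(
                (dir, name) -> name.startsWith("lib_") && name.endsWith(".txt"));
        if (markers == null) {
            return;
        }
        for (File marker : markers) {
            try {
                String hash = Files.readString(marker.toPath()).trim();
                if (!referencedByOtherBuilds(run, marker.getName(), hash)) {
                    // delete the central copy for this hash here (omitted)
                }
            } catch (IOException ignored) {
                // best effort: better to leave an orphan than to fail deletion
            }
        }
    }

    private boolean referencedByOtherBuilds(Run<?, ?> deleted, String markerName, String hash) {
        for (Run<?, ?> other : deleted.getParent().getBuilds()) {
            if (other == deleted) {
                continue;
            }
            Path marker = other.getRootDir().toPath().resolve(markerName);
            try {
                if (Files.exists(marker) && Files.readString(marker).trim().equals(hash)) {
                    return true;
                }
            } catch (IOException ignored) {
                // treat unreadable markers as non-references
            }
        }
        return false;
    }
}
{code}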

          What do you think? The advantage, I think, is that this approach would essentially work with any kind of SCM that uses commit hashes, and a hash computed from the checked-out folder could serve as a fallback. Looking forward to hearing from you.


          Jesse Glick added a comment -

          Something like that sounds like it could work if cleanup were automated. Please bear in mind that I lack the time to either implement such changes or review PRs. There may be some maintainers active here on occasion.


          George added a comment - edited

          Dear Devs,

          We are running Jenkins LTS 2.319.3.

          This issue is happening to our team as well. It's causing:

          1. Very high read I/O to our disks, which in turn causes high CPU I/O wait time.
          2. High filesystem cache usage due to the sheer number of files.
          3. Our host exhausting file inodes rather quickly.

          The first two symptoms are not a huge problem on their own, especially on multi-core, modern Linux operating systems, but they give us a worrying statistic to watch, and we cannot be 100% sure of the negative impact on performance.

          The third symptom is the more serious problem, as it can adversely affect Jenkins and eventually stop it from writing any new files.


          Jesse Glick added a comment -

          JENKINS-70870 would probably address the inode exhaustion issue, as each library × build would consume just one inode regardless of the number of sources inside that library.
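
          To put rough, purely hypothetical numbers on it: 100 jobs × 100 retained builds × 3 libraries × 300 files per library is 100 × 100 × 3 × 300 = 9,000,000 inodes under the current layout, but only 100 × 100 × 3 = 30,000 if each library checkout were a single file.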


          Kristian Kraljic added a comment -

          jglick, agreed. At least this would significantly reduce the number of files and also make cleaning up way easier! Thanks for letting us know.

          Any estimate of if and when this is going to be released in the plugin?


          Jesse Glick added a comment -

          JENKINS-70870? Not currently. I prepared the change in the course of working on something else (JENKINS-70869) but lack the time to follow up, mainly to do careful performance testing. Since it unconditionally switches the behavior of the plugin, there is a risk of regression. (Making this a feature flag would be possible in principle but would make the code much more complex since numerous parts of the plugin would have two distinct code paths.)

