• Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • fingerprint-plugin
    • None

      Fingerprints are stored in a large number of small XML files in a dedicated folder (by default called fingerprints). This makes moving data around by file copying really slow, because the OS has to locate and copy the files one by one, and they tend to be scattered across the physical disk. It is also very space-inefficient: file systems typically allocate space in multiples of 4 KB blocks, so a lot of disk space is wasted on each small file. Further space is wasted in directory allocation tables.

      Please put the fingerprints together in a single file, or at least group them into a smaller number of files.

          [JENKINS-19066] Do not spawn multiple files for fingerprints

          trejkaz added a comment - - edited

          We have huge problems with fingerprint files - taking up 50 gigabytes of disk space on a machine which is already stressed for storage, taking a long time to copy anywhere, taking Jenkins itself a long time to compact the fingerprints...

          Storing them in one file certainly seems to be a good fix. I'd almost suggest using a relational database for it, since a lot of time seems to be spent figuring out the links from a project which used the file to the project which generated the file.
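To make the relational-database idea concrete, here is a minimal sketch using SQLite. The schema, the table names (`fingerprint`, `fingerprint_usage`) and the helper functions are all hypothetical and only illustrate the suggestion; this is not how Jenkins stores fingerprints.

```python
import sqlite3

# Hypothetical schema for illustration only; Jenkins does not store
# fingerprints this way.
def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS fingerprint (
            md5       TEXT PRIMARY KEY,
            file_name TEXT NOT NULL,
            producer  TEXT NOT NULL,     -- job that generated the file
            build     INTEGER NOT NULL   -- build number of the producer
        );
        CREATE TABLE IF NOT EXISTS fingerprint_usage (
            md5   TEXT NOT NULL REFERENCES fingerprint(md5),
            job   TEXT NOT NULL,         -- job that used the file
            build INTEGER NOT NULL
        );
        CREATE INDEX IF NOT EXISTS usage_by_job
            ON fingerprint_usage(job, build);
    """)
    return conn

def producers_for(conn, job, build):
    """Return (producer job, build) pairs whose files were used by job #build."""
    rows = conn.execute(
        """SELECT DISTINCT f.producer, f.build
           FROM fingerprint_usage u
           JOIN fingerprint f ON f.md5 = u.md5
           WHERE u.job = ? AND u.build = ?
           ORDER BY f.producer, f.build""",
        (job, build),
    )
    return rows.fetchall()
```

With a layout like this, the link from a consuming build back to the producing build becomes a single indexed join instead of a scan over many small files.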

          Our build generates at least 100 artifacts, so presumably the workaround for us would be to zip some of them together into a larger file which would be uncompressed at the next build.

          Quick statistics from our build server:

          • Number of fingerprint files: 15,273,772
          • Time it took just to list those files using 'find': ~2 hours
          • Size of those files on disk for a 4k block size: 62.6 GB
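The reported size can be sanity-checked with a short calculation. This sketch assumes every fingerprint file occupies exactly one 4 KB (4096-byte) block and that "GB" means 10^9 bytes; both are assumptions, not something stated in the statistics.

```python
BLOCK_SIZE = 4096          # bytes; the "4k block size" from the statistics
FILE_COUNT = 15_273_772    # fingerprint files reported above

def min_allocated_bytes(file_count, block_size=BLOCK_SIZE):
    # Assumption: each small file occupies exactly one allocation block.
    return file_count * block_size

total = min_allocated_bytes(FILE_COUNT)
print(f"{total / 1e9:.1f} GB")   # prints "62.6 GB"
```

Under those assumptions the figure matches the 62.6 GB above, which suggests that nearly every file fits in a single block and that the allocation size, not the XML content, dominates the disk usage.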

          Daniel Beck added a comment -

          trejkaz: Nobody is listing these files, so that should be irrelevant. And if you did write output to a terminal, writing to that may have taken longer than listing itself (try writing output to a file or /dev/null).
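One way to separate listing time from terminal output time is to count the files without printing them. The following is a minimal sketch; the function names are made up for this example, and the directory passed in is whatever path holds the fingerprints on your system.

```python
import os
import time

def count_files(root):
    """Count regular files under root without printing any of them."""
    total = 0
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total

def timed_count(root):
    """Return (file count, elapsed seconds) for a full walk of root."""
    start = time.perf_counter()
    n = count_files(root)
    return n, time.perf_counter() - start
```

If `timed_count` on the fingerprints folder still takes a long time, the cost lies in walking the directory tree itself rather than in writing to the terminal.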

          What version of Jenkins are you using? How many builds' artifacts are you retaining?

          Daniel Beck added a comment -

          trejkaz: Anything interesting in Fingerprint cleanup.log or jenkins.log related to fingerprints cleanup?

          trejkaz added a comment -

          "Nobody is listing the files"? https://github.com/jenkinsci/jenkins/blob/4a98beaf6463ea2e746fd837965676899d57b873/core/src/main/java/hudson/model/FingerprintCleanupThread.java#L68

          trejkaz added a comment -

          I'll check the logs myself on Monday, but the other guy (who for whatever reason seems to refuse to deal with tickets here, even though he's the one in charge of the build system) said that he didn't see anything odd, just that it was taking a long time.

          We're running version 1.600 and as far as I know, only retain the artifacts for the "last successful build", and only for four of our builds. But I have noticed on occasion Jenkins saying builds were being retained because of some kind of dependency on other builds. I don't know if that applies to artifacts as well.

          trejkaz added a comment - - edited

          Here's a quick back of the envelope calculation for our system.

          Compile:

          • Number of artifacts
            • Windows - 13,911
            • Linux - 13,818
            • total = 27,729
          • Keep max 20 builds with artifacts
            • total = 554,580

          Release:

          • Number of artifacts
            • Windows - 272
            • Linux - 278
            • Mac - 274
            • total = 824
          • Keep max 1 builds with artifacts
            • total = 824

          Total files expected from builds we're keeping: 554,580 + 824 = 555,404. (Compared to 15,273,772 files actually present.)
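The arithmetic above can be reproduced as follows. It uses only the numbers from this comment; the variable names are made up for the example, and the ratio at the end is derived here rather than quoted from the thread.

```python
# Artifact counts per build, taken from the lists above.
compile_per_build = 13_911 + 13_818          # Windows + Linux
compile_kept = compile_per_build * 20        # keep max 20 builds
release_kept = (272 + 278 + 274) * 1         # Windows + Linux + Mac, keep max 1 build

expected = compile_kept + release_kept
actual = 15_273_772                          # fingerprint files actually present

print(expected)                              # prints 555404
print(round(actual / expected, 1))           # prints 27.5
```

In other words, there are roughly 27 times more fingerprint files on disk than the retained builds alone would account for.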

          Fingerprint cleanup.log (took a while to find because it wasn't in /var/log like the main Jenkins log...) has 8 million lines and all lines start with either "possibly trimming" or "deleting obsolete".

          Daniel Beck added a comment -

          Uh, yeah, Jenkins cannot properly handle 28k artifacts per build. Just like it cannot really deal with 100k projects or 5k connected slaves without some limitations. I'm not surprised.

          You could just disable fingerprinting for these projects as well. Or do you need to know where every single one of these files originated?

          trejkaz added a comment -

          "Fingerprint all copied artifacts" is unchecked for these projects already, unless you're referring to another setting we haven't been able to find yet.

          Daniel Beck added a comment -

          There's an option to fingerprint archived artifacts in the Advanced section of the post build step.

          trejkaz added a comment -

          Not seeing it. I only see the one I already mentioned. Screenshot:

            marcsanfacon Marc Sanfacon
            victorwss Victor Silva
            Votes: 1
            Watchers: 3
