• Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • artifactory-plugin
    • None

      At least in our case, a project can produce quite a number of artifacts. Some are quite large, and some change only occasionally from one build to the next (i.e. some artifacts change on every build, others far less often). It seems that both storage space and bandwidth could be saved by de-duplicating these seldom-changed artifacts across builds.

      I imagine an algorithm where the server keeps a database of the checksums and sizes of stored artifacts. When a slave is about to send the artifacts of a build, it first offers the checksum and size of each one. If the server finds potential matches, it could verify the duplication further (e.g. by comparing random samples of the suspected duplicates). Once a duplicate has been confirmed, the server can either copy or link the artifact locally and tell the slave not to bother sending it.
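The steps above can be sketched roughly as follows. This is only an illustration of the idea, not anything Jenkins implements: `ArtifactStore`, `fingerprint` and `read_samples` are made-up names, SHA-256 is an assumed choice of checksum, and a local file copy stands in for the network upload from the slave.

```python
import hashlib
import os
import random
import shutil
from pathlib import Path


def fingerprint(path):
    """Return (SHA-256 hex digest, size in bytes) of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)


def read_samples(path, offsets, length=64):
    """Read up to `length` bytes at each offset, for spot-check comparison."""
    samples = []
    with open(path, "rb") as fh:
        for offset in offsets:
            fh.seek(offset)
            samples.append(fh.read(length))
    return samples


class ArtifactStore:
    """Hypothetical server-side store indexed by (checksum, size)."""

    def __init__(self, root):
        self.root = Path(root)
        self.index = {}  # (checksum, size) -> Path of one stored copy

    def archive(self, build_id, local_path):
        """Archive one artifact; return "linked" if deduplicated, else "uploaded"."""
        local_path = Path(local_path)
        key = fingerprint(local_path)  # what the slave would offer first
        target = self.root / str(build_id) / local_path.name
        target.parent.mkdir(parents=True, exist_ok=True)

        existing = self.index.get(key)
        if existing is not None and existing.exists():
            size = key[1]
            offsets = [random.randrange(size) for _ in range(4)] if size else []
            # Extra verification: compare the same random byte ranges on both sides.
            if read_samples(local_path, offsets) == read_samples(existing, offsets):
                os.link(existing, target)  # hard link; nothing is transferred
                self.index[key] = target
                return "linked"

        shutil.copyfile(local_path, target)  # stands in for the real upload
        self.index[key] = target
        return "uploaded"
```

Archiving the same unchanged file for two consecutive builds would return "uploaded" the first time and "linked" the second. The index is re-pointed at the newest copy so that a lookup still succeeds after older builds are deleted.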

          [JENKINS-9190] deduplicating build artifacts

          robsimon added a comment -

          What happens when the older artifact gets pruned? Then the link would point to nothing, or am I wrong?


          Brian Murrell added a comment -

          Pruned how? Like when a job's older (i.e. beyond the defined threshold) artifacts are deleted? Whatever happens currently would happen with de-duplicated artifacts. De-duplication is something that Jenkins does "behind the scenes" and would be transparent to the consumer side of Jenkins.
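To make the pruning case concrete, here is a small sketch. It assumes the de-duplication is done with hard links on a filesystem that supports them; `prune_build` and the directory names are hypothetical. With a hard link, deleting the older build only removes one name for the data, so the newer build's artifact stays readable. A symbolic link would dangle in the same situation, so it would not be a suitable choice here.

```python
import os
import tempfile
from pathlib import Path


def prune_build(build_dir):
    """Delete every artifact in a build directory, as a retention policy might."""
    build_dir = Path(build_dir)
    for entry in build_dir.iterdir():
        entry.unlink()
    build_dir.rmdir()


root = Path(tempfile.mkdtemp())
old_build = root / "build-1"
new_build = root / "build-2"
old_build.mkdir()
new_build.mkdir()

# Build 1 stores the artifact; build 2 is de-duplicated against it.
(old_build / "lib.jar").write_bytes(b"unchanged artifact")
os.link(old_build / "lib.jar", new_build / "lib.jar")

# Pruning the old build drops one link; the data lives on under the other.
prune_build(old_build)
surviving = (new_build / "lib.jar").read_bytes()
```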


          yossis added a comment -

          This is possible in Artifactory using what we call "checksum deploy". Support in the Jenkins plugin is coming in the next version. We'll probably use it only for artifacts bigger than 10 KB. You can follow this issue: https://issues.jfrog.org/jira/browse/BI-126.
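For readers unfamiliar with the mechanism, a rough client-side sketch follows. It assumes the checksum deploy request is a PUT carrying the `X-Checksum-Deploy` and `X-Checksum-Sha1` headers described in Artifactory's REST documentation. The helper name `checksum_deploy_headers` is made up, and the size cutoff simply mirrors the 10 KB figure above. This is not the plugin's actual code.

```python
import hashlib
import os

MIN_SIZE = 10 * 1024  # assumed cutoff, taken from the 10 KB figure above


def checksum_deploy_headers(path, min_size=MIN_SIZE):
    """Return headers for a checksum-deploy attempt, or None for small files."""
    if os.path.getsize(path) <= min_size:
        return None  # small artifact: just upload it normally
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha1.update(chunk)
    return {
        "X-Checksum-Deploy": "true",
        "X-Checksum-Sha1": sha1.hexdigest(),
    }
```

A client would send these headers on a PUT with no body and fall back to a normal upload if the server reports that it has no content with that checksum.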


          Brian Murrell added a comment -

          I don't find having to deploy an entirely new tool a reasonable resolution to this issue. The Jenkins artifact archiving process is entirely suitable for our environment, if a bit wasteful in storing numerous duplicate copies of artifacts.

          I am sure Artifactory is a great tool for people who have a need for it, but to simply de-duplicate artifacts it feels like using a sledgehammer to drive a finishing nail.


            eyalbe Eyal Ben Moshe
            brian Brian Murrell
            Votes: 1
            Watchers: 2
