[JENKINS-5597] symlinks in archive trees lead to double archiving

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core
    • Labels: None
    • Environment: CentOS 5.4
    • Released As: Jenkins 2.230

      If the tree you are archiving contains an internal symlink, the target files will be archived twice. This can lead to a very large increase in the size of the archived data and, consequently, in the time it takes to archive it.

      Example:

      /archive-root
          /big-directory
          /symlink -> big-directory

      Then every file in big-directory will be archived twice.

      A fix would be for Hudson to detect internal symlinks and copy them rather than dereference them.
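
      The doubling can be reproduced outside Jenkins. The following is a minimal Python sketch, not Jenkins or Ant code: it builds the example tree in a temporary directory and counts the files a scanner would see with and without following symlinks. The helper names (count_files, build_example_tree) and the file count are assumptions made for illustration.

```python
import os
import tempfile


def count_files(root, follow_links):
    """Count regular files reachable from root, optionally entering symlinked dirs."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        total += len(filenames)
    return total


def build_example_tree(root, n_files=3):
    """Create root/big-directory with n_files files and root/symlink -> big-directory."""
    big = os.path.join(root, "big-directory")
    os.mkdir(big)
    for i in range(n_files):
        with open(os.path.join(big, f"file{i}.txt"), "w") as fh:
            fh.write("data\n")
    # Relative, internal link, as in the example above.
    os.symlink("big-directory", os.path.join(root, "symlink"))


with tempfile.TemporaryDirectory() as archive_root:
    build_example_tree(archive_root)
    dereferenced = count_files(archive_root, follow_links=True)   # files seen via both paths
    preserved = count_files(archive_root, follow_links=False)     # files seen once
    print(dereferenced, preserved)
```

      With three files in big-directory, the dereferencing scan sees six files and the non-following scan sees three.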

          Andrew Bayer added a comment -

          Changing this to an improvement request rather than a bug - we're using Ant's copy task, fileset and directory scanner for artifact archiving (and other recursive copying), so changing to recreate symlinks rather than dereferencing them (assuming, of course, that the underlying OS/filesystem can handle symlinks in the first place) would mean either writing our own equivalents to the Ant classes we're using or forking the existing Ant classes.

          There are a couple of alternatives that I can see - first, if you know what the problematic symlink's name is, you can exclude it from the artifact archiving, in the advanced config for the artifact archiver. Second, we could add an option to ignore symlinks - the Ant classes in question already have an option to not follow symlinks, so it'd be pretty trivial to add an advanced option to take advantage of that. Of course, neither of these alternatives is particularly elegant, but until I dive deep enough into the Ant code to decide whether forking is really viable, they're definitely options.
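
          The trade-off between these options can be sketched with Python's shutil.copytree standing in for the Ant copy task. This is an analogy under stated assumptions, not the Jenkins implementation: the tree layout, the 1 KiB payload, and the helper names (tree_size, skip_symlinks) are made up for illustration.

```python
import os
import shutil
import tempfile


def tree_size(root):
    """Total bytes of regular files under root, without entering symlinked dirs."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def skip_symlinks(directory, names):
    """copytree 'ignore' callback: leave every symlink out of the copy."""
    return [n for n in names if os.path.islink(os.path.join(directory, n))]


with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "archive-root")
    os.makedirs(os.path.join(src, "big-directory"))
    with open(os.path.join(src, "big-directory", "payload.bin"), "wb") as fh:
        fh.write(b"x" * 1024)
    os.symlink("big-directory", os.path.join(src, "symlink"))

    # Dereference: the link is replaced by a full copy of its target.
    shutil.copytree(src, os.path.join(work, "deref"), symlinks=False)
    # Alternative 1: exclude the known symlink by name.
    shutil.copytree(src, os.path.join(work, "excluded"),
                    ignore=shutil.ignore_patterns("symlink"))
    # Alternative 2: do not follow (here: skip) any symlink.
    shutil.copytree(src, os.path.join(work, "skipped"), ignore=skip_symlinks)
    # Requested behaviour: recreate the symlink in the copy.
    shutil.copytree(src, os.path.join(work, "recreated"), symlinks=True)

    sizes = {name: tree_size(os.path.join(work, name))
             for name in ("deref", "excluded", "skipped", "recreated")}
    link_kept = os.path.islink(os.path.join(work, "recreated", "symlink"))
    print(sizes, link_kept)
```

          Only the dereferencing copy doubles the payload; the two alternatives avoid the duplication but drop the link, and only the last variant keeps it.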

          Vitalii Tymchyshyn added a comment -

          BTW: I'd be happy to make it configurable: follow/not follow external/internal symlinks.

          Joshua Davis added a comment -

          I've got a job that archives a file, and a symlink to the file (with a different name, obviously). When I upgraded to Jenkins 1.532.3 LTS, only the file is archived and not the symlink. We can work around that by copying the file, but I thought it might be good to know that the behavior did change recently.

          Markus Schlegel added a comment -

          This BUG is preventing me from using this Plugin for backup, since I need to back up the build archive.
          For a while now, Jenkins has used symlinks in the build archive to provide a build-number reference to each build (the symlink jobs/<jobname>/builds/1 points to the first build, for example jobs/<jobname>/builds/2014-10-09_15-42-46). Additionally, there are some special symlinks called "lastStableBuild", "lastSuccessfulBuild", "lastFailedBuild", "lastUnsuccessfulBuild" and "lastUnstableBuild", which also point to some of the job's real build subfolders.
          With the current behavior, in the worst case where we have all of these "last*Build" links and exactly 5 builds, we end up with a backup that uses three times the disk space it would require if symlinks were handled correctly (by copying them as symlinks instead of making a full copy).
          You can imagine the trouble I get into with some of my build archives containing >20000 files per build!

          It IS a bug, since Jenkins now uses symlinks on its own. The symlinks are not (in my case) part of the custom build result, but they are there by Jenkins' design.
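
          The three-times figure in the comment above can be checked with a short calculation. This sketch assumes that every build has the same size and that each symlink is dereferenced into one full copy of a build; the function name backup_copies is hypothetical.

```python
# Worst case described above: 5 real builds, one numbered link per build,
# and all five "last*Build" links present.
REAL_BUILDS = 5
NUMBER_LINKS = REAL_BUILDS  # builds/1 .. builds/5
LAST_LINKS = [
    "lastStableBuild",
    "lastSuccessfulBuild",
    "lastFailedBuild",
    "lastUnsuccessfulBuild",
    "lastUnstableBuild",
]


def backup_copies(real_builds, number_links, last_links, dereference):
    """Number of full build copies that end up in the backup."""
    if dereference:
        # Every symlink is replaced by a full copy of the build it points to.
        return real_builds + number_links + last_links
    # Symlinks are stored as links and cost (almost) nothing.
    return real_builds


with_deref = backup_copies(REAL_BUILDS, NUMBER_LINKS, len(LAST_LINKS), True)
without_deref = backup_copies(REAL_BUILDS, NUMBER_LINKS, len(LAST_LINKS), False)
factor = with_deref / without_deref
print(with_deref, without_deref, factor)
```

          Under these assumptions the backup holds 15 build copies instead of 5, which matches the factor of three.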

          Halvor Lund added a comment -

          I can confirm that this still is an issue. It seems that symlinks to files are archived correctly, whereas symlinks to directories are not, and the whole directory is copied instead. Any plans for fixing this bug?

          Sorin Sbarnea added a comment -

          I am really not glad to see that after more than 7 years we still have nobody working on making a fix for this bug.

          E H added a comment -

          As well as being a size issue, this breaks macOS Frameworks for code signing with a "bundle format is ambiguous (could be app or framework)" message. I found this with stash/unstash; presumably the cause is the same.

          E H added a comment - - edited

          Drawing on https://github.com/jenkinsci/pipeline-examples/blob/master/pipeline-examples/unstash-different-dir/unstashDifferentDir.groovy, the attached "JENKINS-5597-example.groovy" pipeline script will demonstrate the problem of symlinks to directories becoming directories.

          If the stash-related issue should be a different Jira issue please let me know and I'll create one.
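
          The effect of a symlink to a directory turning into a directory can also be shown without Jenkins. The following Python sketch uses a plain tar round trip as a stand-in for stash/unstash; it is an analogy, not the pipeline implementation, and the framework layout (Versions/A with Versions/Current -> A) and the roundtrip helper are illustrative assumptions.

```python
import os
import tarfile
import tempfile


def roundtrip(src_dir, work_dir, label, dereference):
    """Pack src_dir into a tar file, unpack it elsewhere, return the unpacked root."""
    tar_path = os.path.join(work_dir, f"{label}.tar")
    out_dir = os.path.join(work_dir, f"{label}-out")
    with tarfile.open(tar_path, "w", dereference=dereference) as tar:
        tar.add(src_dir, arcname="bundle")
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(out_dir)
    return os.path.join(out_dir, "bundle")


with tempfile.TemporaryDirectory() as work:
    framework = os.path.join(work, "Example.framework")
    os.makedirs(os.path.join(framework, "Versions", "A"))
    with open(os.path.join(framework, "Versions", "A", "Example"), "w") as fh:
        fh.write("binary placeholder\n")
    # Versions/Current is a symlink to the active version directory.
    os.symlink("A", os.path.join(framework, "Versions", "Current"))

    kept = roundtrip(framework, work, "kept", dereference=False)
    flattened = roundtrip(framework, work, "flattened", dereference=True)

    kept_is_link = os.path.islink(os.path.join(kept, "Versions", "Current"))
    flattened_is_link = os.path.islink(os.path.join(flattened, "Versions", "Current"))
    flattened_is_dir = os.path.isdir(os.path.join(flattened, "Versions", "Current"))
    print(kept_is_link, flattened_is_link, flattened_is_dir)
```

          When links are dereferenced, Versions/Current comes back as a second real directory next to Versions/A, which is the kind of layout change reported above.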

          Brian J Murrell added a comment -

          To be certain, this is a BUG not an Improvement. Archiving means to store a copy as is, not to interpret and alter the archive such that it does not reflect what is being archived.

          Why does this bug still exist 8.5 years later?

          Markus Winter added a comment -

          Ran into an issue where, for one build, about 1000 directory entries turned into over 11 million for the Ant DirectoryScanner. The cause was symlinks to directories that in turn contain symlinks in a subfolder, and this happened despite an exclude pattern on the problematic folders.

          archive pattern: gen/**/*log

          exclude pattern: gen/out/modules/*/

          The symlinks were all below gen/out/modules but DirectoryScanner still tried to read everything in before applying the exclude.

          The agent process was started with -Xmx8g and ran out of memory.
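
          The difference between applying an exclude after scanning and pruning excluded directories before entering them can be sketched in Python. This is not Ant's DirectoryScanner; the tree, the is_excluded and scan helpers, and the simplified matching (any file ending in "log", exclude everything below gen/out/modules/*/) are assumptions made for illustration.

```python
import os
import tempfile


def is_excluded(rel_dir, exclude_prefix):
    """True if rel_dir lies below exclude_prefix, e.g. gen/out/modules/<name>/..."""
    parts = rel_dir.split(os.sep)
    prefix = exclude_prefix.split("/")
    return len(parts) > len(prefix) and parts[:len(prefix)] == prefix


def scan(root, exclude_prefix, prune):
    """Walk root following symlinks; return (directories read, matching log files)."""
    visited = 0
    logs = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        visited += 1
        rel_dir = os.path.relpath(dirpath, root)
        if prune:
            # Remove excluded subdirectories in place so os.walk never enters them.
            dirnames[:] = [
                d for d in dirnames
                if not is_excluded(os.path.normpath(os.path.join(rel_dir, d)), exclude_prefix)
            ]
        if is_excluded(rel_dir, exclude_prefix):
            continue  # late filter: this directory has already been read
        logs.extend(os.path.join(rel_dir, f) for f in filenames if f.endswith("log"))
    return visited, sorted(logs)


with tempfile.TemporaryDirectory() as work:
    workspace = os.path.join(work, "workspace")
    shared = os.path.join(work, "shared")
    for lib in ("lib1", "lib2"):
        os.makedirs(os.path.join(shared, lib))
        with open(os.path.join(shared, lib, "trace.log"), "w") as fh:
            fh.write("noise\n")
    modules = os.path.join(workspace, "gen", "out", "modules")
    for name in ("m1", "m2", "m3"):
        os.makedirs(os.path.join(modules, name))
        # Each module links to the same shared directory tree.
        os.symlink(shared, os.path.join(modules, name, "deps"))
    with open(os.path.join(workspace, "gen", "build.log"), "w") as fh:
        fh.write("ok\n")

    late_filter = scan(workspace, "gen/out/modules", prune=False)
    early_prune = scan(workspace, "gen/out/modules", prune=True)
    print(late_filter[0], early_prune[0])
```

          Both scans return the same single log file, but in this toy tree the late filter reads 16 directories and the pruning scan reads 4. With nested symlinks the gap grows multiplicatively, which is consistent with the blow-up described above.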

           

          Markus Winter added a comment -

          Opened a pull request, https://github.com/jenkinsci/jenkins/pull/3947, that makes following symlinks configurable.

          Aakash Sudhanwa added a comment -

          Will this address a similar issue with stash/unstash as well?

          We are using git to check out sources that contain symbolic links. When the sources are stashed and unstashed, the symbolic links are lost and appear as separate directories. This is leading to a series of issues, including bloating of the sandbox size.

          Abhishek Sharma added a comment -

          Hi,

          Can someone please merge the changes into mainline? We're very badly hurt by this issue, and are currently building Jenkins manually to get this change.

          There is a pending pull request, https://github.com/jenkinsci/jenkins/pull/3947, that makes following symlinks configurable.

          Thanks,

          Abhishek

          Oleg Nenashev added a comment -

          The change was released in Jenkins 2.230. Thanks to danielbeck, wfollonier and jthompson for reviews!

            Assignee: Unassigned
            Reporter: pgweiss
            Votes: 23
            Watchers: 29
