Jenkins / JENKINS-5597

symlinks in archive trees lead to double archiving


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Component/s: core
    • Labels:
      None
    • Environment:
      CentOS 5.4
    • Released As:
      Jenkins 2.230

      Description

      If the tree you are archiving contains an internal symlink (a link that points to another location inside the same tree), the target files are archived twice: once at their real location and once under the link. This can greatly increase the size of the archived data and, consequently, the time it takes to archive it.

      Example:

      archive-root/
          big-directory/
          symlink -> big-directory

      Then every file in big-directory is archived twice.

      A fix would be for Hudson to detect internal symlinks and store them as symlinks rather than dereferencing them.
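The proposed fix (store links as links) can be sketched with java.nio.file. This is a minimal illustration under stated assumptions, not Hudson's or Jenkins's implementation: SymlinkPreservingCopy and copyTree are hypothetical names, it assumes a POSIX file system with symlink support, and it recreates every link with its target text unchanged, so relative links such as symlink -> big-directory keep resolving inside the copy while absolute links still point at their original location.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper; assumes a POSIX file system that supports symlinks.
public class SymlinkPreservingCopy {

    // Copies the tree under srcRoot into dstRoot. Files.walkFileTree does not
    // follow symbolic links unless FOLLOW_LINKS is requested, so every link
    // (to a file or to a directory) arrives in visitFile with link attributes.
    public static void copyTree(Path srcRoot, Path dstRoot) throws IOException {
        Files.walkFileTree(srcRoot, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Recreate each real directory at the same relative position.
                Files.createDirectories(dstRoot.resolve(srcRoot.relativize(dir).toString()));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Path target = dstRoot.resolve(srcRoot.relativize(file).toString());
                if (attrs.isSymbolicLink()) {
                    // Store the link itself with its original target text instead
                    // of copying the files it points to a second time.
                    Files.createSymbolicLink(target, Files.readSymbolicLink(file));
                } else {
                    Files.copy(file, target, StandardCopyOption.COPY_ATTRIBUTES);
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Reproduce the layout from the description in a temporary directory.
        Path src = Files.createTempDirectory("archive-root");
        Files.createDirectories(src.resolve("big-directory"));
        Files.write(src.resolve("big-directory").resolve("data.bin"), new byte[1024]);
        Files.createSymbolicLink(src.resolve("symlink"), Paths.get("big-directory"));

        Path dst = Files.createTempDirectory("archived").resolve("copy");
        copyTree(src, dst);
        System.out.println(Files.isSymbolicLink(dst.resolve("symlink"))); // true
    }
}
```

Because the link is recreated rather than followed, the 1024-byte file exists once in the copy, under big-directory, and is reachable again through symlink.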


            Activity

            pgweiss created issue
            Andrew Bayer (abayer) made changes:
                Assignee: abayer
            Andrew Bayer (abayer) added a comment:

            Changing this to an improvement request rather than a bug - we're using Ant's copy task, fileset and directory scanner for artifact archiving (and other recursive copying), so changing to recreate symlinks rather than dereferencing them (assuming, of course, that the underlying OS/filesystem can handle symlinks in the first place) would mean either writing our own equivalents to the Ant classes we're using or forking the existing Ant classes.

            There are a couple of alternatives that I can see. First, if you know the problematic symlink's name, you can exclude it from artifact archiving in the artifact archiver's advanced configuration. Second, we could add an option to ignore symlinks: the Ant classes in question already have an option not to follow symlinks, so it would be fairly trivial to expose that as an advanced option. Neither alternative is particularly elegant, but until I dig deep enough into the Ant code to decide whether forking is really viable, they're definitely options.
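The effect of the second alternative can be illustrated with a small java.nio.file sketch instead of the Ant classes themselves, so it runs without Ant on the classpath; it is not the Ant or Jenkins code, and ArtifactCollector and collect are hypothetical names. With followSymlinks set to true it reproduces the duplication from the description; with false it leaves every symlink out, which avoids the duplicate but also drops the links from the archive.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

// Hypothetical collector; assumes a POSIX file system with symlink support.
public class ArtifactCollector {

    // Returns the sorted relative paths of regular files under root.
    // followSymlinks = true dereferences every link (the behaviour reported here);
    // followSymlinks = false leaves symlinks out of the result entirely.
    public static List<String> collect(Path root, boolean followSymlinks) throws IOException {
        Set<FileVisitOption> options = followSymlinks
                ? EnumSet.of(FileVisitOption.FOLLOW_LINKS)
                : EnumSet.noneOf(FileVisitOption.class);
        List<String> result = new ArrayList<>();
        Files.walkFileTree(root, options, Integer.MAX_VALUE, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // When links are not followed, attrs describe the link itself,
                // so isRegularFile() is false for symlinks and they are skipped.
                if (attrs.isRegularFile()) {
                    result.add(root.relativize(file).toString());
                }
                return FileVisitResult.CONTINUE;
            }
        });
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("archive-root");
        Files.createDirectories(root.resolve("big-directory"));
        Files.write(root.resolve("big-directory").resolve("a.txt"), new byte[] {1});
        Files.createSymbolicLink(root.resolve("symlink"), Paths.get("big-directory"));

        System.out.println(collect(root, true));  // [big-directory/a.txt, symlink/a.txt]
        System.out.println(collect(root, false)); // [big-directory/a.txt]
    }
}
```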

            Andrew Bayer (abayer) made changes:
                Issue Type: Bug → Improvement
            Andrew Bayer (abayer) made changes:
                Link: This issue is duplicated by JENKINS-5993
            Alan Harder (mindless) made changes:
                Component/s: core
            Vitalii Tymchyshyn (tivv) added a comment:

            BTW: I'd be happy to make it configurable whether external and internal symlinks are followed.
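One way to read this proposal is to classify each symlink by where its fully resolved target lives and then decide per class whether to dereference it. The sketch is hypothetical: SymlinkClassifier, Kind, and shouldFollow are illustrative names, since the comment does not specify a design. It uses Path.toRealPath to resolve the whole link chain and assumes a POSIX file system.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical classifier for the internal/external distinction; not Jenkins API.
public class SymlinkClassifier {

    public enum Kind { NOT_A_LINK, INTERNAL, EXTERNAL, DANGLING }

    // A link is INTERNAL when its fully resolved target lies inside root, i.e.
    // dereferencing it would archive data that is already archived elsewhere.
    public static Kind classify(Path root, Path path) throws IOException {
        if (!Files.isSymbolicLink(path)) {
            return Kind.NOT_A_LINK;
        }
        Path realRoot = root.toRealPath();
        Path resolved;
        try {
            resolved = path.toRealPath(); // follows every link in the chain
        } catch (NoSuchFileException e) {
            return Kind.DANGLING;
        }
        // Path.startsWith compares name elements, so "/a/root2" is not inside "/a/root".
        return resolved.startsWith(realRoot) ? Kind.INTERNAL : Kind.EXTERNAL;
    }

    // Hypothetical policy: dereference a link only if the flag for its kind is set;
    // otherwise the archiver would store it as a link.
    public static boolean shouldFollow(Kind kind, boolean followInternal, boolean followExternal) {
        switch (kind) {
            case INTERNAL:
                return followInternal;
            case EXTERNAL:
                return followExternal;
            default:
                return false; // regular entries and dangling links have nothing to dereference
        }
    }
}
```

With followInternal = false and followExternal = true, symlink -> big-directory from the description would be stored as a link, while a link to a directory outside the workspace would still be copied by content.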

            Joshua Davis (pgmjsd) added a comment:

            I've got a job that archives a file, and a symlink to the file (with a different name, obviously). When I upgraded to Jenkins 1.532.3 LTS, only the file is archived and not the symlink. We can work around that by copying the file, but I thought it might be good to know that the behavior did change recently.

            Markus Schlegel (schlegel_m) added a comment:

            This bug prevents me from using this plugin for backups, since I need to back up the build archive.
            For some time now, Jenkins has used symlinks in the build archive to give each build a build-number reference (for example, the symlink jobs/<jobname>/builds/1 points to the first build, jobs/<jobname>/builds/2014-10-09_15-42-46). Additionally, there are special symlinks called "lastStableBuild", "lastSuccessfulBuild", "lastFailedBuild", "lastUnsuccessfulBuild" and "lastUnstableBuild", each of which also points to one of the job's real build subfolders.
            With the current behavior, in the worst case, where all of these "last*Build" links exist and there are exactly 5 builds, we end up with a backup that uses three times the disk space it would need if symlinks were handled correctly (copied as symlinks instead of as full copies).
            You can imagine the trouble I get into with some of my build archives containing more than 20000 files per build!

            It IS a bug, since Jenkins itself now uses symlinks. In my case the symlinks are not part of the custom build result; they are there by Jenkins's design.
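The "three times" figure follows from counting copies, assuming five builds of equal size and all five last*Build links present, where each dereferenced link adds one full copy of the build directory it points to:

```latex
\underbrace{5}_{\text{real build dirs}}
+ \underbrace{5}_{\texttt{builds/1} \ldots \texttt{builds/5}}
+ \underbrace{5}_{\texttt{last*Build} \text{ links}}
= 15 = 3 \times 5
```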

            R. Tyler Croy (rtyler) made changes:
                Workflow: JNJira → JNJira + In-Review
            Halvor Lund (halvorlu) added a comment:

            I can confirm that this is still an issue. It seems that symlinks to files are archived correctly, whereas symlinks to directories are not: the whole directory is copied instead. Are there any plans to fix this bug?

            Sorin Sbarnea (ssbarnea) added a comment:

            I am really disappointed that, after more than 7 years, nobody is working on a fix for this bug.

            E H (ehbbt) added a comment:

            As well as being a size issue, this breaks macOS Frameworks for code signing, with the message "bundle format is ambiguous (could be app or framework)". I found this with stash/unstash; presumably the cause is the same.

            E H (ehbbt) added a comment (edited):

            Drawing on https://github.com/jenkinsci/pipeline-examples/blob/master/pipeline-examples/unstash-different-dir/unstashDifferentDir.groovy, the attached "JENKINS-5597-example.groovy" pipeline script will demonstrate the problem of symlinks to directories becoming directories.

            If the stash-related issue should be a different Jira issue please let me know and I'll create one.

            E H (ehbbt) made changes:
                Attachment: JENKINS-5597-example.groovy
            Brian J Murrell (brianjmurrell) added a comment:

            To be clear, this is a BUG, not an Improvement. Archiving means storing a copy as is, not interpreting and altering the archive so that it no longer reflects what is being archived.

            Why does this bug still exist 8.5 years later?

            Markus Winter (mwinter69) added a comment:

            Ran into an issue where a build consisting of 1000 directory entries turned into more than 11 million entries for Ant's DirectoryScanner, because of symlinks to directories that in turn contain symlinks in a subfolder, despite an exclude pattern on the problematic folders.

            archive pattern: gen/**/*log

            exclude pattern: gen/out/modules/*/

            The symlinks were all below gen/out/modules but DirectoryScanner still tried to read everything in before applying the exclude.

            The agent process was started with -Xmx8g and ran out of memory.
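The general remedy for this kind of blow-up is to apply directory excludes while walking, so an excluded subtree is never listed at all. The java.nio.file sketch below illustrates only that idea; it is not how Ant's DirectoryScanner or the change released in Jenkins 2.230 works, and PruningScanner and scan are hypothetical names. It follows symlinks (as the scanner did here) and skips links that loop back to an ancestor. Note that java.nio glob syntax differs from Ant patterns: Ant's trailing slash in gen/out/modules/*/ becomes the directory glob gen/out/modules/*, and gen/**/*log does not match a log file directly inside gen.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

// Hypothetical scanner that prunes excluded directories before descending.
public class PruningScanner {

    // Collects files whose root-relative path matches includeGlob, following
    // symlinks, but never enters a directory whose relative path matches
    // excludeDirGlob. The check runs in preVisitDirectory, so nothing below an
    // excluded directory (including links it contains) is ever listed.
    public static List<String> scan(Path root, String includeGlob, String excludeDirGlob)
            throws IOException {
        PathMatcher include = root.getFileSystem().getPathMatcher("glob:" + includeGlob);
        PathMatcher exclude = root.getFileSystem().getPathMatcher("glob:" + excludeDirGlob);
        List<String> result = new ArrayList<>();
        Files.walkFileTree(root, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
                new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                        return exclude.matches(root.relativize(dir))
                                ? FileVisitResult.SKIP_SUBTREE
                                : FileVisitResult.CONTINUE;
                    }

                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        Path rel = root.relativize(file);
                        if (attrs.isRegularFile() && include.matches(rel)) {
                            result.add(rel.toString());
                        }
                        return FileVisitResult.CONTINUE;
                    }

                    @Override
                    public FileVisitResult visitFileFailed(Path file, IOException exc)
                            throws IOException {
                        // With FOLLOW_LINKS, a link back to an ancestor is reported
                        // here as FileSystemLoopException; skip it instead of aborting.
                        if (exc instanceof FileSystemLoopException) {
                            return FileVisitResult.CONTINUE;
                        }
                        throw exc;
                    }
                });
        Collections.sort(result);
        return result;
    }
}
```

Called as scan(root, "gen/**/*log", "gen/out/modules/*"), it never lists anything below gen/out/modules/<name>, so symlinks placed there are never followed.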

             

            Markus Winter (mwinter69) added a comment:

            Opened a pull request, https://github.com/jenkinsci/jenkins/pull/3947, that makes following symlinks configurable.

            Aakash Sudhanwa (aakashsd) added a comment:

            Will this also address a similar issue with stash/unstash?

            We use git to check out sources that contain symbolic links. When the sources are stashed and unstashed, the symbolic links are lost and appear as separate directories. This is leading to a series of issues, including a bloated sandbox size.

            Abhishek Sharma (abhsha) added a comment:

            Hi,

            Can someone please merge the changes into the main branch? We're badly affected by this issue and are currently building Jenkins ourselves with this change.

            There is a pending merge request, https://github.com/jenkinsci/jenkins/pull/3947, that makes following symlinks configurable.

            Thanks,

            Abhishek

            Oleg Nenashev (oleg_nenashev) added a comment:

            The change was released in Jenkins 2.230. Thanks to Daniel Beck, Wadeck Follonier, and Jeff Thompson for the reviews!

            Oleg Nenashev (oleg_nenashev) made changes:
                Released As: Jenkins 2.230
                Assignee: Andrew Bayer → Unassigned
                Resolution: Fixed
                Status: Open → Resolved

              People

              Assignee:
              Unassigned
              Reporter:
              pgweiss
              Votes:
              23
              Watchers:
              29
