Jenkins / JENKINS-5597

symlinks in archive trees lead to double archiving

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major
    • Component/s: core
    • None
    • Environment: CentOS 5.4
    • Jenkins 2.230

      If the tree you are archiving contains an internal symlink, the target files will be archived twice. This can lead to a very large increase in the size of the archived data and consequently, the time it takes to archive it.

      Example:

      /archive-root
          /big-directory
          /symlink -> big-directory

      Then every file in big-directory will be archived twice.

      A fix would be for Hudson to detect internal symlinks and copy them rather than dereference them.
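
The proposed fix (recreate internal symlinks instead of dereferencing them) can be sketched roughly as follows. This is a minimal Python illustration only, not Hudson's implementation; `archive_tree` and `is_inside` are hypothetical names introduced here, and the sketch assumes a POSIX filesystem with symlink support and a destination outside the source tree.

```python
import os
import shutil


def is_inside(path, root):
    """Return True if the resolved path lies within root (both absolute)."""
    return os.path.commonpath([path, root]) == root


def archive_tree(src_root, dst_root):
    """Copy src_root to dst_root, recreating internal symlinks as symlinks.

    Hypothetical helper: links that resolve inside the archived tree are
    recreated as relative links; links that resolve outside it are
    dereferenced and copied.
    """
    src_root = os.path.realpath(src_root)
    # followlinks=False: symlinked directories are listed but never descended.
    for dirpath, dirnames, filenames in os.walk(src_root, followlinks=False):
        dst_dir = os.path.normpath(
            os.path.join(dst_root, os.path.relpath(dirpath, src_root)))
        os.makedirs(dst_dir, exist_ok=True)
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            if not os.path.islink(src):
                if os.path.isfile(src):
                    shutil.copy2(src, dst)
                # Real directories are created when os.walk reaches them.
                continue
            target = os.path.realpath(src)
            if is_inside(target, src_root):
                # Internal link: store the link itself, not a second copy.
                os.symlink(os.path.relpath(target, dirpath), dst)
            elif os.path.isdir(target):
                shutil.copytree(target, dst)
            elif os.path.isfile(target):
                shutil.copy2(target, dst)
            # Dangling external links are skipped in this sketch.
```

Applied to the example tree, the copy would contain the files of big-directory once, plus a link named symlink pointing at big-directory.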


          pgweiss created issue -
          Andrew Bayer made changes -
          Assignee New: Andrew Bayer [ abayer ]

          Andrew Bayer added a comment -

          Changing this to an improvement request rather than a bug - we're using Ant's copy task, fileset and directory scanner for artifact archiving (and other recursive copying), so changing to recreate symlinks rather than dereferencing them (assuming, of course, that the underlying OS/filesystem can handle symlinks in the first place) would mean either writing our own equivalents to the Ant classes we're using or forking the existing Ant classes.

          There are a couple of alternatives that I can see - first, if you know what the problematic symlink's name is, you can exclude it from the artifact archiving, in the advanced config for the artifact archiver. Second, we could add an option to ignore symlinks - the Ant classes in question already have an option to not follow symlinks, so it'd be pretty trivial to add an advanced option to take advantage of that. Of course, neither of these alternatives is particularly elegant, but until I dive deep enough into the Ant code to decide whether forking is really viable, they're definitely options.
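
To make the two alternatives concrete, here is a rough Python analogue of a file scanner with an exclude list and a follow-symlinks switch. It is only a sketch of the idea, not the Ant classes mentioned above; `list_artifacts`, `excludes`, and `follow_symlinks` are hypothetical names, and the exclude patterns use Python's `fnmatch` globbing rather than Ant's pattern syntax.

```python
import fnmatch
import os


def list_artifacts(root, excludes=(), follow_symlinks=True):
    """Return sorted relative paths of the files that would be archived."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        def rel(name):
            # Path of an entry relative to the archive root.
            return os.path.normpath(
                os.path.join(os.path.relpath(dirpath, root), name))

        def excluded(name):
            return any(fnmatch.fnmatch(rel(name), pat) for pat in excludes)

        # Alternative 1: prune excluded directories (e.g. a known symlink name).
        dirnames[:] = [d for d in dirnames if not excluded(d)]
        for name in filenames:
            if excluded(name):
                continue
            # Alternative 2: ignore symlinked files when not following links.
            # Symlinked directories are already skipped by os.walk in that mode.
            if not follow_symlinks and os.path.islink(os.path.join(dirpath, name)):
                continue
            selected.append(rel(name))
    return sorted(selected)
```

With the example tree from the description, the default call lists every file of big-directory twice (once under symlink), while either `excludes=["symlink"]` or `follow_symlinks=False` lists each file once. Note that this sketch does not guard against symlink cycles when following links.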

          Andrew Bayer made changes -
          Issue Type Original: Bug [ 1 ] New: Improvement [ 4 ]
          Andrew Bayer made changes -
          Link New: This issue is duplicated by JENKINS-5993 [ JENKINS-5993 ]
          Alan Harder made changes -
          Component/s New: core [ 15593 ]

          Vitalii Tymchyshyn added a comment -

          BTW: I'd be happy to make it configurable: follow/not follow external/internal symlinks

          Joshua Davis added a comment -

          I've got a job that archives a file, and a symlink to the file (with a different name, obviously). When I upgraded to Jenkins 1.532.3 LTS, only the file is archived and not the symlink. We can work around that by copying the file, but I thought it might be good to know that the behavior did change recently.


          Markus Schlegel added a comment -

          This bug prevents me from using this plugin for backups, since I need to back up the build archive.
          For a while now, Jenkins has used symlinks in the build archive to map build numbers to builds (the symlink jobs/<jobname>/builds/1 points to the first build, for example jobs/<jobname>/builds/2014-10-09_15-42-46). Additionally, there are some special symlinks called "lastStableBuild", "lastSuccessfulBuild", "lastFailedBuild", "lastUnsuccessfulBuild" and "lastUnstableBuild", which also point to some of the job's real build subfolders.
          With the current behavior, in the worst case where all of these "last*Build" symlinks exist and there are exactly 5 builds, we end up with a backup that uses three times the disk space it would have required if symlinks were handled correctly (by copying them as symlinks instead of making full copies).
          You can imagine the trouble I run into with some of my build archives containing >20000 files per build!

          It IS a bug, since Jenkins itself now uses symlinks. The symlinks are not (in my case) part of the custom build result; they are there by Jenkins's design.
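
The "three times" figure in the comment above can be checked with a little arithmetic. The sketch below assumes every build directory has the same size and that each symlink resolves to exactly one build directory; `dereferenced_backup_ratio` is a hypothetical helper name.

```python
def dereferenced_backup_ratio(real_builds, numbered_links, alias_links):
    """Size of a backup that dereferences symlinks, relative to one that keeps them.

    Assumes equal-sized builds and one build directory per symlink.
    """
    if real_builds <= 0:
        raise ValueError("need at least one real build")
    # Each dereferenced symlink turns into one more full copy of a build.
    copies = real_builds + numbered_links + alias_links
    return copies / real_builds
```

For 5 real builds, 5 build-number symlinks and the 5 "last*Build" symlinks, `dereferenced_backup_ratio(5, 5, 5)` gives 15 / 5 = 3.0.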
          R. Tyler Croy made changes -
          Workflow Original: JNJira [ 135689 ] New: JNJira + In-Review [ 174366 ]

            Assignee: Unassigned
            Reporter: pgweiss
            Votes: 23
            Watchers: 29