Jenkins / JENKINS-33255

Largefiles do not work with distributed repository caching

Details

    Description

      When repository caching is enabled for distributed builds, the repository that ends up in the slave's workspace has lost its upstream path information, so it can no longer pull largefiles.

      For example, given a slave with:
      /var/build/hgcache/SOMEREPO - this was created by bundling changes from the master
      /var/build/workspace/MyJob - this was cloned/shared from SOMEREPO

      SOMEREPO contains no upstream information at all (because it was populated entirely from bundles), and MyJob's only upstream is SOMEREPO. So when MyJob is updated to the target revision before building, it complains that the largefiles "are not available" from SOMEREPO.

      I think the fix is to detect whether largefiles are enabled and, if so, have the master issue an explicit 'lfpull' command that specifies the original upstream URL. This should happen after the changes are unbundled into SOMEREPO but before they are cloned/shared/pulled into MyJob, so that the clone/share/pull retrieves the largefiles 'naturally' and the initial update produces no warnings about being unable to find them.
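      A rough sketch of the ordering proposed above, in Java. It assumes that a repository using largefiles can be detected by a 'largefiles' entry in its .hg/requires file, and it only builds the command lines rather than running them. The class and method names, the bundle path, the upstream URL and the 'tip' revision are hypothetical placeholders; 'hg unbundle', 'hg lfpull --rev REV SOURCE' and the global '--repository' option are standard Mercurial.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Jenkins Mercurial plugin.
final class LargefilesCacheSketch {

    private LargefilesCacheSketch() {}

    // Assumption: a repository that uses largefiles lists "largefiles"
    // as one line of its .hg/requires file.
    static boolean usesLargefiles(Path repo) throws IOException {
        Path requires = repo.resolve(".hg").resolve("requires");
        if (!Files.isRegularFile(requires)) {
            return false; // not a Mercurial repository, or no requirements recorded
        }
        for (String line : Files.readAllLines(requires)) {
            if (line.trim().equals("largefiles")) {
                return true;
            }
        }
        return false;
    }

    // Builds (but does not run) the hg command lines for refreshing the cache
    // repository, in order: unbundle the changes shipped from the master, then,
    // if largefiles are in use, fetch the largefiles for the target revision
    // from the original upstream URL. MyJob would be cloned/shared/pulled from
    // the cache only after these commands have succeeded.
    static List<List<String>> cacheRefreshCommands(String hg, Path cacheRepo, Path bundle,
            String upstreamUrl, String revision, boolean largefiles) {
        List<List<String>> commands = new ArrayList<>();
        commands.add(List.of(hg, "--repository", cacheRepo.toString(),
                "unbundle", bundle.toString()));
        if (largefiles) {
            // SOURCE is passed explicitly: the cache has no upstream path of its own.
            commands.add(List.of(hg, "--repository", cacheRepo.toString(),
                    "lfpull", "--rev", revision, upstreamUrl));
        }
        return commands;
    }

    public static void main(String[] args) {
        // Placeholder values; the bundle path and upstream URL are made up.
        List<List<String>> commands = cacheRefreshCommands("hg",
                Path.of("/var/build/hgcache/SOMEREPO"), Path.of("/tmp/changes.hg"),
                "https://hg.example.com/SOMEREPO", "tip", true);
        for (List<String> command : commands) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

      Keeping the 'lfpull' on the cache repository, rather than on MyJob, matches the proposal: once SOMEREPO holds the largefiles, the later clone/share/pull into the workspace can find them without any upstream information of its own.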

      In the long run it might be better to retrieve the largefiles from the master instead of from upstream, but getting it working at all comes first.

          Activity

            Richard Fine [ superpig ] created issue
            Richard Fine [ superpig ] made changes:
            Description edited: the clause "so when MyJob is updated to the target revision prior to building, it complains that the largefiles "are not available" from SOMEREPO" was appended to the SOMEREPO/MyJob sentence; the rest of the description was unchanged.
            Jesse Glick [ jglick ] made changes:
            Assignee changed from Jesse Glick [ jglick ] to Richard Fine [ superpig ]
            Jesse Glick [ jglick ] made changes:
            Status changed from Open [ 1 ] to In Progress [ 3 ]
            Jesse Glick [ jglick ] made changes:
            Remote Link added: This issue links to "PR 76 (Web Link)" [ 14524 ]
            R. Tyler Croy [ rtyler ] made changes:
            Workflow changed from JNJira [ 169146 ] to JNJira + In-Review [ 185693 ]
            Jesse Glick [ jglick ] made changes:
            Status changed from In Progress [ 3 ] to Open [ 1 ]
            Jesse Glick [ jglick ] made changes:
            Labels cache distributed mercurial cache distributed stalled-pr

            People

              Assignee: Richard Fine [ superpig ]
              Reporter: Richard Fine [ superpig ]
              Votes: 0
              Watchers: 1
