Jenkins / JENKINS-42422

Add support for directory caching in pod jobs

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor
    • Component: kubernetes-plugin
    • None

      It would be great to be able to "cache" directories between job executions.
      In some cases it can greatly speed up job execution.
      Similar to Travis: https://docs.travis-ci.com/user/caching/.

      Currently I achieve this with a persistent volume and a pipeline function that performs explicit save/restore steps, roughly as sketched below.
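
      A minimal sketch of that workaround, assuming the kubernetes-plugin's `persistentVolumeClaim` pod volume and a pre-created claim; the claim name, mount path and Maven specifics are only illustrative:

          // Pre-created PVC mounted into the pod; the cache is restored before and saved after the build.
          podTemplate(
              containers: [containerTemplate(name: 'maven', image: 'maven:3-eclipse-temurin-17', command: 'sleep', args: '99d')],
              volumes: [persistentVolumeClaim(claimName: 'build-cache', mountPath: '/cache', readOnly: false)]
          ) {
              node(POD_LABEL) {
                  container('maven') {
                      checkout scm                                                           // assumes a multibranch/SCM-backed job
                      sh 'mkdir -p "$HOME/.m2" && cp -a /cache/m2/. "$HOME/.m2/" || true'    // restore; a cache miss is fine
                      sh 'mvn -B package'
                      sh 'mkdir -p /cache/m2 && cp -a "$HOME/.m2/." /cache/m2/'              // store for the next run
                  }
              }
          }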

      What could be added to kubernetes-plugin:
      - an option to manage caches (specifically, to drop the cache for a specific job)
      - a DSL construct like podTemplate(... cachePaths: ["A", "B/C"]) - illustrated below
      - a default strategy for cache management (a shared NFS-backed volume or a PVC provisioned per job)
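
      Purely as an illustration of that proposed (non-existent) DSL, a Jenkinsfile using it might look like the following; the cachePaths parameter and its semantics are hypothetical:

          // Hypothetical syntax - cachePaths is NOT implemented in kubernetes-plugin today.
          podTemplate(cachePaths: ['.m2/repository', 'node_modules']) {
              node(POD_LABEL) {
                  // the plugin would restore the listed paths before the body runs...
                  sh 'mvn -B package'
                  // ...and persist them again when the pod finishes
              }
          }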

          [JENKINS-42422] Add support for directory caching in pod jobs

          Jesse Glick added a comment -

          You can specify a PVC as a workspace volume, or additional volume, which would be practical if using a ReadWriteMany storage class like EFS. PR-988 discusses a trick based on pod reuse which was never clearly supported and apparently broke recently. But in general I would just discourage use of direct filesystem-level caching and reuse as being inherently problematic.
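
          For reference, a minimal sketch of the workspace-volume option mentioned above, assuming a pre-created ReadWriteMany claim (the claim name is illustrative; the additional-volume variant is sketched under the issue description):

              // Back the entire workspace with a pre-created RWX PVC (e.g. an EFS-backed storage class).
              podTemplate(workspaceVolume: persistentVolumeClaimWorkspaceVolume(claimName: 'shared-workspace-cache', readOnly: false)) {
                  node(POD_LABEL) {
                      checkout scm
                      sh 'mvn -B package'   // sees whatever earlier builds left in the shared workspace
                  }
              }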

          If your main issue is repeated downloads of large sets of dependencies, this can be addressed fairly effectively with a local caching mirror (with e.g. LRU policy for cleanup): architecturally simple (new service independent of Jenkins), secure (no risk of builds from one job deliberately injecting malicious artifacts into a cache used by another), and robust (no risk of half-completed downloads and other such edge conditions causing persistent build outages).

          Many build tools also include their own distributed cache of artifacts from intermediate targets, keyed by hashes of source code or whatever.

          For cases that are not naturally handled by existing tools, I suppose a Jenkins plugin could be created to perform ad-hoc cache save and restore to a blob store with user-specified cache keys and other policy configuration, taking care to ensure that caches are isolated between jobs (since each job could have its own access control policy), perhaps handling multibranch projects specially (permitting PR-like branch projects to read but not write the cache from the base branch). https://github.com/actions/cache does something like this (apparently just backed by HTTP operations) and could serve as inspiration. Not sure if there is anything particularly K8s-specific here except that you could use, say, MinIO as a server, or find or write some minimal StatefulSet microservice to handle storage and make it easy to package alongside Jenkins.
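
          As a rough sketch of the keyed save/restore such a plugin could automate, done by hand here against an S3-compatible blob store (the bucket, the key scheme and the use of the aws CLI are all assumptions; MinIO's mc would work the same way), run inside a pod container that has the CLI available:

              // Cache keyed by a hash of the dependency manifest, namespaced per job for isolation.
              def key = sh(returnStdout: true, script: 'sha256sum pom.xml | cut -d" " -f1').trim()
              def cacheUri = "s3://ci-cache/${env.JOB_NAME}/${key}.tar.gz"   // hypothetical bucket and layout

              sh "aws s3 cp ${cacheUri} cache.tar.gz && tar -xzf cache.tar.gz || true"   // restore; a miss is not an error

              sh 'mvn -B -Dmaven.repo.local=.m2/repository package'

              if (env.CHANGE_ID == null) {   // only trusted (non-PR) builds may write the cache
                  sh "tar -czf cache.tar.gz .m2/repository && aws s3 cp cache.tar.gz ${cacheUri}"
              }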

          Jesse Glick added a comment -

          I forgot to mention https://plugins.jenkins.io/external-workspace-manager/ though I have no personal experience with it and am not sure if it is suitable for K8s.

          Dee Kryvenko added a comment - edited

          jglick pod reuse is a bad idea, agreed. For me it just wouldn't work at all - each pod is unique in my setup.

          I am not sure a PVC as a workspace volume or even https://github.com/jenkinsci/external-workspace-manager-plugin applies here - at least in my setup, caching or sharing the actual workspace would be a big security hole. Sorry, I can't see how to create an additional volume with the kubernetes plugin at the moment - is there an example?

          I am actively using Nexus/Artifactory for local mirrors; I didn't even bring that up since I thought it was common sense. But it just doesn't help in 100% of cases. For Terraform, for instance, it's just not there yet in either of them:

          https://issues.sonatype.org/browse/NEXUS-16089

          https://www.jfrog.com/jira/browse/RTFACT-16117

          And I'm just not ready to get into a contract with HashiCorp when all I need is local mirroring and what I'd have to pay for is much more than that.

          Other examples such as Maven (maybe Gradle too - I'm not too familiar with it) are just inherently bad at resolving the dependency tree. Even with a mirror on the local network, a mid-size project takes a ridiculously long time to fetch dependencies.

          I wish a mirror were the universal answer for the reasons you outlined, but it's just not.

          The risk of injecting malicious artifacts into a cache is a valid concern, but I think it's ultimately the responsibility of a cluster/Jenkins admin to make sure it is secure and isolated. I don't see how the kubernetes plugin or a proposed PVC plugin is justified in limiting what the user can do on the assumption that the admins are dumb. In my opinion plugins should provide utilities, and it is up to the admins how to apply them. Many admins, me included, use Jenkins as an engine/framework - my users are not allowed to manage jobs or provide their own Jenkinsfile. My pipeline generator can be responsible for isolating different caches.

          All in all, I still think a new `kubernetesPersistentVolumeClaim` step would make sense. The question is whether it should be part of the existing plugin or a new one...

          I was thinking about this a bit more and had another idea - this functionality could be implemented as a separate controller, unrelated to Jenkins at all. The controller could watch for a special annotation on a pod and fulfill the `volumes.*.persistentVolumeClaim` requirements of that pod. The user would create a pod pointing to a non-existent PVC; the controller would eventually create it based on the config in the annotation, then delete the PVC when the pod requesting it is gone and remove the claimRef from the associated PV. I thought of doing it as an admission controller first, but I guess it should work as a regular controller too, and controllers are much easier to implement.
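
          A sketch of what the pod side of that idea could look like through the plugin's yaml parameter; the annotation key and its payload are made up purely for illustration:

              // The pod references a PVC that does not exist yet; an external controller watching
              // the (made-up) annotation would create it on demand and release it once the pod is gone.
              podTemplate(yaml: '''
              apiVersion: v1
              kind: Pod
              metadata:
                annotations:
                  example.com/auto-pvc.build-cache: '{"storageClassName": "cache", "requests": {"storage": "10Gi"}}'
              spec:
                containers:
                - name: maven
                  image: maven:3-eclipse-temurin-17
                  command: ['sleep', '99d']
                  volumeMounts:
                  - name: build-cache
                    mountPath: /cache
                volumes:
                - name: build-cache
                  persistentVolumeClaim:
                    claimName: build-cache-for-this-pod   # created lazily by the controller
              ''') {
                  node(POD_LABEL) {
                      container('maven') {
                          sh 'mvn -B -Dmaven.repo.local=/cache/m2 package'
                      }
                  }
              }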

          Dee Kryvenko added a comment -

          I took it as a weekend project and created two controllers - one as a provisioner and the other as a releaser: https://github.com/plumber-cd/kubernetes-dynamic-reclaimable-pvc-controllers/pull/1/files.

          I'm gonna have to publish it to Docker Hub and create a Helm chart for it, but in the meantime I'd appreciate any feedback.

          Dee Kryvenko added a comment -

          I gave it quite a bit of testing, published it to Docker Hub and created a Helm chart.

          Please see https://github.com/plumber-cd/kubernetes-dynamic-reclaimable-pvc-controllers for instructions.

          See Jenkinsfile example here https://github.com/plumber-cd/kubernetes-dynamic-reclaimable-pvc-controllers/tree/main/examples/jenkins-kubernetes-plugin-with-build-cache.

          Jesse Glick added a comment -

          Very nice! I think it would be helpful to link to this system from the README for this plugin.

          Dee Kryvenko added a comment -

          Actually there is a 

                ephemeral: 
                  volumeClaimTemplate: 
                    ...
          

          See https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/ - this should be able to provision PVCs dynamically, so it seems I wasted my time creating my provisioner.
          But the need for an automatic releaser still stands, as I can't find anything else at the moment that can automatically make my PVs `Available` when they are `Released` so the next PVC can consume them.
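
          For reference, a sketch of a build pod getting its cache volume from a generic ephemeral volume via the plugin's yaml parameter; the storage class name is illustrative, and with `reclaimPolicy: Retain` the `Released` PV is what the releaser would make `Available` again:

              podTemplate(yaml: '''
              apiVersion: v1
              kind: Pod
              spec:
                containers:
                - name: maven
                  image: maven:3-eclipse-temurin-17
                  command: ['sleep', '99d']
                  volumeMounts:
                  - name: build-cache
                    mountPath: /cache
                volumes:
                - name: build-cache
                  ephemeral:
                    volumeClaimTemplate:
                      spec:
                        accessModes: ["ReadWriteOnce"]
                        storageClassName: cache-retain   # illustrative; assumed reclaimPolicy: Retain
                        resources:
                          requests:
                            storage: 10Gi
              ''') {
                  node(POD_LABEL) {
                      container('maven') {
                          sh 'mvn -B -Dmaven.repo.local=/cache/m2 package'
                      }
                  }
              }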

          Jesse Glick added a comment -

          I do not see how https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes would be helpful for CI caching, as these would behave like emptyDir, unless you mean to use some platform-specific snapshotting system?

          Dee Kryvenko added a comment -

          Well, `ephemeral.volumeClaimTemplate` basically does exactly what my provisioner does - it dynamically creates a PVC for the pod and sets the pod as the owner, so GC will delete the PVC when the pod is deleted. Except it's natively defined in the pod instead of in an annotation. It's probably also more sophisticated behind the scenes than my implementation, because the scheduler is aware of the PVC from the get-go and the PVC configuration is available to all the admission controllers. The thing is, if the storage class used in the `volumeClaimTemplate` is configured with `reclaimPolicy: Retain`, the PV will stay after the PVC is gone, though it will be in the `Released` state. That's why you'd still need my releaser to make it `Available` again so the next PVC can claim it.

          Here's the quote:

          In terms of resource ownership, a Pod that has generic ephemeral storage is the owner of the PersistentVolumeClaim(s) that provide that ephemeral storage. When the Pod is deleted, the Kubernetes garbage collector deletes the PVC, which then usually triggers deletion of the volume because the default reclaim policy of storage classes is to delete volumes. You can create quasi-ephemeral local storage using a StorageClass with a reclaim policy of retain: the storage outlives the Pod, and in this case you need to ensure that volume clean up happens separately.

          The reason I am still looking into this is that my PVs are backed by EBS, and I ran into another issue I reported here: https://github.com/kubernetes/kubernetes/issues/103812. Basically, when a PVC tries to grab a pre-existing PV in the `Available` status, it completely ignores `WaitForFirstConsumer`. That means the PV gets bound before the pod gets scheduled. If a pod uses more than one PVC, they might be bound to PVs from different AZs, making the pod unschedulable since EBS can only be mounted to EC2 instances within the same AZ. For now, as a workaround, I just locked my caching storage class and build pods down to a single AZ - the same one the Jenkins controller lives in. Not that it makes anything worse - the Jenkins controller by its nature is a single server anyway and can only exist in one AZ at a time, so I wasn't really gaining anything by having my build pods span multiple AZs. But I wonder if using `volumeClaimTemplate` will help there.

          After I test it all, I will deprecate and remove the provisioner from my code and update the docs. Only the releaser piece is needed at this point.

          Dee Kryvenko added a comment -

          I was trying to test `ephemeral.volumeClaimTemplate` and the part I missed in the doc was:

          FEATURE STATE: Kubernetes v1.21 [beta]

          I guess I didn't entirely waste my time with the provisioner - I was just a bit ahead of k8s. I am still on 1.17.

            Assignee: Unassigned
            Reporter: Roman Safronov (electroma)
            Votes: 13
            Watchers: 23
