Jenkins / JENKINS-42422

Add support for directory caching in pod jobs

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor
    • Component: kubernetes-plugin
    • Labels: None

      It would be great to be able to "cache" directories between job executions.
      In some cases this can greatly speed up job execution.
      Similar to Travis: https://docs.travis-ci.com/user/caching/.

      Today I achieve this with a persistent volume and a pipeline function that does explicit (re)store steps.

      What could be added to kubernetes-plugin:
      - an option to manage caches (specifically, drop the cache for a specific job)
      - a DSL construct like podTemplate(... cachePaths: ["A", "B/C"])
      - a default strategy for cache management (shared NFS-backed volume, or a PVC provisioned per job)
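A rough sketch of what the proposed DSL could look like. Note that `cachePaths` is not an existing kubernetes-plugin parameter — it is the hypothetical construct this issue asks for, and all names below are illustrative:

```groovy
// Hypothetical syntax: cachePaths does not exist in kubernetes-plugin today.
// The idea is that the plugin would restore the listed directories before the
// build and persist them back to shared storage afterwards, as Travis CI does.
podTemplate(
    label: 'maven-cached',
    containers: [
        containerTemplate(name: 'maven', image: 'maven:3-jdk-8',
                          ttyEnabled: true, command: 'cat')
    ],
    cachePaths: ['.m2/repository', 'node_modules']  // directories to keep between runs
) {
    node('maven-cached') {
        container('maven') {
            sh 'mvn -B package'  // would benefit from a restored .m2/repository
        }
    }
}
```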


          Roman Safronov created issue

          Carlos Sanchez added a comment -

          You can use persistent volumes in the pod, isn't that enough?

          Roman Safronov added a comment -

          As I stated in the description, a PV is too basic:

          • There is no option to provision a PV automatically (i.e. create an EBS volume dedicated to this job)
          • I need to create a separate script/job to drop caches
          • It's better to use backup/restore instead of direct file manipulation on the remote mount

          I like Travis' approach.

          Zach Langbert added a comment -

          A Travis-like approach would be awesome. PVs are somewhat clunky in this case imo.


          Carlos Sanchez added a comment -

          This is easier now with the YAML syntax, as you can create PVCs on demand.
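For context, mounting an already-provisioned PVC into an agent pod is straightforward with the plugin's `persistentVolumeClaim` volume type. This sketch assumes a PVC named `build-cache` already exists in the agent namespace (the claim name and mount path are illustrative); creating the PVC itself on demand is the part debated below:

```groovy
// Sketch: reuse a pre-provisioned PVC as a cache directory across builds.
// Assumes a PVC 'build-cache' already exists in the namespace the agents run in.
podTemplate(
    label: 'cached-agent',
    volumes: [
        persistentVolumeClaim(claimName: 'build-cache',
                              mountPath: '/home/jenkins/.m2',
                              readOnly: false)
    ]
) {
    node('cached-agent') {
        sh 'mvn -B package'  // reuses artifacts cached in /home/jenkins/.m2
    }
}
```

The limitation, as the next comment points out, is that this requires the claim to exist beforehand and be shared (or manually managed) rather than provisioned per job.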

          J Alkjaer added a comment - edited

          csanchez: Can you provide an example of how PVCs can be created on demand in the YAML syntax? I see no reference in the K8s docs on how to do this. PVCs are described there as objects with their own separate lifecycle. PVs can be dynamically generated based on the storageClass when a PVC is created, but that of course is not the same.

          StatefulSets seem to be the only way to dynamically create PVCs, as those have volumeClaimTemplates.

          Even if the YAML field can include a PVC object, it would still need some mechanism to dynamically assign a name that can be referenced in the pod's volumes section.

          R C added a comment -

          The only sane way to implement this is through StatefulSets, especially if you have more than one agent (who doesn't run Jenkins like that?).

          An example scenario:

          You're currently running three agents. A fourth one needs to be launched. As the last comment says, there's no easy way to create a unique volume name that can also be referenced by the new pod. StatefulSets do that: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#writing-to-stable-storage

          Assuming a StatefulSet named agent-XYZ with a volumeClaimTemplate named workspace, the plugin would need to scale the StatefulSet to 4. Kubernetes will then take care of creating pod agent-XYZ-3 and PVC workspace-agent-XYZ-3.

          When, later, the cluster is idle, the plugin can scale the StatefulSet down to 1. Kubernetes will terminate pods 3, 2 and 1, in that order, leaving 0 still up. The volumes are KEPT, by design, which is probably what you want if they're a cache (I do, too). When the StatefulSet later scales up, the volumes are already around.

          This is all nice and takes some complexity away from the plugin, but there's some extra work to be done.

          I haven't looked at the code, but I assume the plugin treats each pod it creates in isolation, independent of the others (is that correct?). With StatefulSets, you probably want to fingerprint the pod definition so that identical ones are mapped to the same set (XYZ above would be such a hash). Then the plugin needs to track how many pods are needed for each fingerprint and scale the StatefulSets accordingly.

          The other problem is passing the unique JENKINS_SECRET to the pod. I don't think StatefulSets allow per-pod secrets/variables. So either secrets need to be turned off (bad) or they need to be delivered out-of-band, e.g. in an init container.

          Carlos Sanchez added a comment -

          StatefulSets can't be used for agents if you want isolation and not long-running agents. If job 1 ends before job 3, you can't scale down the StatefulSet or you'd kill the pod for job 3.

          R C added a comment -

          [Comment almost lost thanks to JIRA]

          Right, there's that, too, but I would be OK if jobs were scheduled starting from the lowest index available. In our case, with a hot cache (which is what we'd like to achieve), an agent's longest run wouldn't be orders of magnitude worse than the average or best case. That is still a net win, even with the occasional inefficiency here and there, like job 1 being idle every once in a while.

          I came across someone else's experience with stateful agents: https://hiya.com/blog/2017/10/02/kubernetes-base-jenkins-stateful-agents/ We don't spend 20 minutes doing setup work like they do, but there are pipelines where installing dependencies takes longer than the actual tests.

          Is the current design of the plugin too reliant on creating pods manually? The least invasive way might be for the scaling to be handled by a HorizontalPodAutoscaler (it should work with StatefulSets). Then the plugin would pick available agents in alphabetical order. Is there much left for the Kubernetes plugin, at that point?

          Carlos Sanchez added a comment -

          I wrote about using a ReplicationController for agents about 4 years ago, but Jenkins needs to be able to kill the jobs that have finished, not a random one: https://www.infoq.com/articles/scaling-docker-kubernetes-v1
          At that point you could just have a pool of agents in k8s instead of dynamic allocation and use the swarm plugin to register them automatically.

          About the original issue description: I would use a shared filesystem that supports multiple writable mounts (NFS, Gluster, ...), which would be easier for developers while being harder to set up. Supporting other mount-once volumes will be harder. There could be an option to copy things from a directory before the build and back after the build, but that's already implemented with stash()/unstash().
          Looking for concrete ideas on what to do, but I would not likely implement them myself as I have limited time.
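The explicit (re)store workaround discussed in this thread can be sketched as a small pipeline helper: a shared NFS-backed volume holds tarred caches, which the build restores before running and saves back afterwards. The NFS server address, cache key, and paths below are all illustrative assumptions, not plugin features:

```groovy
// Sketch of the manual cache (re)store pattern, assuming a shared NFS export
// mounted at /cache via the plugin's nfsVolume type. All names are examples.
def withDirCache(String cacheKey, String dir, Closure body) {
    def archive = "/cache/${cacheKey}.tar.gz"  // lives on the shared NFS mount
    // Restore the cached directory if a previous run saved one.
    sh "test -f ${archive} && tar -xzf ${archive} || true"
    try {
        body()
    } finally {
        // Save the directory back for the next run, even if the build failed.
        sh "tar -czf ${archive} ${dir}"
    }
}

podTemplate(label: 'nfs-cache', volumes: [
    nfsVolume(serverAddress: 'nfs.example.com', serverPath: '/jenkins-cache',
              mountPath: '/cache', readOnly: false)
]) {
    node('nfs-cache') {
        withDirCache('myjob-m2', '.m2/repository') {
            sh 'mvn -B package'
        }
    }
}
```

Compared with stash()/unstash(), this keeps the cache across builds rather than only across stages of one build, at the cost of managing the shared filesystem yourself.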

            Assignee: Unassigned
            Reporter: Roman Safronov (electroma)
            Votes: 13
            Watchers: 23