  Jenkins / JENKINS-42422

Add support for directory caching in pod jobs

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor
    • Component/s: kubernetes-plugin
    • Labels: None

      It would be great to be able to "cache" directories between job executions.
      In some cases this can greatly speed up job execution.
      Similar to Travis: https://docs.travis-ci.com/user/caching/.

      Today I achieve this with a persistent volume and a pipeline function that performs explicit (re)store steps.

      What could be added to the kubernetes-plugin:
      • an option to manage caches (specifically, dropping the cache for a specific job)
      • a DSL construct like podTemplate(... cachePaths: ["A", "B/C"]) (see the sketch below)
      • a default strategy for cache management (shared NFS-backed volume or a PVC provisioned per job)
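
      A rough sketch of what the proposed DSL could look like (hypothetical; cachePaths is the suggested parameter above, not something the plugin provides today):

        // Hypothetical syntax: cachePaths does not exist in the kubernetes-plugin today.
        // The idea: listed paths are restored before the build and saved back afterwards.
        podTemplate(
            label: 'maven-cache-demo',
            containers: [containerTemplate(name: 'maven', image: 'maven:3-jdk-8', ttyEnabled: true, command: 'cat')],
            cachePaths: ['.m2/repository', 'node_modules']
        ) {
            node('maven-cache-demo') {
                container('maven') {
                    sh 'mvn -B package'
                }
            }
        }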

          [JENKINS-42422] Add support for directory caching in pod jobs

          Carlos Sanchez added a comment -

          You can use persistent volumes in the pod, isn't that enough?

          Roman Safronov added a comment -

          As I stated in the description, a plain PV is too basic:

          • There is no option to provision a PV automatically (i.e. create an EBS volume dedicated to this job)
          • I need to create a separate script/job to drop caches
          • It's better to use backup/restore instead of direct file manipulation on the remote mount

          I like Travis' approach.


          Zach Langbert added a comment -

          A Travis-like approach would be awesome. PVs are somewhat clunky in this case imo.


          Carlos Sanchez added a comment -

          This is easier now with the yaml syntax, as you can create PVCs on demand.
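
          For context, mounting a pre-existing claim through the pod yaml looks roughly like this (a minimal sketch; the claim named build-cache is assumed to already exist, and the yaml field only describes the Pod, so the PVC object itself is still created separately):

            podTemplate(yaml: '''
            apiVersion: v1
            kind: Pod
            spec:
              containers:
              - name: maven
                image: maven:3-jdk-8
                command: ['sleep', 'infinity']
                volumeMounts:
                - name: cache
                  mountPath: /root/.m2/repository
              volumes:
              - name: cache
                persistentVolumeClaim:
                  claimName: build-cache
            '''.stripIndent()) {
                node(POD_LABEL) {
                    container('maven') {
                        sh 'mvn -B package'
                    }
                }
            }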

          J Alkjaer added a comment - edited

          csanchez: Can you provide an example of how PVCs can be created on demand in the YAML syntax? I see no reference in the K8s docs on how to do this. PVCs are described there as objects with their own separate lifecycle; PVs can be dynamically provisioned based on the storageClass when a PVC is created, but that of course is not the same.

          StatefulSets seem to be the only way to dynamically create PVCs, as they have volumeClaimTemplates.

          Even if the YAML field can include a PVC object, it would still need some mechanism to dynamically assign a name that can be referenced in the pod's volumes section.

           


          R C added a comment -

          The only sane way to implement this is through StatefulSets, especially if you have more than one agent (who doesn't run Jenkins like that?).

          An example scenario:

          You're currently running three agents. A fourth one needs to be launched. As the last comment says, there's no easy way to create a unique volume name that can also be referenced by the new pod. StatefulSets do that: https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#writing-to-stable-storage

          Assuming a StatefulSet named agent-XYZ with a volumeClaimTemplate named workspace, the plugin would need to scale the StatefulSet to 4. Kubernetes will then take care of creating pod agent-XYZ-3 and PVC workspace-agent-XYZ-3.

          Later, when the cluster is idle, the plugin can scale the StatefulSet down to 1. Kubernetes will terminate pods 3, 2 and 1, in that order, leaving 0 still up. The volumes are KEPT, by design, which is probably what you want if they're a cache (I do, too). When the StatefulSet later scales up again, the volumes are already around.

          This is all nice and takes some complexity away from the plugin, but there's some extra work to be done.

          I haven't looked at the code, but I assume that the plugin treats each pod it creates in isolation, independent of each other (is that correct?). With StatefulSets, you probably want to fingerprint the pod definition so that identical ones are mapped to the same set (XYZ above would be such a hash). Then the plugin needs to track how many pods are needed for each fingerprint and scale the StatefulSets accordingly.

          The other problem is passing the unique JENKINS_SECRET to the pod. I don't think StatefulSets allow per-pod secrets/variables. So, either secrets need to be turned off (bad) or they need to be delivered out-of-band, e.g. in an init container.

           

           


          Carlos Sanchez added a comment -

          StatefulSets can't be used for agents if you want isolation and not long-running agents. If job 1 ends before job 3, you can't scale down the StatefulSet or you'd kill the pod for job 3.

          R C added a comment -

          [Comment almost lost thanks to JIRA]

          Right, there's that, too, but I would be OK if jobs were scheduled starting from the lowest index available. In our case, with a hot cache (which is what we'd like to achieve), an agent's longest run wouldn't be orders of magnitude worse than the average or best case. That is still a net win, even with the occasional inefficiency here and there, like job 1 being idle every once in a while.

          I came across someone else's experience with stateful agents: https://hiya.com/blog/2017/10/02/kubernetes-base-jenkins-stateful-agents/. We don't spend 20 minutes doing setup work like they do, but there are pipelines where installing dependencies takes longer than the actual tests.

          Is the current design of the plugin too reliant on creating pods manually? The least invasive way might be for the scaling to be handled by a HorizontalPodAutoscaler (it should work with StatefulSets). Then the plugin would pick available slaves in alphabetical order. Is there much left to the Kubernetes plugin, at that point?

           


          Carlos Sanchez added a comment -

          I wrote about using a ReplicationController for agents like 4 years ago, but Jenkins needs to be able to kill the jobs that have finished, not a random one: https://www.infoq.com/articles/scaling-docker-kubernetes-v1
          At that point you could just have a pool of agents in k8s instead of dynamic allocation and use the swarm plugin to register them automatically.

          About the original issue description, I would use a shared filesystem that supports multiple mounts for writing (NFS, Gluster, ...), which would be easier for developers, while being harder to set up. Doing something to support other mount-once volumes will be harder. There could be an option to copy things from a directory before the build and back after the build, but that's already implemented with stash()/unstash().
          I'm looking for concrete ideas on what to do, but I would likely not implement them myself as I have limited time.

          suryatej yaramada added a comment -

          Can we achieve this by using EFS, by any chance? If so, I would like to know how. We already mount the Jenkins master's /var/lib/jenkins (running on EC2) on EFS. We use the kubernetes plugin to start dynamic agents to run our maven/gradle/npm jobs, but those jobs take some extra time because nothing is cached, so every build downloads from Artifactory. I tried mounting the /root/.m2 directory, but the problem is that another job cannot start until this job finishes. I would like to know if there is any workaround for this.

          Really appreciated.

          Multi-Attach error for volume "pvc-4e98f6b9-ea38-11e9-aa7d-02dbf42e9b46" Volume is already used by pod(s) sandbox-surya-maven-cache-3-8hk2s-n2v0v-l4f9h
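
          The Multi-Attach error above is typical of a ReadWriteOnce (EBS-backed) claim being mounted by a second pod. A hedged sketch of one workaround, assuming an NFS/EFS-backed StorageClass and an existing ReadWriteMany claim named maven-efs-cache (symbol names follow the kubernetes-plugin README; concurrent builds writing to the same .m2 can still conflict, as discussed later in this thread):

            podTemplate(
                containers: [containerTemplate(name: 'maven', image: 'maven:3-jdk-8', ttyEnabled: true, command: 'cat')],
                // the claim must be ReadWriteMany (EFS/NFS) so several agent pods can mount it at once
                volumes: [persistentVolumeClaim(claimName: 'maven-efs-cache', mountPath: '/root/.m2', readOnly: false)]
            ) {
                node(POD_LABEL) {
                    container('maven') {
                        sh 'mvn -B install'
                    }
                }
            }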

          Tristan FAURE added a comment - edited

          I also have the same issue: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jenkinsci-users/X61BK83LHLU/hztGcbkvBgAJ

          I don't really know Travis, but I assume it is equivalent to https://docs.gitlab.com/ee/ci/caching/

          About csanchez's comments:

          • we did not find how to create a PVC in the yaml syntax either; it was always overridden when we tried
          • stash/unstash or archiveArtifacts could help us, but how can we configure them to store in a PV? How long is the data kept if there is no unstash?
            • edit: as we want data available across jobs, stash/unstash does not seem relevant


          Matthew Ludlum added a comment - edited

          csanchez vlatombe - I'm interested in working on something related to this. Most competing CI platforms offer something like this:

          https://help.github.com/en/actions/configuring-and-managing-workflows/caching-dependencies-to-speed-up-workflows

          https://docs.gitlab.com/ee/ci/caching/

          https://docs.travis-ci.com/user/caching/

           

          The question I am juggling is whether the pipeline side or the agent side should handle this.


          Vincent Latombe added a comment -

          This is not specific to the kubernetes plugin; it's more like a new API that would sit between the stash/unstash and archive APIs, with the persistence scope being the job, or even the multibranch job.

          Matthew Ludlum added a comment -

          The goal, I think, was to provide a better experience than stash/unstash, such that we'd avoid inundating the master with more storage requests than we'd expect.

          https://github.com/jenkinsci/artifact-manager-s3-plugin#pipeline-job might actually be a better fit for this sort of work. The key was seeing https://github.com/jenkinsci/artifact-manager-s3-plugin/blob/master/src/main/java/io/jenkins/plugins/artifact_manager_jclouds/JCloudsArtifactManager.java#L80, as I wasn't finding any docs that make it obvious what drives stash/unstash.

          Since the buckets can be configured with lifecycle rules, we can use it for proper caching without doing full artifact management.

          It is slightly less desirable since it requires co-opting the stash/unstash commands, but it bears consideration.

          I might suggest it be expanded to include NFS and other storage solutions, but I think that can be decided at a later time.

          Igor Sarkisov added a comment -

          Has anyone figured out a workaround for this? We are considering moving Jenkins to Kubernetes, but we use ccache, which stores its cache on the Jenkins agents. In Kubernetes we'll be running ephemeral agent pods and we are not sure how we are going to handle ccache. We are thinking of using a hostPath volume, but are not sure that will work (e.g. how are multiple pods going to write cache to the same host-mounted volume?).


          Dee Kryvenko added a comment - edited

          I am surprised this is such a quiet topic - I would imagine that these days everyone has this same problem, as everyone makes Jenkins ephemeral and dynamic using K8s. I am having it too. Let me share some of my own findings.

          First, let's address StatefulSets - I think it would be a terrible idea to use them in Jenkins. Some of the reasons csanchez already pointed out, but also: the main advantage of this k8s plugin is the ability to dynamically generate build pods according to repository configuration and/or user input. That cannot be a StatefulSet for the same reasons it cannot be a Deployment - this plugin manages pods directly, not even ReplicaSets. In a way this plugin is a k8s controller, except that its implementation is not exactly controller-ish and it's not even running in k8s per se. If you're using this plugin and thinking of StatefulSets, chances are you upgraded your technology but not your thinking: stop thinking in terms of a fleet of Jenkins agents on stand-by waiting for a job to be scheduled to them, and instead think of dynamic ephemeral build pods. Otherwise why even use this plugin - you could create a Deployment or StatefulSet with helm or whatever and let it dial back in to Jenkins.

          Now, `hostPath` was a brilliant idea for cache, or so I thought until I finally noticed this https://kubernetes.io/docs/concepts/storage/volumes/#hostpath:

          • The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume

          It basically means the `fsGroup` of a security context will be ignored for `hostPath` volumes. This will only work if all pod containers run as root, or you will have to run chmod/chown in a separate root container all the time to work around access issues. That isn't fun, and I abandoned the idea.

          I briefly looked at local volumes https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/ but they cannot be used either - once provisioned, a PVC creates an affinity to a particular k8s node. Once the node goes down, no pod using that PVC will ever be able to schedule again. Wasted time.

          Then I saw the comment from seakip18 and thought of using a combination of the https://plugins.jenkins.io/artifact-manager-s3/ and https://plugins.jenkins.io/copyartifact/ plugins. It will work, yes, but honestly I am not sure this is something I will enjoy dealing with. In a world of ephemeral dynamic build pods per stage, the logic to archive/copy/unarchive and then move things to their proper places on the system, like ~/.m2, might get too complex; not to mention that transferring to/from S3 will not exactly be fast and will have to be done in each stage. Too much overhead.

          The most promising idea I have is to use EFS (or any other storage that supports ReadWriteMany). It requires deploying an EFS CSI driver, creating an EFS, creating a StorageClass for it, creating a PVC, and mounting it via `persistentVolumeClaim` under the `volumes` section of a pod. EFS is basically NFS, so it does not support FS-level locks. You cannot let multiple builds work in the same directory, which somewhat defeats the purpose of a "cache". If you use a sub-directory or even a separate EFS per job, you'll have a lot of duplicated cache and overall it will not be very efficient.

          I think I have an idea of how the proper solution would have to work: there must be a pool of reclaimable PVs, similar to how a StatefulSet does it.

          1. A wrapper step in the pipeline that receives a PVC yaml as an argument, something like `kubernetesPersistentVolumeClaim('...yaml...') { ... }`.
          2. It will have to create a PVC and then get to the inner steps, exposing PVC name similarly to how `POD_LABEL` works in the k8s plugin. You could then define a `node` inside using a `podTemplate` that in turn uses this PVC name.
          3. When it is done with the inner steps and if the retention policy was Retain - it will have to delete the PVC and then remove `spec.claimRef` from the PV which will make it Available for the next PVC to claim.

          The way this would be used: the user must supply a PVC spec that uses `selector.matchLabels` to search for pre-existing PVs before creating a new one. The StorageClass also must have its reclaim policy set to Retain.

          This basically re-implements the StatefulSet controller, with the only exception that it doesn't perform any cleanup upon releasing the PV back to the pool. And it doesn't rely on the index, if my understanding of `selector.matchLabels` in this case is correct (it will try to find an existing PV matching the labels and, if none is found, create a new one?).

          This would enable using any ReadWriteOnce storage type such as EBS for the cache, and Kubernetes will guarantee exclusivity of the claim while it's in use - so no race conditions.

          I am not sure this step should be part of the kubernetes plugin though - it sounds like a beast of its own kind. But maybe there is an opportunity to re-use some code from this plugin? csanchez jglick vlatombe what do you think?
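
          A purely hypothetical usage sketch of the proposed step (neither the kubernetesPersistentVolumeClaim step nor the PVC_NAME variable exists today; all names are illustrative only):

            // Hypothetical: illustrates the proposal above, not an existing step.
            kubernetesPersistentVolumeClaim('''
            apiVersion: v1
            kind: PersistentVolumeClaim
            metadata:
              generateName: build-cache-
              labels:
                cache: maven
            spec:
              accessModes: [ReadWriteOnce]
              storageClassName: cache-retain      # StorageClass with reclaimPolicy: Retain
              selector:
                matchLabels:
                  cache: maven                    # try to claim a previously released PV first
              resources:
                requests:
                  storage: 20Gi
            '''.stripIndent()) {
                // PVC_NAME would be exposed by the wrapper, similar to how POD_LABEL works
                podTemplate(volumes: [persistentVolumeClaim(claimName: PVC_NAME, mountPath: '/root/.m2')]) {
                    node(POD_LABEL) {
                        sh 'mvn -B package'
                    }
                }
            }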


          Jesse Glick added a comment -

          You can specify a PVC as a workspace volume, or additional volume, which would be practical if using a ReadWriteMany storage class like EFS. PR-988 discusses a trick based on pod reuse which was never clearly supported and apparently broke recently. But in general I would just discourage use of direct filesystem-level caching and reuse as being inherently problematic.
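
          As a minimal sketch of that first option (the claim name jenkins-agent-ws is hypothetical; symbol names follow the kubernetes-plugin README and may vary by plugin version):

            podTemplate(
                // the whole agent workspace lives on the claim
                workspaceVolume: persistentVolumeClaimWorkspaceVolume(claimName: 'jenkins-agent-ws', readOnly: false),
                // or, as an additional volume mounted at a specific path:
                // volumes: [persistentVolumeClaim(claimName: 'jenkins-agent-ws', mountPath: '/cache')]
                containers: [containerTemplate(name: 'maven', image: 'maven:3-jdk-8', ttyEnabled: true, command: 'cat')]
            ) {
                node(POD_LABEL) {
                    container('maven') {
                        sh 'mvn -B package'
                    }
                }
            }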

          If your main issue is repeated downloads of large sets of dependencies, this can be addressed fairly effectively with a local caching mirror (with e.g. LRU policy for cleanup): architecturally simple (new service independent of Jenkins), secure (no risk of builds from one job deliberately injecting malicious artifacts into a cache used by another), and robust (no risk of half-completed downloads and other such edge conditions causing persistent build outages).

          Many build tools also include their own distributed cache of artifacts from intermediate targets, keyed by hashes of source code or whatever.

          For cases that are not naturally handled by existing tools, I suppose a Jenkins plugin could be created to perform ad-hoc cache save and restore to a blob store with user-specified cache keys and other policy configuration, taking care to ensure that caches are isolated between jobs (since each job could have its own access control policy), perhaps handling multibranch projects specially (permitting PR-like branch projects to read but not write the cache from the base branch). https://github.com/actions/cache does something like this (apparently just backed by HTTP operations) and could serve as inspiration. Not sure if there is anything particularly K8s-specific here except that you could use, say, MinIO as a server, or find or write some minimal StatefulSet microservice to handle storage and make it easy to package alongside Jenkins.


          Jesse Glick added a comment -

          I forgot to mention https://plugins.jenkins.io/external-workspace-manager/ though I have no personal experience with it and am not sure if it is suitable for K8s.


          Dee Kryvenko added a comment - edited

          jglick pod reuse is a bad idea, agreed. For me it just wouldn't work at all - each pod is unique in my setup.

          I am not sure a PVC as a workspace volume or even https://github.com/jenkinsci/external-workspace-manager-plugin applies here - at least in my setup, caching or sharing the actual workspace would be a big security hole. Sorry, I couldn't see how to create an additional volume with the kubernetes plugin at the moment; is there an example?

          I am actively using Nexus/Artifactory for local mirrors; I didn't even bring that up since I thought it was common sense. But it just doesn't help in 100% of cases. For Terraform, for instance, it's just not there yet in either of them:

          https://issues.sonatype.org/browse/NEXUS-16089

          https://www.jfrog.com/jira/browse/RTFACT-16117

          And I am just not ready to get into a contract with Hashicorp when all I need is local mirroring, but what I'd have to pay for is much more than that.

          Other tools such as Maven (maybe Gradle too - I'm not too familiar) are just inherently bad at resolving the dependency tree. Even with a mirror on the local network, on a mid-size project it takes a ridiculously long time to get dependencies.

          I wish a mirror were the universal answer, for the reasons you outlined, but it's just not.

          The risk of injecting malicious artifacts into a cache is a valid concern, but I think it's ultimately the responsibility of the cluster/Jenkins admin to make sure it is secure and isolated. I don't see how the kubernetes plugin or the proposed PVC plugin would be justified in limiting what the user can do on the assumption that admins are dumb. In my opinion plugins must provide utilities, and it is up to the admins how to apply those utilities. Many admins, me included, are using Jenkins as an engine/framework - my users are not allowed to manage jobs or provide their own Jenkinsfile. My pipeline generator can be responsible for isolating different caches.

          All in all, I still think a new `kubernetesPersistentVolumeClaim` step would make sense. The question is whether it should be part of the existing plugin or a new plugin...

          I was thinking about this a bit more and had another idea - this functionality could be implemented as a separate controller, unrelated to Jenkins at all. The controller could watch for a special annotation on a pod and fulfill the `volumes.*.persistentVolumeClaim` requirements of that pod. The user would create a pod pointing to a non-existent PVC, the controller would eventually create it based on the config in the annotation, and it would then delete the PVC when the pod requesting it is gone and remove the claimRef from the associated PV. I thought of doing it as an admission controller first, but I guess it should work as a regular controller too, and controllers are much easier to implement.


          Dee Kryvenko added a comment -

          I took it on as a weekend project and created two controllers - one as a provisioner and the other as a releaser: https://github.com/plumber-cd/kubernetes-dynamic-reclaimable-pvc-controllers/pull/1/files.

          I'm going to have to publish it to Docker Hub and create a Helm chart for it, but meanwhile I'd appreciate any feedback.


          Dee Kryvenko added a comment -

          I gave it quite a bit of testing, published it to Docker Hub, and created a Helm chart.

          Please see https://github.com/plumber-cd/kubernetes-dynamic-reclaimable-pvc-controllers for instructions.

          See Jenkinsfile example here https://github.com/plumber-cd/kubernetes-dynamic-reclaimable-pvc-controllers/tree/main/examples/jenkins-kubernetes-plugin-with-build-cache.


          Jesse Glick added a comment -

          Very nice! I think it would be helpful to link to this system from the README for this plugin.


          Dee Kryvenko added a comment -

          Actually there is a 

                ephemeral: 
                  volumeClaimTemplate: 
                    ...
          

          See https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/; this should be able to provision PVCs dynamically, so it seems I have wasted my time creating my provisioner.
          But the need for an automatic releaser still stands, as I can't find anything else at the moment that can automatically make my PVs `Available` when they are `Released` so that the next PVC can consume them.
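
          For reference, a sketch of how a generic ephemeral volume might look in a pod template (assumes a cluster where the feature is available, see the note about the 1.21 beta gate below; the storage class name cache-retain is illustrative):

            podTemplate(yaml: '''
            apiVersion: v1
            kind: Pod
            spec:
              containers:
              - name: maven
                image: maven:3-jdk-8
                command: ['sleep', 'infinity']
                volumeMounts:
                - name: cache
                  mountPath: /root/.m2
              volumes:
              - name: cache
                ephemeral:
                  volumeClaimTemplate:
                    spec:
                      accessModes: [ReadWriteOnce]
                      storageClassName: cache-retain   # reclaimPolicy: Retain keeps the PV after the pod is gone
                      resources:
                        requests:
                          storage: 20Gi
            '''.stripIndent()) {
                node(POD_LABEL) {
                    container('maven') {
                        sh 'mvn -B package'
                    }
                }
            }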


          Jesse Glick added a comment -

          I do not see how https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes would be helpful for CI caching, as these would behave like emptyDir, unless you mean to use some platform-specific snapshotting system?


          Dee Kryvenko added a comment -

          Well, `ephemeral.volumeClaimTemplate` basically does exactly what my provisioner does - it dynamically creates a PVC for the pod and sets the pod as its owner, so GC will delete the PVC when the pod is deleted. Except it's defined natively in the pod instead of in an annotation. It's probably also more sophisticated behind the scenes than my implementation, because the scheduler is aware of the PVC from the get-go and the PVC configuration is available to all the admission controllers. The thing is, if the storage class used in the `volumeClaimTemplate` is configured with `reclaimPolicy: Retain`, the PV will stay around after the PVC is gone, though it will be in the `Released` state. That's why you'd still need my releaser to make it `Available` again so the next PVC can claim it.

          Here's the quote:

          In terms of resource ownership, a Pod that has generic ephemeral storage is the owner of the PersistentVolumeClaim(s) that provide that ephemeral storage. When the Pod is deleted, the Kubernetes garbage collector deletes the PVC, which then usually triggers deletion of the volume because the default reclaim policy of storage classes is to delete volumes. You can create quasi-ephemeral local storage using a StorageClass with a reclaim policy of retain: the storage outlives the Pod, and in this case you need to ensure that volume clean up happens separately.

          The reason I am still looking into this is that my PVs are backed by EBS and I ran into another issue, which I reported here: https://github.com/kubernetes/kubernetes/issues/103812. Basically, when a PVC tries to grab a pre-existing PV in the `Available` status, it completely ignores `WaitForFirstConsumer`. That means the PV gets bound before the pod gets scheduled. If a pod uses more than one PVC, they might be bound to PVs from different AZs, making such a pod unschedulable, since EBS can only be mounted to an EC2 instance within the same AZ. For now, as a workaround, I just locked my caching storage class and build pods down to a single AZ - the same one the Jenkins controller lives in. Not that it makes anything worse - the Jenkins controller is by its nature a single server anyway and can only exist in one AZ at a time, so I wasn't really gaining anything by having my build pods span multiple AZs. But I wonder if using `volumeClaimTemplate` will help there.

          After I test it all, I will deprecate and remove the provisioner from my code and update the docs. We only need the releaser piece at this point.


          Dee Kryvenko added a comment -

          I was trying to test `ephemeral.volumeClaimTemplate` and the part I missed in the doc was:

          FEATURE STATE: Kubernetes v1.21 [beta]

          I guess I didn't entirely waste my time with the provisioner - I was just a bit ahead of k8s. I am still on 1.17.


            Assignee: Unassigned
            Reporter: Roman Safronov (electroma)
            Votes: 13
            Watchers: 23

              Created:
              Updated: