Type: New Feature
Resolution: Unresolved
Priority: Major
Labels: None
Environment: Jenkins v2.89.2, Kubernetes plugin v1.3.3
At my company, we use Jenkins to run builds with lots of parallel tasks using the Pipeline suite of plugins, and the slaves for each task are provisioned from a private Kubernetes cluster. We have a very specific problem with these provisioned slaves: we'd like to reduce the time overhead of a Kubernetes slave to match that of a physical slave (or get as close as possible). Since our slave container itself has a non-trivial start-up time (after provisioning, but before registering with the Jenkins master), we're thinking of maintaining a Kubernetes deployment of 'ready' slaves that register themselves with the master, and then are removed from the deployment when they're assigned a job; the rest of the lifecycle remains the same (that is, the slaves are still used only once). This ensures that we have a continuous supply of ready slaves, and we can also use pool size auto-scaling to keep up with load.
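As a rough sketch of what the consuming side of such a pool could look like (not necessarily how our modified plugin wires it up), assuming the pre-started slaves register themselves under a hypothetical label like `prewarmed-k8s`:

```groovy
// Sketch only: assumes a pool of already-connected slaves that registered
// themselves with the master under the (hypothetical) label 'prewarmed-k8s'.
// Because the agent is already online, the build does not wait for pod
// scheduling or container start-up before entering the node block.
node('prewarmed-k8s') {
    stage('Build') {
        sh 'echo hello'
    }
}
```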
We've tried this out internally by modifying the Kubernetes plugin a little to support this setup, and we're reasonably satisfied with the results. I have a couple of questions regarding this:
1. Is there a better way to reduce overhead? In our case, the overhead essentially consists of provisioning request time + pod scheduling time + container setup + slave connect-back + pipeline setup time.
2. Does this use case fall within the scope of the Kubernetes plugin, or is it better developed as a separate plugin that depends on this one?
Perhaps I should've explained a little more: we're using the Jenkins Pipeline plugin suite, and there's an initial start-up cost associated with the shell step of a pipeline. The time consumed broke down roughly into the following steps:
[Breakdown of execution time taken by the different steps of a simple `echo hello` shell step, when run in a pipeline with an unmodified version of the Kubernetes plugin]
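For context, the unmodified baseline we measured is essentially the stock Kubernetes plugin flow; a minimal scripted pipeline of that shape might look like the following (the label is illustrative, and the plugin's default jnlp container is used):

```groovy
// Baseline sketch with an unmodified Kubernetes plugin: each build provisions
// a brand-new pod, waits for it to schedule, start, and connect back to the
// master, and only then runs the shell step.
podTemplate(label: 'k8s-echo') {
    node('k8s-echo') {
        sh 'echo hello'
    }
}
```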
We've already set the over-provisioning flags when starting up the instance. Using a deployment lets us eliminate steps 2-6 (just after connection, we run a small script on the slave to finish its start-up), and gives an overhead roughly comparable to that of our physical slaves.
Also, as you mentioned, we don't want to reuse slaves, which essentially rules out "Time in minutes to retain slave when idle" as an option.
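For completeness, that option is, as far as I can tell, the pod template's idle-retention setting, exposed as `idleMinutes` in the pipeline DSL; something like the sketch below keeps pods alive for reuse between builds, which clashes with our single-use slave requirement:

```groovy
// What we're ruling out: retaining the pod for reuse between builds.
// 'idleMinutes' keeps the provisioned slave around after the build finishes,
// which amortises start-up cost but means slaves are no longer single-use.
podTemplate(label: 'k8s-reusable', idleMinutes: 30) {
    node('k8s-reusable') {
        sh 'echo hello'
    }
}
```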