Jenkins / JENKINS-50735

Using pods in a Kubernetes deployment as slaves via Kubernetes plugin

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major
    • Component: kubernetes-plugin
    • Environment: Jenkins v2.89.2
      Kubernetes plugin v1.3.3

      At my company, we use Jenkins to run builds with lots of parallel tasks using the Pipeline suite of plugins, and the slaves for each task are provisioned from a private Kubernetes cluster. We have a very specific problem with these provisioned slaves: we'd like to reduce the time overhead of a Kubernetes slave to match that of a physical slave (or get as close as possible). Since our slave container itself has a non-trivial start-up time (after provisioning, but before registering with the Jenkins master), we're thinking of maintaining a Kubernetes deployment of 'ready' slaves that register themselves with the master, and then are removed from the deployment when they're assigned a job; the rest of the lifecycle remains the same (that is, the slaves are still used only once). This ensures that we have a continuous supply of ready slaves, and we can also use pool size auto-scaling to keep up with load.

      We've tried this out internally by modifying the Kubernetes plugin a little to support this system, and are reasonably satisfied with the results. I have a couple of questions with regard to this:

      1. Is there a better way to reduce overhead? In our case, overhead essentially comprises provisioning request time + pod scheduling time + container setup + slave connect-back + pipeline setup time.

      2. Does this use-case fall within the realm of the Kubernetes plugin, or is it better off developed as a plugin dependent on this one?


          Karthik Duddu created issue -

          Carlos Sanchez added a comment -

          Provisioning request time tends to 0 using the config in https://github.com/jenkinsci/kubernetes-plugin/#over-provisioning-flags

          Other than your specific container start-up time, I don't see any of the other steps adding much time.

          And then you have "Time in minutes to retain slave when idle", which would keep the agents around, but I'm guessing you don't want to reuse them.
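          For reference, the over-provisioning flags linked above are JVM system properties set on the Jenkins master. A minimal sketch, with values that are illustrative rather than prescriptive:

```shell
# Illustrative NodeProvisioner tuning for the Jenkins master JVM.
# These make Jenkins request new agents almost immediately instead of
# waiting for the default provisioning heuristics to warm up.
export JAVA_OPTS="${JAVA_OPTS:-} \
  -Dhudson.slaves.NodeProvisioner.initialDelay=0 \
  -Dhudson.slaves.NodeProvisioner.MARGIN=50 \
  -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
```

          This only removes the delay before Jenkins asks for an agent; it does not shorten pod scheduling or container start-up.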

          Karthik Duddu added a comment - edited

          Perhaps I should've explained a little more: we're using the Pipeline plugin suite of Jenkins, and there's an initial start-up cost associated with the shell step of the pipeline. The time consumed breaks down roughly as follows:

          Breakdown of execution time taken by different steps of a simple `echo hello` shell step, when run in a pipeline with an unmodified version of the Kubernetes plugin:

          Order  Operation                              Time (secs)
          0      Job start request                      1
          1      Job start                              3
          2      Request for pod                        10
          3      k8s scheduling + startup               5
          4      Slave registration and info exchange   5
          5      Agent connection                       20
          6      Script setup                           51
          7      Script execution                       8
          8      Job end                                0

          We've already set the over-provisioning flags while starting up the instance. Using a deployment allows us to eliminate steps 2-6 (just after connection, we run a small script on the slave to complete the start-up), and provides an overhead roughly similar to that of physical slaves.

          Also, as you mentioned, we don't want to reuse slaves, which kind of eliminates "Time in minutes to retain slave when idle" as an option.

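          The "removed from the deployment when assigned a job" step described above can be sketched with a standard Kubernetes trick: stripping the label the ReplicaSet selects on orphans the pod, so the Deployment immediately spins up a fresh 'ready' agent while the orphaned pod runs its single job. This is a hedged sketch, not the reporter's actual customization; the label key (`app`) and the use of `$HOSTNAME` as the pod name are assumptions, and the function only prints the `kubectl` command so it can be inspected:

```shell
# Hypothetical hook run on the agent when it picks up a job.
# Removing the 'app' selector label detaches the pod from its ReplicaSet;
# the Deployment then creates a replacement 'ready' agent to keep the pool full.
detach_from_pool() {
  pod_name="$1"
  # A trailing '-' on a label key removes that label from the object.
  echo kubectl label pod "$pod_name" app-   # drop 'echo' to actually run it
}

detach_from_pool "$HOSTNAME"
```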
          Karthik Duddu made changes -
          Description: edited to mention that the builds use the Pipeline suite of plugins, and to add "pipeline setup time" to the overhead breakdown in question 1.

          Carlos Sanchez added a comment -

          So if you don't want agents to be provisioned per job, then you just have "external" agents connected, and you manage them yourself (with a deployment, for instance). You just need to kill them after each job.

          You can use the swarm plugin for authentication: https://plugins.jenkins.io/swarm https://www.infoq.com/articles/scaling-docker-kubernetes-v1
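          A minimal sketch of that setup, assuming a hypothetical agent image, Jenkins URL, label, and credentials secret (all placeholders, none taken from this issue): a Deployment whose pods run the Swarm client and self-register with the master.

```yaml
# Hypothetical warm-agent pool, managed outside the Kubernetes plugin.
# Image name, Jenkins URL, labels, and secret names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins-warm-agents
spec:
  replicas: 5                      # size of the 'ready' pool
  selector:
    matchLabels:
      app: jenkins-warm-agent
  template:
    metadata:
      labels:
        app: jenkins-warm-agent
    spec:
      containers:
        - name: swarm-agent
          image: registry.example.com/jenkins-swarm-agent:latest  # placeholder
          # The Swarm client connects back to the master and self-registers,
          # so no per-job provisioning request is needed.
          args:
            - -master
            - http://jenkins.example.com/    # placeholder
            - -username
            - $(JENKINS_USER)
            - -password
            - $(JENKINS_PASS)
            - -executors
            - "1"
            - -labels
            - warm-pool
          env:
            - name: JENKINS_USER
              valueFrom:
                secretKeyRef: {name: jenkins-swarm-creds, key: username}
            - name: JENKINS_PASS
              valueFrom:
                secretKeyRef: {name: jenkins-swarm-creds, key: password}
```

          Each pod would then run a single job and exit (or delete itself), and the Deployment replaces it, as described above.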

          Yuping Jin added a comment -

          Managing slaves without this plugin is more complex. However, being able to pre-launch a certain number of slave pods reduces build times significantly. I'm in the same situation and had to set "Time in minutes to retain slave when idle" to keep some pods running and reused. Still, the problem is how to launch the slave pods initially; triggering it with some builds is awkward.

          I tried to use a separate deployment for slaves, but I ran into a problem with JENKINS_AGENT_NAME and have no idea how to handle it. A random name gets an "Unknown client name" error. karthikduddu, is it possible to share your customization to the Kubernetes plugin?

          Appreciate your help!

           


          Yuping Jin added a comment -

          csanchez I tried to add a permanent JNLP agent according to https://support.cloudbees.com/hc/en-us/articles/360004695871-Create-dedicated-agents-running-Kubernetes. While running a pipeline defined for the Kubernetes plugin, I got "ERROR: Node is not a Kubernetes node". There is not much discussion about it on Google. Is this correct usage?

          Thanks and best regards,


          Carlos Sanchez added a comment -

          As it is now, you can't run `container` and other steps from this plugin on a node that was not created by it.

          Yuping Jin added a comment -

          I see. Thanks!


          Yuping Jin added a comment -

          csanchez Is there a plan to add support for pre-launching a certain number of pods? Since the plugin already supports retaining pods for reuse, this sounds like a natural extension.

          This plugin is so nice; it integrates with Kubernetes perfectly. For us, the only concern is the time spent waiting for pod creation/scheduling/connection.

          Thanks and best regards,


            Assignee: Unassigned
            Reporter: Karthik Duddu (karthikduddu)
            Votes: 0
            Watchers: 6