[JENKINS-59652] [kubernetes plugin] Protect Jenkins agent pods from eviction - Jenkins Jira

Type: Improvement
Resolution: Unresolved
Priority: Minor
Component/s: kubernetes-plugin
Labels:
None
Environment:
GKE cluster master and node pools version: 1.14
Cluster autoscaler activated
Jenkins master LTS installed with official Helm chart (1.1.24)
Kubernetes plugin: 1.19.0

Similar Issues:
Powered by SuggestiMate

Show

I have a sporadic bug occuring on my Jenkins installation for months now:

java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: EOF

I believe it was already reported in these threads and I understand that this is caused by an HTTP 500 returned by the kubernetes API:

However, after further investigation, I am sure now that the bug occurs only when the cluster autoscaler is on and more precisely when the autoscaler scales down while a Jenkins build is running. It maybe an edge case.

To fix this, I set the annotation on all my pods in the podTemplate yaml:

cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

However, it didn't protect them. So I am trying now to setup a PodDisruptionBudget for each of my slave pod to protect them from eviction.

But, when passing the PDB into the podTemplate yaml it is just totally ignored. How can I protect my jenkins slave pods from eviction?

is duplicated by

JENKINS-67167 in a kubernetes pod sh steps inside container() are failing sporadically

Open

relates to

JENKINS-64848 Shell step failing randomly

Open

JENKINS-67474 Pipeline is failing due to io.fabric8.kubernetes.client.KubernetesClientException: not ready after n milliseconds

Closed

Assignee:: Unassigned

Reporter:: Jonathan Pigrée

Votes:: 5 Vote for this issue

Watchers:: 19 Start watching this issue

Created:: 2019-10-04 06:34

Updated:: 2023-11-21 12:21

Details

Description

Attachments

Issue Links

Activity

People

Dates