-
Improvement
-
Resolution: Unresolved
-
Minor
-
None
-
GKE cluster master and node pools version: 1.14
Cluster autoscaler activated
Jenkins master LTS installed with official Helm chart (1.1.24)
Kubernetes plugin: 1.19.0
I have a sporadic bug occuring on my Jenkins installation for months now:
java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error' at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229) at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196) at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206) at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: EOF
I believe it was already reported in these threads and I understand that this is caused by an HTTP 500 returned by the kubernetes API:
- https://issues.jenkins-ci.org/browse/JENKINS-39844
- https://stackoverflow.com/questions/50949718/kubernetes-gke-error-dialing-backend-eof-on-random-exec-command
However, after further investigation, I am sure now that the bug occurs only when the cluster autoscaler is on and more precisely when the autoscaler scales down while a Jenkins build is running. It maybe an edge case.
To fix this, I set the annotation on all my pods in the podTemplate yaml:
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
However, it didn't protect them. So I am trying now to setup a PodDisruptionBudget for each of my slave pod to protect them from eviction.
But, when passing the PDB into the podTemplate yaml it is just totally ignored. How can I protect my jenkins slave pods from eviction?
- is duplicated by
-
JENKINS-67167 in a kubernetes pod sh steps inside container() are failing sporadically
- Open
- relates to
-
JENKINS-64848 Shell step failing randomly
- Open
-
JENKINS-67474 Pipeline is failing due to io.fabric8.kubernetes.client.KubernetesClientException: not ready after n milliseconds
- Closed