Jenkins / JENKINS-59652

[kubernetes plugin] Protect Jenkins agent pods from eviction

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component: kubernetes-plugin
    • Environment:
      GKE cluster master and node pools version: 1.14
      Cluster autoscaler activated
      Jenkins master LTS installed with official Helm chart (1.1.24)
      Kubernetes plugin: 1.19.0

      I have had a sporadic bug occurring on my Jenkins installation for months now:

      {noformat}
      java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
      at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
      at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
      at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
      at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: EOF
      {noformat}

      I believe this was already reported in these threads, and I understand it is caused by an HTTP 500 returned by the Kubernetes API:
       - https://issues.jenkins-ci.org/browse/JENKINS-39844
       - https://stackoverflow.com/questions/50949718/kubernetes-gke-error-dialing-backend-eof-on-random-exec-command

      However, after further investigation, I am now sure that the bug occurs only when the cluster autoscaler is enabled and, more precisely, when the autoscaler scales down while a Jenkins build is running. It may be an edge case.

       

      To fix this, I set the following annotation on all my pods in the podTemplate YAML:
      {noformat}
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      {noformat}
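      For reference, a minimal sketch of how such an annotation can be embedded in the Kubernetes plugin's podTemplate yaml field; the container name and image are placeholders, and this assumes the plugin merges the metadata section into the generated agent pod:

      {noformat}
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          # assumption: the plugin carries this annotation through to the agent pod
          cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      spec:
        containers:
        - name: jnlp
          image: jenkins/inbound-agent:latest
      {noformat}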

      However, it didn't protect them. So I am now trying to set up a PodDisruptionBudget for each of my agent pods to protect them from eviction.

      But when passing the PDB into the podTemplate YAML, it is simply ignored. How can I protect my Jenkins agent pods from eviction?
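      Note that the podTemplate yaml is merged into the Pod spec only, which would explain why a PDB embedded there is ignored: a PodDisruptionBudget is a separate namespaced resource that selects pods by label and must be created on its own. A hedged sketch (the `jenkins: slave` label is an assumption and must match whatever labels the plugin actually puts on the agent pods; `policy/v1beta1` matches the Kubernetes 1.14 era):

      {noformat}
      apiVersion: policy/v1beta1
      kind: PodDisruptionBudget
      metadata:
        name: jenkins-agents-pdb
      spec:
        # block all voluntary evictions of matching pods
        maxUnavailable: 0
        selector:
          matchLabels:
            # assumption: adjust to the labels on your agent pods
            jenkins: slave
      {noformat}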


          Jonathan Pigrée created issue -
          Vincent Latombe made changes -
          Assignee Original: Carlos Sanchez [ csanchez ]
          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-64848 [ JENKINS-64848 ]
          Jesse Glick made changes -
          Link New: This issue is duplicated by JENKINS-67167 [ JENKINS-67167 ]

            Assignee: Unassigned
            Reporter: Jonathan Pigrée (jpigree)
            Votes: 5
            Watchers: 19

              Created:
              Updated: