Jenkins / JENKINS-59652

[kubernetes plugin] Protect Jenkins agent pods from eviction

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component: kubernetes-plugin
    • Environment:
      GKE cluster master and node pools version: 1.14
      Cluster autoscaler activated
      Jenkins master LTS installed with official Helm chart (1.1.24)
      Kubernetes plugin: 1.19.0

      I have had a sporadic bug occurring on my Jenkins installation for months now:

      {noformat}
      java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
      at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
      at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
      at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
      at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: EOF
      {noformat}

      I believe this was already reported in these threads, and I understand it is caused by an HTTP 500 returned by the Kubernetes API:
       - https://issues.jenkins-ci.org/browse/JENKINS-39844
       - https://stackoverflow.com/questions/50949718/kubernetes-gke-error-dialing-backend-eof-on-random-exec-command

      However, after further investigation, I am now sure that the bug occurs only when the cluster autoscaler is enabled and, more precisely, when the autoscaler scales down while a Jenkins build is running. It may be an edge case.

       

      To fix this, I set the following annotation on all my pods in the podTemplate YAML:
      {noformat}
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      {noformat}
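      For reference, a minimal sketch of how such an annotation can be embedded in the Kubernetes plugin's podTemplate yaml field; the container name and image are placeholders, and this assumes the plugin merges the metadata section into the generated agent pod:

      {noformat}
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          # assumption: the plugin carries this annotation through to the agent pod
          cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      spec:
        containers:
        - name: jnlp
          image: jenkins/inbound-agent:latest
      {noformat}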

      However, it didn't protect them. So I am now trying to set up a PodDisruptionBudget for each of my agent pods to protect them from eviction.

      But when passing the PDB into the podTemplate YAML, it is simply ignored. How can I protect my Jenkins agent pods from eviction?
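      Note that the podTemplate yaml is merged into the Pod spec only, which would explain why a PDB embedded there is ignored: a PodDisruptionBudget is a separate namespaced resource that selects pods by label and must be created on its own. A hedged sketch (the `jenkins: slave` label is an assumption and must match whatever labels the plugin actually puts on the agent pods; `policy/v1beta1` matches the Kubernetes 1.14 era):

      {noformat}
      apiVersion: policy/v1beta1
      kind: PodDisruptionBudget
      metadata:
        name: jenkins-agents-pdb
      spec:
        # block all voluntary evictions of matching pods
        maxUnavailable: 0
        selector:
          matchLabels:
            # assumption: adjust to the labels on your agent pods
            jenkins: slave
      {noformat}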


          Jonathan Pigrée created issue -
          Vincent Latombe made changes -
          Assignee Original: Carlos Sanchez [ csanchez ]
          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-64848 [ JENKINS-64848 ]
          Jesse Glick made changes -
          Link New: This issue is duplicated by JENKINS-67167 [ JENKINS-67167 ]

            Assignee: Unassigned
            Reporter: Jonathan Pigrée (jpigree)
            Votes: 5
            Watchers: 19

              Created:
              Updated: