Jenkins / JENKINS-59652

[kubernetes plugin] Protect Jenkins agent pods from eviction

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor
    • Component: kubernetes-plugin
    • Labels: None
    • Environment:
      GKE cluster master and node pools version: 1.14
      Cluster autoscaler activated
      Jenkins master LTS installed with official Helm chart (1.1.24)
      Kubernetes plugin: 1.19.0

      I have had a sporadic bug occurring on my Jenkins installation for months now:

      java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
      at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
      at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
      at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
      at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      io.fabric8.kubernetes.client.KubernetesClientException: error dialing backend: EOF
      

      I believe this was already reported in the threads below, and I understand that it is caused by an HTTP 500 returned by the Kubernetes API:

      - https://issues.jenkins-ci.org/browse/JENKINS-39844
      - https://stackoverflow.com/questions/50949718/kubernetes-gke-error-dialing-backend-eof-on-random-exec-command

      However, after further investigation, I am now sure that the bug occurs only when the cluster autoscaler is on, and more precisely when the autoscaler scales down while a Jenkins build is running. It may be an edge case.

       

      To fix this, I set the annotation on all my pods in the podTemplate yaml:

      cluster-autoscaler.kubernetes.io/safe-to-evict: "false" 
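      For reference, a minimal sketch of the podTemplate yaml I mean (the container name and image are illustrative, not my exact setup):

      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      spec:
        containers:
          - name: jnlp                      # default agent container name used by the kubernetes plugin
            image: jenkins/inbound-agent    # illustrative agent image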

      However, it didn't protect them. So I am now trying to set up a PodDisruptionBudget for each of my agent pods to protect them from eviction.
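
      For illustration, a PDB of the kind I am attempting would look roughly like this (the name and label selector are assumptions; the selector must match whatever labels the agent pods actually carry):

      apiVersion: policy/v1beta1            # PDB API group available on Kubernetes 1.14
      kind: PodDisruptionBudget
      metadata:
        name: jenkins-agents-pdb            # hypothetical name
      spec:
        maxUnavailable: 0                   # forbid all voluntary evictions of matching pods
        selector:
          matchLabels:
            jenkins: agent                  # assumed label; must match the agent pods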

      But when passing the PDB into the podTemplate yaml, it is simply ignored. How can I protect my Jenkins agent pods from eviction?


          Sigi Kiermayer added a comment:

          We are also running Jenkins on GKE. We don't have issues with a running Jenkins agent being 'moved' when the cluster scales down, but we purposefully created a node pool only for Jenkins agents and sized it so that one agent uses one node. With autoscaling it is relatively quick, but you can and should also keep one node running idle.

          One thing to be aware of: we added the PDB to make sure Jenkins agents are not killed or moved, but we removed it again. When GKE is doing maintenance, a PDB only delays the eviction of a pod by one hour, which makes the whole process much slower, as GKE will wait an hour for every pod with a PDB.
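
          A rough sketch of that layout (the pool name and resource sizes are assumptions, not exact values): each agent pod is pinned to the dedicated pool and requests most of a node, so one agent occupies one node:

          spec:
            nodeSelector:
              cloud.google.com/gke-nodepool: jenkins-agents   # hypothetical pool name
            containers:
              - name: jnlp
                resources:
                  requests:
                    cpu: "3500m"    # assumed: most of a 4-vCPU node, leaving room for system pods
                    memory: "12Gi"  # assumed node memory minus system overhead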

          Allan BURDAJEWICZ added a comment:

          A bug has been opened at Google: https://issuetracker.google.com/issues/156556218

            Assignee: Unassigned
            Reporter: Jonathan Pigrée (jpigree)
            Votes: 5
            Watchers: 19
            Created:
            Updated: