Type: Bug
Resolution: Unresolved
Priority: Major
Labels: None
Jenkins 2.277.2 running on GKE Kubernetes 1.18.16-gke.302
kubernetes plugin: 1.29.2
Since upgrading to the latest LTS version and plugin, I've noticed this error showing up when running the first command in a container { ... } block:
[2021-04-16T21:01:40.345Z] java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
[2021-04-16T21:01:40.345Z]     at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
[2021-04-16T21:01:40.345Z]     at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
[2021-04-16T21:01:40.345Z]     at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
[2021-04-16T21:01:40.345Z]     at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2021-04-16T21:01:40.345Z]     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[2021-04-16T21:01:40.345Z]     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[2021-04-16T21:01:40.345Z]     at java.base/java.lang.Thread.run(Thread.java:834)
io.fabric8.kubernetes.client.KubernetesClientException: container not found ("ubuntu")
I was able to reproduce it with this pipeline script:
def imageName = "ubuntu"
def imageTag = "latest"
def yaml = """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ${imageName}
    image: ${imageName}:${imageTag}
    command: ["/bin/sh"]
    args: ["-c", "sleep 5; cat"]
"""

Map podTemplateArgs = [:]
podTemplateArgs.namespace = "jenkins"
podTemplateArgs.serviceAccount = "jenkins"
podTemplateArgs.yaml = yaml
podTemplateArgs.idleMinutes = 1

def builders = [:]
for (def i = 0; i < 10; i++) {
    def n = i
    builders["b_${n}"] = {
        podTemplate(podTemplateArgs) {
            node(POD_LABEL) {
                container(imageName) {
                    sh("pwd")
                }
            }
        }
    }
}
parallel(builders)
I know that the container command in the yaml looks odd (normally in my pod yaml it's just "cat"), but that was the best example I could come up with to reproduce the problem somewhat consistently. Sometimes I need to start more than 10 pods (even 100), sometimes as few as 5. The error does also happen with a plain "cat" as the entrypoint, but only on real production builds, and it's not obvious how to reproduce those.
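Roughly, the plain-"cat" variant of the pod spec would look like this (just a sketch for context, not my exact production yaml; the ubuntu image and the tty: true line are illustrative):

def yaml = """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: ["cat"]   # plain "cat" entrypoint, kept alive by the tty below
    tty: true
"""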
I suspect that Jenkins attempts to run the sh command before the container is actually ready. What's interesting is that the pod does start and the container does run (I set idleMinutes and checked the container on the cluster), and if I use a custom label for one of the pods, then a new job will happily re-use it without failing.
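To illustrate the re-use case, a minimal sketch of what that looks like with a custom label (the label name reused-ubuntu-pod is just a placeholder, and yaml is the same pod spec as in the script above):

podTemplate(label: 'reused-ubuntu-pod', yaml: yaml,
            namespace: 'jenkins', serviceAccount: 'jenkins', idleMinutes: 1) {
    node('reused-ubuntu-pod') {
        container('ubuntu') {
            // when a later build re-uses the already-running pod, this step does not fail
            sh 'pwd'
        }
    }
}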
I'll post more logs in a comment (from a run where 4 containers failed; I did my best to isolate the log lines for just one of them).