Type: Bug
Resolution: Unresolved
Priority: Major
Labels: None
Jenkins 2.277.2 running on GKE Kubernetes 1.18.16-gke.302
kubernetes plugin: 1.29.2
Since upgrading to the latest LTS version and plugin, I've noticed this error showing up when running the first command in a container { ... } block:
[2021-04-16T21:01:40.345Z] java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
[2021-04-16T21:01:40.345Z]     at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
[2021-04-16T21:01:40.345Z]     at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
[2021-04-16T21:01:40.345Z]     at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
[2021-04-16T21:01:40.345Z]     at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2021-04-16T21:01:40.345Z]     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[2021-04-16T21:01:40.345Z]     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[2021-04-16T21:01:40.345Z]     at java.base/java.lang.Thread.run(Thread.java:834)
io.fabric8.kubernetes.client.KubernetesClientException: container not found ("ubuntu")
I was able to reproduce it with this pipeline script:
def imageName = "ubuntu"
def imageTag = "latest"
def yaml = """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ${imageName}
    image: ${imageName}:${imageTag}
    command: ["/bin/sh"]
    args: ["-c", "sleep 5; cat"]
"""

Map podTemplateArgs = [:]
podTemplateArgs.namespace = "jenkins"
podTemplateArgs.serviceAccount = "jenkins"
podTemplateArgs.yaml = yaml
podTemplateArgs.idleMinutes = 1

def builders = [:]
for (def i = 0; i < 10; i++) {
    def n = i
    builders["b_${n}"] = {
        podTemplate(podTemplateArgs) {
            node(POD_LABEL) {
                container(imageName) {
                    sh("pwd")
                }
            }
        }
    }
}
parallel(builders)
I know that the container command in the yaml looks odd (normally in my pod yaml it's just "cat"), but that was the best example I could come up with to reproduce the problem somewhat consistently. Sometimes I need to start more than 10 pods (even 100), sometimes as few as 5. The error does also happen with a plain "cat" as the entrypoint, but only on real production builds, and it's not obvious how to reproduce those.
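Roughly, the plain-"cat" variant of the pod spec would look like this (just a sketch for context, not my exact production yaml; the ubuntu image and the tty: true line are illustrative):

def yaml = """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: ["cat"]   # plain "cat" entrypoint, kept alive by the tty below
    tty: true
"""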
I suspect that Jenkins attempts to run the sh command before the container is actually ready. What's interesting is that the pod does start and the container does run (I set idleMinutes and checked the container on the cluster), and if I use a custom label for one of the pods, then a new job will happily re-use it without failing.
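To illustrate the re-use case, a minimal sketch of what that looks like with a custom label (the label name reused-ubuntu-pod is just a placeholder, and yaml is the same pod spec as in the script above):

podTemplate(label: 'reused-ubuntu-pod', yaml: yaml,
            namespace: 'jenkins', serviceAccount: 'jenkins', idleMinutes: 1) {
    node('reused-ubuntu-pod') {
        container('ubuntu') {
            // when a later build re-uses the already-running pod, this step does not fail
            sh 'pwd'
        }
    }
}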
I'll post more logs in a comment (from a run where 4 containers failed; I did my best to isolate the log lines for just one of them).