Jenkins / JENKINS-65391

Container not found error although it's started in the pod


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: kubernetes-plugin
    • Environment: Jenkins 2.277.2 running on GKE Kubernetes 1.18.16-gke.302,
      kubernetes plugin: 1.29.2

      Since upgrading to the latest LTS version and plugin, I have noticed this error showing up when running the first command in a container { ... } block:

      [2021-04-16T21:01:40.345Z] java.net.ProtocolException: Expected HTTP 101 response but was '500 Internal Server Error'
      [2021-04-16T21:01:40.345Z] 	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:229)
      [2021-04-16T21:01:40.345Z] 	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:196)
      [2021-04-16T21:01:40.345Z] 	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
      [2021-04-16T21:01:40.345Z] 	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
      [2021-04-16T21:01:40.345Z] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      [2021-04-16T21:01:40.345Z] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      [2021-04-16T21:01:40.345Z] 	at java.base/java.lang.Thread.run(Thread.java:834)
      io.fabric8.kubernetes.client.KubernetesClientException: container not found ("ubuntu")

      I was able to reproduce it with this pipeline script:

          def imageName = "ubuntu"
          def imageTag = "latest"
          
          def yaml = """
      apiVersion: v1
      kind: Pod
      spec:
        containers:
        - name: ${imageName}
          image: ${imageName}:${imageTag}
          command: ["/bin/sh"]
          args: ["-c", "sleep 5; cat"]
      """
          
          Map podTemplateArgs = [:]
          podTemplateArgs.namespace = "jenkins"
          podTemplateArgs.serviceAccount = "jenkins"
          podTemplateArgs.yaml = yaml
          podTemplateArgs.idleMinutes = 1    
          
          // Build ten parallel branches; each one provisions its own pod from the template above
          def builders = [:]
          for (def i = 0; i < 10; i++) {
              def n = i  // capture the loop index for use inside the closure
              builders["b_${n}"] = {
                  podTemplate(podTemplateArgs) {
                      node(POD_LABEL) {
                          container(imageName) {
                              sh("pwd")
                          }
                      }
                  }
              }
          }
          
          parallel(builders)
      

       

      I know that the container command in the yaml looks odd (normally in my pod yaml it's just "cat"), but that was the best example I could come up with to reproduce the problem somewhat consistently. Sometimes I need to start more than 10 pods (even 100), sometimes as few as 5. The error also happens when using a plain "cat" as the entrypoint, but only on real production builds, and it's not obvious how to reproduce those.
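
      For reference, the plain "cat" variant of the template would look roughly like this (sketch only; tty: true is my assumption here so that cat blocks on stdin and keeps the container running, everything else matches the script above):

          def yaml = """
      apiVersion: v1
      kind: Pod
      spec:
        containers:
        - name: ${imageName}
          image: ${imageName}:${imageTag}
          command: ["cat"]
          tty: true
      """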

      I suspect that Jenkins attempts to run the sh command before the container is actually ready. What's interesting is that the pod does start and the container does run (I set idleMinutes and checked the container on the cluster), and if I use a custom label for one of the pods, then a new job will happily re-use it without failing.
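
      To illustrate that last point, here is roughly what I mean by a custom label (sketch only, re-using the podTemplateArgs map from the script above; "reuse-test" is just an example name). With idleMinutes set, the pod survives the first run for a minute, and a second job pointed at the same label picks it up and runs without the error:

          // Sketch only: pin the template to a fixed label instead of using POD_LABEL.
          podTemplateArgs.label = "reuse-test"  // example name, not from the real builds

          podTemplate(podTemplateArgs) {
              node("reuse-test") {
                  container(imageName) {
                      // When this branch re-uses the still-idle pod left over from an earlier run,
                      // the first sh step does not hit the "container not found" error.
                      sh("pwd")
                  }
              }
          }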

      I'll post more logs in a comment (from a run where 4 containers failed; I did my best to select only one).

       

            Assignee: Unassigned
            Reporter: Stefan (stefans)
            Votes: 0
            Watchers: 4
