Jenkins / JENKINS-61340

Cannot launch windows pods

      Description

      Hi,

      I'm trying to get Windows pods working on GKE, but for both the jnlp and shell containers I get the following errors:

      Error: failed to start container "jnlp": Error response from daemon: CreateComputeSystem jnlp: The system cannot find the file specified.

      Error: failed to start container "shell": Error response from daemon: CreateComputeSystem shell: The system cannot find the file specified.

      This is using the sample pipeline:

      /*
       * Runs a build on a Windows pod.
       * Tested in EKS: https://docs.aws.amazon.com/eks/latest/userguide/windows-support.html
       */
      podTemplate(yaml: '''
      apiVersion: v1
      kind: Pod
      spec:
        containers:
        - name: jnlp
          image: jenkins/jnlp-agent:latest-windows
        - name: shell
          image: mcr.microsoft.com/powershell:preview-windowsservercore-1809
          command:
          - powershell
          args:
          - Start-Sleep
          - 999999
        nodeSelector:
          beta.kubernetes.io/os: windows
      ''') {
          node(POD_LABEL) {
              container('shell') {
                  powershell 'Get-ChildItem Env: | Sort Name'
              }
          }
      }
      

      Here's an outline of the commands I used to set up the cluster:

      gcloud beta container clusters create jenkins-cd \
        --num-nodes 2 \
        --machine-type n1-standard-2 \
        --scopes "https://www.googleapis.com/auth/source.read_write,cloud-platform" \
        --enable-ip-alias \
        --release-channel=rapid
      
      gcloud container clusters get-credentials jenkins-cd
      
      gcloud beta container node-pools create windows-pool \
        --cluster=jenkins-cd \
        --image-type=WINDOWS_SAC \
        --no-enable-autoupgrade \
        --machine-type=n1-standard-2
      
      wget https://get.helm.sh/helm-v2.16.1-linux-amd64.tar.gz
      
      tar zxfv helm-v2.16.1-linux-amd64.tar.gz
      cp linux-amd64/helm .
      
      kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=$(gcloud config get-value account)
      
      kubectl create serviceaccount tiller --namespace kube-system
      kubectl create clusterrolebinding tiller-admin-binding --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
      
      ./helm init --service-account=tiller
      ./helm update
      
      ./helm version
      
      ./helm install -n cd stable/jenkins -f jenkins/values.yaml --version 1.2.2 --wait
      
      kubectl get pods
      
      export POD_NAME=$(kubectl get pods -l "app.kubernetes.io/component=jenkins-master" -o jsonpath="{.items[0].metadata.name}")
      kubectl port-forward $POD_NAME 8080:8080 >> /dev/null &
      
      kubectl get svc
      
      printf $(kubectl get secret cd-jenkins -o jsonpath="{.data.jenkins-admin-password}" | base64 --decode);echo
      

      Then I log in to Jenkins and try to run the sample pipeline for Windows, but I cannot get it to work.

      The following error also appears in the Jenkins master log:

      Error in provisioning; agent=KubernetesSlave name: windows-test-8-hn1vc-q3pg0-k4x83, template=PodTemplate{, name='windows-test_8-hn1vc-q3pg0', namespace='default', label='windows-test_8-hn1vc', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], annotations=[org.csanchez.jenkins.plugins.kubernetes.PodAnnotation@aab9c821, org.csanchez.jenkins.plugins.kubernetes.PodAnnotation@c92c82e4]}
      java.lang.IllegalStateException: Pod has terminated containers: default/windows-test-8-hn1vc-q3pg0-k4x83 (jnlp, shell)
      	at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:183)
      	at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.periodicAwait(AllContainersRunningPodWatcher.java:204)
      	at org.csanchez.jenkins.plugins.kubernetes.AllContainersRunningPodWatcher.await(AllContainersRunningPodWatcher.java:144)
      	at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:139)
      	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:292)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

      I can deploy an image via kubectl apply -f jenkins-windows.yaml with the following template successfully:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: jenkins-windows
        labels:
          app: jenkins-windows
      spec:
        replicas: 0
        selector:
          matchLabels:
            app: jenkins-windows
        template:
          metadata:
            labels:
              app: jenkins-windows
          spec:
            nodeSelector:
              kubernetes.io/os: windows
            containers:
            - name: jnlp
              image: jenkins/jnlp-agent:latest-windows
            - name: shell
              image: mcr.microsoft.com/powershell:preview-windowsservercore-1809
              ports:
              - containerPort: 80
      

      I'm not really sure how to debug this. The error says "The system cannot find the file specified", but I am not sure which file this refers to. Is it related to workspaceVolume not being set? I would have thought the sample pipeline would work without that (and I don't yet know how to configure it).

      Any help would be really appreciated; otherwise I will have to drop Kubernetes and just use VMs.

      For good measure here is also the log from the job:

      Created Pod: windows-test-9-mdt6s-t2kj3-dsr2n in namespace default
      Still waiting to schedule task
      ‘windows-test-9-mdt6s-t2kj3-dsr2n’ is offline
      Created Pod: windows-test-9-mdt6s-t2kj3-dcqvw in namespace default
      Created Pod: windows-test-9-mdt6s-t2kj3-kvr05 in namespace default
      Created Pod: windows-test-9-mdt6s-t2kj3-k9v52 in namespace default
      Created Pod: windows-test-9-mdt6s-t2kj3-tbn3g in namespace default
      Created Pod: windows-test-9-mdt6s-t2kj3-2qv1t in namespace default
      Aborted by admin
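
For the debugging question above, one thing that may be worth checking is whether the Windows build on the node matches the Windows version in the image tag (here, `1809`), since a SAC node pool may run a newer build. The following is a minimal sketch under that assumption. The helper name `windows_release_for_build` is hypothetical, and its mapping covers only a few Windows Server builds. The `kubectl` query in the comment reads the standard `status.nodeInfo.kernelVersion` node field and is not executed by the snippet itself.

```shell
#!/bin/sh
# Hypothetical helper: map a Windows kernel version (as reported by a node)
# to the Windows Server release that container image tags usually refer to.
windows_release_for_build() {
    # Expect something like 10.0.17763.1098; the third field is the build number.
    build=$(printf '%s' "$1" | cut -d. -f3)
    case "$build" in
        17763) echo "1809" ;;   # Windows Server 2019 (LTSC)
        18362) echo "1903" ;;   # Semi-Annual Channel
        18363) echo "1909" ;;   # Semi-Annual Channel
        *)     echo "unknown" ;;
    esac
}

# Usage against a live cluster (not run here):
#   kubectl get nodes -l kubernetes.io/os=windows \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
# Pass each reported kernel version to windows_release_for_build and compare
# the result with the version in the image tag.
```

If the release printed for a node differs from the version in the image tag, the images and the node pool are likely built for different Windows versions.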
      

          Activity

          alexgeek Alexander Perry added a comment -

          Formatting didn't come out great; I copied and pasted from GitHub: https://github.com/jenkinsci/google-kubernetes-engine-plugin/issues/117#issue-575604385

          olblak Olivier Vernin added a comment -

          I moved this ticket to the right project with the right component

          kon Kalle Niemitalo added a comment -

          "Formatting didn't come out great"

          I improved the formatting but can't help with the problem itself. The issue should perhaps be moved from INFRA to JENKINS though, if the problem occurs on your own Jenkins instance rather than on one maintained by the Jenkins project.

          alexgeek Alexander Perry added a comment -

          It looks like a symlink related to a log file is causing issues. Does anyone know a workaround for this?

           

          kubectl logs -f pods/windows-test-9-mdt6s-t2kj3-2qv1t jnlp
          failed to try resolving symlinks in path "\\var\\log\\pods\\default_windows-test-9-mdt6s-t2kj3-2qv1t_318ba281-c8ec-4a56-8515-c65316427b01\\jnlp\\0.log": CreateFile \var\log\pods\default_windows-test-9-mdt6s-t2kj3-2qv1t_318ba281-c8ec-4a56-8515-c65316427b01\jnlp\0.log: The system cannot find the file specified.
          alexgeek Alexander Perry added a comment -

          It seems to me that the log file is not being created, and this causes the container to fail to start.

          The directory in question on the host is empty:


          C:\var\log\pods\default_windows-test-9-mdt6s-t2kj3-2qv1t_318ba281-c8ec-4a56-8515-c65316427b01\jnlp>dir
           Volume in drive C has no label.
           Volume Serial Number is C62F-FA38
          Directory of C:\var\log\pods\default_windows-test-9-mdt6s-t2kj3-2qv1t_318ba281-c8ec-4a56-8515-c65316427b01\jnlp
          03/04/2020 05:07 PM <DIR> .
          03/04/2020 05:07 PM <DIR> ..
           0 File(s) 0 bytes
           2 Dir(s) 75,565,420,544 bytes free
          

           

           

          Is there an option I can put into the template to prevent logging through the symlink? Or just disable logging entirely? 

           

          alexgeek Alexander Perry added a comment -

          I forgot to say that I solved this. The issue was that the Windows version of the host node did not match the Windows version of the container images in the pods.
          The solution Google Cloud's engineers gave me is to use a multi-arch manifest that combines an image for LTSC and an image for SAC:

          https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-cluster-windows#building_multi-arch_images

          Confirmed to be working. Thanks.
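
For anyone landing here later, the manifest approach can be sketched roughly as follows. This is not the exact set of commands I ran. It assumes the per-version images have already been built and pushed, and that `docker manifest` is available (it was an experimental CLI feature at the time). The helper names `run` and `publish_windows_manifest`, the `multiarch` tag, and the repository name in the usage note are placeholders. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print each command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: combine per-version Windows images into one manifest list.
#   $1   = image repository (placeholder)
#   $2.. = tags of the per-version images that were already pushed
publish_windows_manifest() {
    repo=$1
    shift
    images=""
    for tag in "$@"; do
        images="$images $repo:$tag"
    done
    # Word splitting of $images is intentional: one argument per image.
    run docker manifest create "$repo:multiarch" $images
    run docker manifest push "$repo:multiarch"
}
```

Calling `publish_windows_manifest gcr.io/my-project/jnlp-agent 1809 1909` (placeholder repository and tags) prints one `docker manifest create` and one `docker manifest push` command. The pod template would then reference the `multiarch` tag so that each node pulls the image matching its own Windows version.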


            People

            Assignee:
            alexgeek Alexander Perry
            Reporter:
            alexgeek Alexander Perry