Jenkins / JENKINS-59893

bat calls hang in Windows Docker container in declarative pipeline script

      Description

      bat steps hang (endless spinning wheel in the job's console output) even for simple Windows containers, e.g.:

      bat "echo test inside"

      Troubleshooting & Additional info

      powershell and all other commands tried so far work without issue. Even using powershell to wrap cmd.exe commands works fine, for example:

      powershell "cmd /c echo test inside"

      Running the image manually on the node host exhibits no issues, i.e. we can run docker run -it microsoft/windowsservercore:ltsc2016 and happily use cmd and all other commands without issue.

      Similarly we can attach to the container spun up by the Jenkins job while it's hung and execute the same echo command (or any other) without issue.

      Others have not had this issue, so it could be something specific to our setup, but I have not been able to pinpoint anything. See https://github.com/jenkinsci/docker-workflow-plugin/pull/184#issuecomment-539213785

      The job console output shows no errors, and neither does the main Jenkins log under /log/all. There are no errors of any kind while the job is running / hung.

      Setup

      Jenkins node host: Windows Server 2016 (1607)

      Docker image: microsoft/windowsservercore:ltsc2016

      This happens regardless of whether the docker {} or dockerfile {} syntax is used.

      We are specifically using declarative pipeline scripts and have not tested other methods.

      pipeline {
          agent {
              docker {
                  image 'microsoft/windowsservercore:ltsc2016'
                  label 'windows'
              }
          }
          stages {
              stage('Example Build') {
                  steps {
                      bat "echo test inside"
                  }
              }
          }
      }
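
      For comparison, a minimal sketch of the same pipeline with the workaround from the Troubleshooting section applied: the hanging bat step is swapped for a powershell step that wraps cmd.exe. This only sidesteps the hang described here; it is not a fix for it. It assumes the same agent label ('windows') and image as the reproduction above, and it has only been reported to work in our setup.

```groovy
// Same declarative pipeline as the reproduction above, but the bat step is
// replaced by a powershell step that shells out to cmd.exe, which is the
// workaround reported in this issue to avoid the hang.
pipeline {
    agent {
        docker {
            image 'microsoft/windowsservercore:ltsc2016'
            label 'windows'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                // Hangs in our setup:
                // bat "echo test inside"

                // Works in our setup: powershell wrapping cmd.exe
                powershell 'cmd /c echo test inside'
            }
        }
    }
}
```

      Every cmd.exe command that would normally go through bat needs the same cmd /c wrapping for this to help.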

          a b added a comment (edited)

          henryborchers, which version are you currently running? We installed fresh just a couple of weeks ago, but I can't check the version right now.

          casz, our current server is 2016, so we more or less cannot use base images newer than microsoft/windowsservercore:ltsc2016. As per henryborchers, 2019 hosts and images can exhibit the same issues.

          However, for other reasons we need to provision a second server, so we are shooting for 2019. I will try to keep the rest of the setup the same and see if we experience the same issue.

          Our use case is different and we are not running Jenkins agents inside the container, but I suppose we could try the Jenkins image you referenced if we continue to have problems on 2019. We won't be able to use it on 2016 because it seems to be based on windowsservercore 1809, which is beyond what we can run on 2016 as far as I can tell.

          Edit: Confirmed... 

          >docker pull jenkins/agent:latest-windows
          latest-windows: Pulling from jenkins/agent
          a Windows version 10.0.17763-based image is incompatible with a 10.0.14393 host

          jerry wiltse added a comment

          If you pass `--isolation=hyperv` you can run images based on any Windows kernel, regardless of what kernel the host is on.

          a b added a comment (edited)

          solvingj, this does not work for me, at least not when trying to pull a 1903 image on Server 2019 (1809). Maybe it only works for backward compatibility, not forward?

          >docker build --isolation="hyperv" -t "test_full" -f Dockerfile_1903 .
          Sending build context to Docker daemon 13.82kB
          Step 1/7 : FROM mcr.microsoft.com/powershell:7.0.0-preview.5-nanoserver-1903
          7.0.0-preview.5-nanoserver-1903: Pulling from powershell
          a Windows version 10.0.18362-based image is incompatible with a 10.0.17763 host

          Edit: If I try an older version like nanoserver-1803, I get "The container operating system does not match the host operating system." on a powershell step without the --isolation flag. When adding the flag I get "The request is not supported." on the same step.

          jerry wiltse added a comment (edited)

          I don't use quotes, but I don't think that's the issue. I think you either need to be on a newer version of Docker, or you need to enable experimental features.

          a b added a comment

          We're on the latest version, 19.03.4, on Server 2019 (1809) now. I just edited my previous comment with more info. I get different results from docker build if I add the --isolation flag. It fails either way, but getting a different result makes me think it's attempting to apply the isolation setting. Either way, I don't think it will solve our issues.

          jerry wiltse added a comment

          This matrix is how I learned about it. It might be helpful to you: https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility

          a b added a comment

          We've now tried this on a Windows Server 2019 host (1809) with a vanilla mcr.microsoft.com/windows/servercore:1809 image, and the bat call still hangs. Even worse, when we try a nanoserver image such as mcr.microsoft.com/powershell:nanoserver-1809, both powershell and bat hang, and when the Jenkins job is manually cancelled the server hits a critical error and reboots itself:

          Critical: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

          Joseph Petersen (old) added a comment

          How did you install Docker? Please follow this guide:
          https://docs.docker.com/install/windows/docker-ee/

          a b added a comment

          That is the method we used for the install.

          The curious thing is that sometimes the bat calls work. We have some large and complex pipeline scripts in which many or even most of the bat calls work, but some still fail, and the hello world example above fails every time.

          I suspect that something else in the larger pipelines is inadvertently sidestepping / “fixing” the issue in real time. So there might be some context in which the calls work and another (like the hello world) where they don't. I haven't been able to narrow it down yet.

          Joseph Petersen (old) added a comment

          Wish I could help you, but we haven't experienced the issue.

            Assignee: Unassigned
            Reporter: a b (stuck_tech)
            Votes: 4
            Watchers: 8