
bat calls hang in Windows Docker container in declarative pipeline script

      Description

      bat steps hang (endless spinning wheel in the job's console output), even for simple commands in plain Windows containers.

      bat "echo test inside"

      Troubleshooting & Additional info

      powershell and all other commands tried so far work without issue. Even using powershell to wrap cmd.exe commands works fine. Example:

       

      powershell "cmd /c echo test inside"

       

      Running the image manually on the node host exhibits no issues, i.e. we can run docker run -it microsoft/windowsservercore:ltsc2016 and happily use cmd and all other commands without issue.

      Similarly we can attach to the container spun up by the Jenkins job while it's hung and execute the same echo command (or any other) without issue.

      Others have not had this issue so it could be something specific in our setup, but I have not been able to pinpoint anything. https://github.com/jenkinsci/docker-workflow-plugin/pull/184#issuecomment-539213785

      The job console output shows no errors and neither does the main Jenkins log under /log/all. There are no errors of any kind while the job is running / hung.

      Setup

      Jenkins node host: Windows Server 2016 (1607)

      Docker image: microsoft/windowsservercore:ltsc2016

      Happens regardless of whether the docker {} or dockerfile {} syntax is used.

      Specifically using declarative pipeline scripts. Have not tested other methods.

      pipeline {
          agent {
              docker {
                  image 'microsoft/windowsservercore:ltsc2016'
                  label 'windows'
              }
          }
          stages {
              stage('Example Build') {
                  steps{
                      bat "echo test inside"
                  }
              }
          }
      }
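
A minimal sketch of the workaround described above, assuming the same image and label as in the report. The cmd.exe command is wrapped in a powershell step, and a pipeline-level timeout aborts the build instead of leaving it spinning if the step hangs anyway. The 5-minute value is an arbitrary assumption, not something from the report.

```groovy
pipeline {
    agent {
        docker {
            image 'microsoft/windowsservercore:ltsc2016'
            label 'windows'
        }
    }
    options {
        // Assumption: 5 minutes is an arbitrary limit; pick one that suits the job.
        timeout(time: 5, unit: 'MINUTES')
    }
    stages {
        stage('Example Build') {
            steps {
                // Workaround from the report: run the cmd.exe command through powershell
                // instead of using the bat step directly.
                powershell 'cmd /c echo test inside'
            }
        }
    }
}
```

This only avoids the symptom in setups where powershell steps do not hang; it does not address whatever makes the bat step hang.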

          [JENKINS-59893] bat calls hang in Windows Docker container in declarative pipeline script

          Henry Borchers added a comment - - edited

          I'm also having this problem. However it's hanging on powershell as well.

          Edit: typo


          a b added a comment -

          What host OS and Docker images are you using?

          Henry Borchers added a comment -

          stuck_tech

          Host OS: Windows Server 2019

          Docker images tried:

          • mcr.microsoft.com/windows/servercore:ltsc2019
          • mcr.microsoft.com/powershell:preview
          • One of my own, based on mcr.microsoft.com/dotnet/framework/sdk:3.5

          Same experience on every one.

          Currently my test pipeline looks like this:

          pipeline {
              agent any
              stages {
                  stage('hello world') {
                      parallel {
                          // this one doesn't work
                          stage('Hello world on Windows') {
                              agent {
                                  docker {
                                      label 'Windows&&Docker&&aws'
                                      image 'mcr.microsoft.com/windows/servercore:ltsc2019'
                                  }
                              }
                              options {
                                  timeout(1) // in case the pipeline hangs
                              }
                              steps {
                                  // hangs here
                                  powershell 'powershell "cmd /c echo test inside"'
                              }
                          }
                          // This one works just fine
                          stage('Hello world on Linux') {
                              agent {
                                  docker {
                                      label 'linux&&docker&&aws'
                                      image 'alpine:latest'
                                  }
                              }
                              options {
                                  timeout(1) // in case the pipeline hangs
                              }
                              steps {
                                  sh 'echo "hello world"'
                              }
                          }
                      }
                  }
              }
          }

          a b added a comment - - edited

          Interesting. So different Windows Server OS and different images than mine. Also interesting that your powershell commands hang while ours do not.  Do things work as expected when you manually start or attach to the containers?

          Might be irrelevant but what method did you use to install Docker on the host node?  We used this.

          Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
          Install-Package -Name docker -ProviderName DockerMsftProvider
          Restart-Computer -Force

          We also had to do these steps (install git-bash and update paths) to get around a nohup error when running the Jenkins jobs.

          https://stackoverflow.com/questions/45140614/jenkins-pipeline-sh-fail-with-cannot-run-program-nohup-on-windows/53395989#53395989

           I'm curious if the chosen workaround for nohup has something to do with it in our case. There is a PR to address the root of the nohup issue. Perhaps I will try building that PR and see if it solves the issue. But would still like to know more about your setup.

          https://github.com/jenkinsci/pipeline-model-definition-plugin/pull/354


          Henry Borchers added a comment -

          I'm taking a research day, so I'm working from home and haven't tried anything on the Windows server at work.

          Anyways, I tried to get a Windows Docker container working in Jenkins on my home Windows 10 machine and it worked.

          I'm thinking the server at work must have a configuration incorrectly set. I just wish I could figure out what the heck the problem is. I'm not even sure where to look.

          a b added a comment -

          I agree it may be server config related but hard to understand why or where to start down that path. I feel like the nohup thing is a logical place to start as it has to do with how the plugin interfaces with the container shells.

          Did you run into the nohup error initially, and if so, how did you solve it?


          Joseph Petersen (old) added a comment - - edited

          We haven't experienced this.

          Our host OS is Windows Server 2019.

          nohup is never called during bat.

          Our workaround for nohup when using `sh` is to install git using chocolatey with the GitAndUnixToolsOnPath parameter.

          This is the Dockerfile for our Jenkins Windows agents (our Windows base images only have Docker installed).

          Today I would use the existing Docker image maintained in the Jenkins org: https://github.com/jenkinsci/docker-jnlp-slave/blob/master/Dockerfile-windows

           

          # escape=`
          # The escape directive above must be the first line of the Dockerfile;
          # without it the trailing backticks below are not treated as line continuations.
          FROM mcr.microsoft.com/windows/servercore:ltsc2019
          
          SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
          
          RUN Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')); `
            cinst docker-cli docker-compose adoptopenjdk8jre powershell-preview -y --no-progress; `
            cinst git --params '/GitAndUnixToolsOnPath /SChannel' -y --no-progress
          
          # temporary fix for powershell preview
          RUN [Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\Program Files\PowerShell\7-preview", [EnvironmentVariableTarget]::Machine)
          
          # Adding Jenkins Setup last to avoid rebuild
          ADD JenkinsAgentSetup.ps1 .
          
          ENTRYPOINT ["powershell.exe","-executionpolicy", "bypass", "./JenkinsAgentSetup.ps1"]
          
          

          This is how we run the agent

          docker run `
            --dns 10.0.0.1 `
            --dns 1.1.1.1 `
            --dns-search company.io `
            --name jenkins.agent `
            -v "$($WORKSPACE):$($JENKINS_WORKSPACE)" `
            -v \\.\pipe\docker_engine:\\.\pipe\docker_engine `
            -e "JENKINS_MASTER_URL=$JENKINS_MASTER_URL" `
            -e "JENKINS_AGENT_NAME=$JENKINS_AGENT_NAME" `
            -e "JENKINS_AGENT_PARAMETERS=$JENKINS_AGENT_PARAMETERS" `
            -e "JENKINS_AGENT_SECRET=$JENKINS_AGENT_SECRET" `
            artifactory.company.io/docker/jenkins.agent.windows:latest
          

           


          Joseph Petersen (old) added a comment -

          I'd highly encourage using Windows Server 2019 because it allows you to pipe your docker engine.

          Our experience with Windows Server 2016 was a HUGE nope.

          Henry Borchers added a comment -

          It started working on our end when we upgraded the version of Docker to the latest.

          Fabio Heer added a comment - - edited

          It looks related to JENKINS-59903, which was fixed in Jenkins version 2.201.
          Sorry, it was reported, not fixed.


          a b added a comment - - edited

          henryborchers Which version are you currently running? We installed fresh just a couple weeks ago but I can't check the version right now.

          casz Our current server is 2016 thus we cannot use base images beyond microsoft/windowsservercore:ltsc2016 more or less. As per henryborchers 2019 host and images can exhibit the same issues. 

          However, for other reasons we need to provision a second server so we are shooting for 2019. I will try to keep the rest of the setup the same and see if we experience the same issue.

          Our use case is different and we are not running Jenkins agents inside the container, but I suppose we could try that Jenkins image you reference if we continue to have problems on 2019. We won't be able to use it on 2016 because it seems to be based on windowsservercore 1809 which is beyond what we can run on 2016 as far as I can tell.

          Edit: Confirmed... 

          >docker pull jenkins/agent:latest-windows
          latest-windows: Pulling from jenkins/agent
          a Windows version 10.0.17763-based image is incompatible with a 10.0.14393 host


          jerry wiltse added a comment -

          If you pass `--isolation=hyperv` you can run images based on any Windows kernel, regardless of what kernel the host is on.
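
When the container is started by a declarative pipeline rather than by hand, one way to try this is to pass the flag through the docker agent's args option. This is only a sketch: the image and label are placeholders taken from the original report, and whether Hyper-V isolation is actually available depends on the host and Docker setup.

```groovy
pipeline {
    agent {
        docker {
            // Placeholders from the original report; substitute your own image and label.
            image 'microsoft/windowsservercore:ltsc2016'
            label 'windows'
            // Extra arguments passed to docker run for the agent container.
            args '--isolation=hyperv'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                bat 'echo test inside'
            }
        }
    }
}
```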


          a b added a comment - - edited

          solvingj Does not work for me. At least not when trying to pull a 1903 image on Server 2019 1809. Maybe it only works for backward compatibility, not forward?

          >docker build --isolation="hyperv" -t "test_full" -f Dockerfile_1903 .
          Sending build context to Docker daemon 13.82kB
          Step 1/7 : FROM mcr.microsoft.com/powershell:7.0.0-preview.5-nanoserver-1903
          7.0.0-preview.5-nanoserver-1903: Pulling from powershell
          a Windows version 10.0.18362-based image is incompatible with a 10.0.17763 host

          Edit: If I try an older version like nanoserver-1803, I get "The container operating system does not match the host operating system." on a powershell step without the --isolation flag. When adding the flag I get "The request is not supported." on the same step.

          jerry wiltse added a comment - - edited

          I don't use quotes, but I don't think that's the issue. I think you either need to be on a newer version of Docker, or you need to enable experimental features.

          a b added a comment -

          We're on the latest version 19.03.4 on Server 2019 (1809) now. Just edited my previous comment with more info. I get different results from docker build if I add the --isolation flag. It fails either way, but getting a different result makes me think it's attempting to apply the isolation setting. Either way I don't think it will solve our issues.

          jerry wiltse added a comment -

          This matrix is how I learned about it.  It might be helpful to you: https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility


          a b added a comment -

          We've now tried this on a Windows Server 2019 host (1809) with a vanilla mcr.microsoft.com/windows/servercore:1809 image and the bat call still hangs. Even worse, when we try a nanoserver image such as mcr.microsoft.com/powershell:nanoserver-1809, it hangs on both powershell and bat, and when the Jenkins job is manually cancelled the server experiences a critical error and reboots itself!

           

          Critical: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

           


          Joseph Petersen (old) added a comment -

          How did you install docker?
          Please follow this guide
          https://docs.docker.com/install/windows/docker-ee/

          a b added a comment -

          That is the method we used for install.

          The curious thing is that sometimes the bat calls work. We have some large and complex pipeline scripts in which many or even most of the bat calls work, but some still fail, and the hello world example above fails every time.

          I suspect that something else in the larger pipelines is inadvertently sidestepping / “fixing” the issue in real time. So there might be some kind of context in which the calls work and another (like the hello world) where they don’t. Haven’t been able to narrow it down yet.
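
One possible way to narrow this down, sketched here as an assumption rather than something tried in this thread, is to give each suspect bat call its own short timeout and bracket it with echo steps, so the console output shows exactly which call never returns. The image and label are placeholders reused from the original report, and the one-minute limit is arbitrary.

```groovy
pipeline {
    agent {
        docker {
            // Placeholders from the original report; substitute your own image and label.
            image 'microsoft/windowsservercore:ltsc2016'
            label 'windows'
        }
    }
    stages {
        stage('Probe bat') {
            steps {
                echo 'before bat'
                // Per-step timeout: aborts only this probe if the bat call hangs.
                timeout(time: 1, unit: 'MINUTES') {
                    bat 'echo test inside'
                }
                echo 'after bat'
            }
        }
    }
}
```

Moving the probe to different points of a larger pipeline would show whether the surrounding context changes the outcome.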


          Joseph Petersen (old) added a comment -

          Wish I could help you, but we haven't experienced the issue.
