Type: Bug
Resolution: Unresolved
Priority: Minor
Description
bat steps hang (endless spinning wheel in the job's console output) even for simple Windows containers.
bat "echo test inside"
Troubleshooting & Additional info
powershell and all other commands tried so far work without issue. Even using powershell to wrap cmd.exe commands works fine. Example:
powershell "cmd /c echo test inside"
Running the image manually on the node host exhibits no issues, i.e. we can run docker run -it microsoft/windowsservercore:ltsc2016 and happily use cmd and all other commands without issue.
Similarly we can attach to the container spun up by the Jenkins job while it's hung and execute the same echo command (or any other) without issue.
Others have not had this issue so it could be something specific in our setup, but I have not been able to pinpoint anything. https://github.com/jenkinsci/docker-workflow-plugin/pull/184#issuecomment-539213785
The job console output shows no errors and neither does the main Jenkins log under /log/all. No errors of any kind while the job is running / hung.
Setup
Jenkins node host: Windows Server 2016 (1607)
Docker image: microsoft/windowsservercore:ltsc2016
Happens regardless if docker {} or dockerfile {} syntax is used.
Specifically using declarative pipeline scripts. Have not tested other methods
pipeline {
    agent {
        docker {
            image 'microsoft/windowsservercore:ltsc2016'
            label 'windows'
        }
    }
    stages {
        stage('Example Build') {
            steps {
                bat "echo test inside"
            }
        }
    }
}
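Since only the declarative syntax was tested, a scripted-pipeline version of the same check might help show whether the hang is specific to declarative agents. This is only a sketch: it assumes the Docker Pipeline plugin's docker.image(...).inside step is available on the Windows node, and it reuses the 'windows' label and image from the report above. It has not been run against this setup.

```groovy
// Scripted-pipeline sketch of the same test (assumption: the Docker Pipeline
// plugin is installed and supports Windows containers on this agent).
node('windows') {
    docker.image('microsoft/windowsservercore:ltsc2016').inside {
        // Known-good path from the report: cmd.exe wrapped in a powershell step
        powershell 'cmd /c echo test inside'

        // The call that hangs in the declarative version
        bat 'echo test inside'
    }
}
```

If the bat step hangs here as well, the declarative agent handling is probably not the cause.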
[JENKINS-59893] bat calls hang in Windows Docker container in declarative pipeline script
Host OS: Windows Server 2019
Docker images tried:
- mcr.microsoft.com/windows/servercore:ltsc2019
- mcr.microsoft.com/powershell:preview
- One of my own based off of mcr.microsoft.com/dotnet/framework/sdk:3.5
Same experience on every one.
Currently my test pipeline looks like this:
pipeline {
    agent any
    stages {
        stage('hello world') {
            parallel {
                // this one doesn't work
                stage('Hello world on Windows') {
                    agent {
                        docker {
                            label 'Windows&&Docker&&aws'
                            image 'mcr.microsoft.com/windows/servercore:ltsc2019'
                        }
                    }
                    options {
                        timeout(1) // in case the pipeline hangs
                    }
                    steps {
                        // hangs here
                        powershell 'powershell "cmd /c echo test inside"'
                    }
                }
                // This one works just fine
                stage('Hello world on Linux') {
                    agent {
                        docker {
                            label 'linux&&docker&&aws'
                            image 'alpine:latest'
                        }
                    }
                    options {
                        timeout(1) // in case the pipeline hangs
                    }
                    steps {
                        sh 'echo "hello world"'
                    }
                }
            }
        }
    }
}
Interesting. So different Windows Server OS and different images than mine. Also interesting that your powershell commands hang while ours do not. Do things work as expected when you manually start or attach to the containers?
Might be irrelevant but what method did you use to install Docker on the host node? We used this.
Install-Module -Name DockerMsftProvider -Repository PSGallery -Force
Install-Package -Name docker -ProviderName DockerMsftProvider
Restart-Computer -Force
We also had to do these steps (install git-bash and update paths) to get around a nohup error when running the Jenkins jobs.
I'm curious if the chosen workaround for nohup has something to do with it in our case. There is a PR to address the root of the nohup issue. Perhaps I will try building that PR and see if it solves the issue. But would still like to know more about your setup.
https://github.com/jenkinsci/pipeline-model-definition-plugin/pull/354
I'm taking a research day, so I'm working from home and haven't tried anything on the Windows server at work.
Anyways, I tried to get a Windows Docker container working in Jenkins on my home Windows 10 machine and it worked.
I'm thinking the server at work must have a configuration incorrectly set. I just wish I could figure out what the heck the problem is. I'm not even sure where to look.
I agree it may be server config related but hard to understand why or where to start down that path. I feel like the nohup thing is a logical place to start as it has to do with how the plugin interfaces with the container shells.
Did you run into the nohup initially and if so how did you solve it?
We haven't experienced this.
Our host OS is Windows Server 2019.
nohup is never called during bat.
Our workaround for nohup when using `sh` is to install git using chocolatey with GitAndUnixToolsOnPath parameter.
This is our Jenkins Windows agent image (our Windows base images only have Docker installed):
Today I would use the existing DockerImage created in Jenkins org: https://github.com/jenkinsci/docker-jnlp-slave/blob/master/Dockerfile-windows
FROM mcr.microsoft.com/windows/servercore:ltsc2019
SHELL ["powershell", "-Command", "$ErrorActionPreference = 'Stop'; $ProgressPreference = 'SilentlyContinue';"]
RUN Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1')); `
    cinst docker-cli docker-compose adoptopenjdk8jre powershell-preview -y --no-progress; `
    cinst git --params '/GitAndUnixToolsOnPath /SChannel' -y --no-progress
# temporary fix for powershell preview
RUN [Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\Program Files\PowerShell\7-preview", [EnvironmentVariableTarget]::Machine)
# Adding Jenkins Setup last to avoid rebuild
ADD JenkinsAgentSetup.ps1 .
ENTRYPOINT ["powershell.exe","-executionpolicy", "bypass", "./JenkinsAgentSetup.ps1"]
This is how we run the agent:
docker run `
    --dns 10.0.0.1 `
    --dns 1.1.1.1 `
    --dns-search company.io `
    --name jenkins.agent `
    -v "$($WORKSPACE):$($JENKINS_WORKSPACE)" `
    -v \\.\pipe\docker_engine:\\.\pipe\docker_engine `
    -e "JENKINS_MASTER_URL=$JENKINS_MASTER_URL" `
    -e "JENKINS_AGENT_NAME=$JENKINS_AGENT_NAME" `
    -e "JENKINS_AGENT_PARAMETERS=$JENKINS_AGENT_PARAMETERS" `
    -e "JENKINS_AGENT_SECRET=$JENKINS_AGENT_SECRET" `
    artifactory.company.io/docker/jenkins.agent.windows:latest
I'd highly encourage using Windows Server 2019 because it allows you to pipe your docker engine.
Our experience with Windows Server 2016 was a HUGE nope.
It started working on our end when we upgraded the version of Docker to the latest.
It looks related to JENKINS-59903, which was fixed in Jenkins version 2.201.
Sorry, it was reported, not fixed.
henryborchers Which version are you currently running? We installed fresh just a couple weeks ago but I can't check the version right now.
casz Our current server is 2016 thus we cannot use base images beyond microsoft/windowsservercore:ltsc2016 more or less. As per henryborchers 2019 host and images can exhibit the same issues.
However, for other reasons we need to provision a second server so we are shooting for 2019. I will try to keep the rest of the setup the same and see if we experience the same issue.
Our use case is different and we are not running Jenkins agents inside the container, but I suppose we could try that Jenkins image you reference if we continue to have problems on 2019. We won't be able to use it on 2016 because it seems to be based on windowsservercore 1809 which is beyond what we can run on 2016 as far as I can tell.
Edit: Confirmed...
>docker pull jenkins/agent:latest-windows
latest-windows: Pulling from jenkins/agent
a Windows version 10.0.17763-based image is incompatible with a 10.0.14393 host
If you pass `--isolation=hyperv` you can run images based on any Windows kernel, regardless of what kernel the host is on.
solvingj Does not work for me. At least not when trying to pull a 1903 image on Server 2019 1809. Maybe it only works for backward compatibility, not forward?
>docker build --isolation="hyperv" -t "test_full" -f Dockerfile_1903 .
Sending build context to Docker daemon 13.82kB
Step 1/7 : FROM mcr.microsoft.com/powershell:7.0.0-preview.5-nanoserver-1903
7.0.0-preview.5-nanoserver-1903: Pulling from powershell
a Windows version 10.0.18362-based image is incompatible with a 10.0.17763 host
Edit: If I try an older version like nanoserver-1803 I get "The container operating system does not match the host operating system." on a powershell step without the --isolation flag. When adding the flag I get "The request is not supported." on the same step.
I don't use quotes, but I don't think that's the issue. I think you either need to be on a newer version of docker, or you need to enable experimental features.
We're on the latest version 19.03.4 on Server 2019 (1809) now. Just edited my previous comment with more info. I get different results from docker build if I add the --isolation flag. Fails either way but getting a different result makes me think it's attempting to apply the isolation setting. Either way I don't think it will solve our issues.
This matrix is how I learned about it. It might be helpful to you: https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility
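For anyone who wants to try the isolation flag from a pipeline rather than from docker build, the declarative docker agent accepts an args parameter that is passed through to docker run. The following is only a sketch: the label and image are borrowed from the original report, and it assumes the Hyper-V role is enabled on the host. Per the compatibility matrix, Hyper-V isolation is only expected to help when the image build is older than or equal to the host build, so it would not address the 1903-on-1809 case above.

```groovy
pipeline {
    agent {
        docker {
            // Older image on a newer host (assumption: the host is newer than ltsc2016)
            image 'microsoft/windowsservercore:ltsc2016'
            label 'windows'
            // Passed through to `docker run`; requires Hyper-V on the host
            args '--isolation=hyperv'
        }
    }
    stages {
        stage('Isolation check') {
            steps {
                // Use the step known to work in the original report
                powershell 'cmd /c ver'
            }
        }
    }
}
```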
We've now tried this on a Windows Server 2019 host (1809) and a vanilla mcr.microsoft.com/windows/servercore:1809 image and the bat call still hangs. Even worse, when we try a nanoserver image such as mcr.microsoft.com/powershell:nanoserver-1809 it will hang on both powershell and bat, and when the Jenkins job is manually cancelled the server will experience a critical error and reboot itself!
Critical: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
How did you install docker?
Please follow this guide
https://docs.docker.com/install/windows/docker-ee/
That is the method we used for install.
The curious thing is that sometimes the bat calls work. We have some large and complex pipeline scripts in which many or even most of the bat calls will work but some still fail, and the hello world example above fails every time.
I suspect that something else in the larger pipelines is inadvertently sidestepping / "fixing" the issue in real time. So there might be some kind of context in which the calls work and another (like the hello world) where they don't. Haven't been able to narrow it down yet.
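One possible way to narrow this down is to give each bat call its own timeout and let the build continue after a hang, so the log shows which calls complete and which do not. This is a sketch only: the 'windows' label, the two-minute limit, and the probe commands are placeholders, and it assumes a version of the basic steps plugin in which catchError accepts the buildResult and stageResult parameters. Note that aborting a hung step may itself be risky on setups like the nanoserver case described above.

```groovy
pipeline {
    agent {
        docker {
            image 'mcr.microsoft.com/windows/servercore:1809'
            label 'windows' // placeholder label
        }
    }
    stages {
        stage('Probe bat calls') {
            steps {
                // catchError lets the next probe run even if this one times out
                catchError(buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
                    timeout(time: 2, unit: 'MINUTES') {
                        bat 'echo first probe'
                    }
                }
                catchError(buildResult: 'UNSTABLE', stageResult: 'FAILURE') {
                    timeout(time: 2, unit: 'MINUTES') {
                        // Same command routed through the powershell step for comparison
                        powershell 'cmd /c echo second probe'
                    }
                }
            }
        }
    }
}
```

Adding steps from the larger pipelines one at a time ahead of the first probe could then show which of them, if any, changes the outcome.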
I'm also having this problem. However it's hanging on powershell as well.
Edit: typo