JENKINS-70157

StackoverflowException when binding drive D: to Inbound Agent (Windows Docker Image)

      I'm trying to run the Inbound Agent on Windows as a Docker image, but I get the following error:

      Process is terminated due to StackOverflowException.

      The command I'm executing is the following:

      docker run -it --rm -v 'd:/:d:' jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019 

      When I omit the volume mount parameter, the container runs successfully. The Inbound Agent on Windows has worked for me for years, so I'm guessing something must have changed in the agent; it's very unlikely to be a problem with Windows and/or Docker itself.

      Also, when I run the following command, I can see the contents of D:\:

      docker run -it --rm -v "d:/:d:" mcr.microsoft.com/windows/nanoserver:ltsc2019 cmd /s /c dir d: 

      Thank you.


          Fabian Grutschus added a comment (edited)

          I reverted to 3071.v7e9b_0dc08466-1-jdk17-windowsservercore-ltsc2019, which works, while 3071.v7e9b_0dc08466-5-jdk11-windowsservercore-ltsc2019 is still affected.


          Damien Duportal added a comment

          Hi fabiang, could you provide more information to help us reproduce?

          I just tried the following stack (I'm working on sharing it to help, but I need more time for this):

          • Windows 11 Pro with Docker Desktop in "Windows Engine" mode
          • Started a controller with:
            docker run -p 8080:8080 -p 40000:40000 -d jenkins/jenkins:windowsservercore-2019
          • Started an agent container with:
            docker run --rm -ti -v "d:/:d:" --entrypoint=cmd jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019
          • Set up the controller with the following settings:
            • Install recommended plugins
            • Set the controller hostname to the public IP of my host machine
            • Disable authentication and authorization
            • Create a new permanent agent named "test1" with the workdir set to "C:/Users/jenkins/Work" and websockets enabled
          • Checked that "D:" is available with the command "dir D:" => I see the content of my D: hard drive
          • Started the agent process with the following command:
            java -jar C:/ProgramData/Jenkins/agent.jar -jnlpUrl http://<host Public IP>:8080/computer/test1/jenkins-agent.jnlp -workDir C:/Users/jenkins/Work
          • The agent starts fine and I can run a pipeline with no error.

          I've repeated this with the following inbound agent images:

          • "jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019" (freshly pulled)
          • "jenkins/inbound-agent:3077.vd69cf116da_6f-3-windowsservercore-ltsc2019" (released yesterday)
          • "jenkins/inbound-agent:3071.v7e9b_0dc08466-5-jdk11-windowsservercore-ltsc2019"

          Let's see what differs between our two setups to understand what is going on.

          Fabian Grutschus added a comment

          Hey dduportal,

          thank you for looking into it. I've done the same steps as you, and that works for me.

          But when, instead of running the java command inside the container, I do the following, I can reproduce the error:

          docker run --rm -ti -v "d:/:d:" jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019 -jnlpUrl http://172.22.0.65:8080/computer/test1/jenkins-agent.jnlp -workDir C:/Users/jenkins/Work

          While this command is working:

          docker run --rm -ti -v "d:/:d:" jenkins/inbound-agent:3071.v7e9b_0dc08466-1-jdk17-windowsservercore-ltsc2019 -jnlpUrl http://172.22.0.65:8080/computer/test1/jenkins-agent.jnlp -workDir C:/Users/jenkins/Work

          Output:

          C:\ProgramData\Jenkins\jenkins-agent.ps1 : A parameter cannot be found that matches parameter name 'jnlpUrl'.
              + CategoryInfo          : InvalidArgument: (:) [jenkins-agent.ps1], ParentContainsErrorRecordException
              + FullyQualifiedErrorId : NamedParameterNotFound,jenkins-agent.ps1

          When omitting --volume I got the same output:

          docker run --rm -ti jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019 -jnlpUrl http://172.22.0.65:8080/computer/test1/jenkins-agent.jnlp -workDir C:/Users/jenkins/Work

          So my guess is that it's related to the entrypoint script `jenkins-agent.ps1` in the container somehow?

          Some extra info:

          Operating system: Windows Server 2019 Datacenter Version 1809 (OS Build 17763.3650)

          No Docker Desktop

          docker info

          Client:
           Context:    default
           Debug Mode: false

          Server:
           Containers: 1
            Running: 1
            Paused: 0
            Stopped: 0
           Images: 51
           Server Version: 20.10.17
           Storage Driver: windowsfilter
            Windows:
           Logging Driver: json-file
           Plugins:
            Volume: local
            Network: ics internal l2bridge l2tunnel nat null overlay private transparent
            Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
           Swarm: inactive
           Default Isolation: process
           Kernel Version: 10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)
           Operating System: Windows Server 2019 Datacenter Version 1809 (OS Build 17763.3650)
           OSType: windows
           Architecture: x86_64
           CPUs: 4
           Total Memory: 4GiB
           Name: xxxxxx
           ID: xxxxxx
           Docker Root Dir: D:\docker
           Debug Mode: false
           Registry: https://index.docker.io/v1/
           Labels:
           Experimental: false
           Insecure Registries:
            127.0.0.0/8
           Live Restore Enabled: false

          docker version

          Client:
           Version:           20.10.17
           API version:       1.41
           Go version:        go1.17.11
           Git commit:        100c70180f
           Built:             Tue Jun  7 02:24:49 2022
           OS/Arch:           windows/amd64
           Context:           default
           Experimental:      true

          Server:
           Engine:
            Version:          20.10.17
            API version:      1.41 (minimum version 1.24)
            Go version:       go1.17.11
            Git commit:       a89b84221c
            Built:            Tue Jun  7 02:22:55 2022
            OS/Arch:          windows/amd64
            Experimental:     false

          Fabian Grutschus added a comment

          I've also tried this (in cmd):

          docker run --rm -ti -v "d:/:d:" --entrypoint=cmd jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019
          # then, inside the container
          C:\ProgramData\Jenkins\jenkins-agent.ps1 -workdir C:/Users/jenkins/Work
          # the container seems to get killed
          echo %ERRORLEVEL% # Output: -1073741571

          With 3071.v7e9b_0dc08466-1-jdk17-windowsservercore-ltsc2019 it works again.
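Editorial note: the negative exit code above is easier to read as an unsigned 32-bit NTSTATUS value. The sketch below (the helper name `to_ntstatus` is made up for illustration) does that conversion. The result, 0xC00000FD, is the Windows status code STATUS_STACK_OVERFLOW, which is consistent with the "Process is terminated due to StackOverflowException" message in the description.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS hex string."""
    # Masking with 0xFFFFFFFF maps negative values onto their two's-complement form.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Exit code reported by `echo %ERRORLEVEL%` in the comment above.
print(to_ntstatus(-1073741571))  # prints 0xC00000FD
```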

          Fabian Grutschus added a comment (edited)

          To make it even weirder, this fails:

          docker run --rm -ti -v "d:/:d:" --entrypoint=powershell jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019

          This is OK:

          docker run --rm -ti --entrypoint=powershell jenkins/inbound-agent:jdk11-windowsservercore-ltsc2019

          This is OK as well:

          docker run --rm -ti -v "d:/:d:" --entrypoint=powershell jenkins/inbound-agent:3071.v7e9b_0dc08466-1-jdk11-windowsservercore-ltsc2019

          So we can say it's not the entrypoint script. Maybe a new PowerShell module was installed in one of the upstream images?
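Editorial note: the runs above vary two things, the image tag and whether the D: bind mount is present. A minimal sketch for generating the full set of combinations is shown here; the function names are hypothetical, and it only builds and prints the command lines rather than running Docker.

```python
from itertools import product

IMAGE = "jenkins/inbound-agent"
# Tags taken from the commands in this comment.
TAGS = [
    "jdk11-windowsservercore-ltsc2019",
    "3071.v7e9b_0dc08466-1-jdk11-windowsservercore-ltsc2019",
]
# None means "no bind mount"; the other value is the mount used in this issue.
MOUNTS = [None, "d:/:d:"]


def build_command(tag, mount):
    """Build one `docker run` argument list for a tag and an optional bind mount."""
    cmd = ["docker", "run", "--rm", "-ti"]
    if mount is not None:
        cmd += ["-v", mount]
    cmd += ["--entrypoint=powershell", f"{IMAGE}:{tag}"]
    return cmd


def build_matrix():
    """Return every tag/mount combination as a list of argument lists."""
    return [build_command(tag, mount) for tag, mount in product(TAGS, MOUNTS)]


for cmd in build_matrix():
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run` on a Windows Docker host to record which combinations crash; that step is left out here because the result depends on the host setup.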


          Fabian Grutschus added a comment (edited)

          It's indeed an upstream issue with Eclipse Temurin. I've opened a bug here: https://github.com/adoptium/adoptium-support/issues/642


          Fabian Grutschus added a comment

          It turned out to be an issue with PowerShell and/or Windows that only happens when Docker's `data-root` is on the same drive as the bind mount. This issue can be closed. The upstream bug is here: https://github.com/microsoft/Windows-Containers/issues/308
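Editorial note: the condition described above (Docker's `data-root` on the same drive as the bind-mount source) can be checked from the `docker info` output posted earlier in this thread. The following is a rough sketch with hypothetical helper names; it assumes the "Docker Root Dir:" line format shown in that output and takes the host side of the mount (for example `d:/`) as a plain path.

```python
import ntpath


def drive_of(path: str) -> str:
    """Return the upper-cased drive part of a Windows path, e.g. 'D:'."""
    return ntpath.splitdrive(path)[0].upper()


def docker_root_dir(docker_info_text: str):
    """Extract the 'Docker Root Dir' value from `docker info` text, or None."""
    for line in docker_info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Docker Root Dir":
            return value.strip()
    return None


def shares_drive(data_root: str, bind_source: str) -> bool:
    """True when both paths have a drive letter and it is the same one."""
    drive = drive_of(data_root)
    return drive != "" and drive == drive_of(bind_source)


# Values taken from this issue: data-root on D:, bind mount of d:/.
info = " Docker Root Dir: D:\\docker\n Debug Mode: false"
root = docker_root_dir(info)
print(root, shares_drive(root, "d:/"))
```

With the values from this report the check is true, which matches the setup in which the crash was observed; a `data-root` on another drive would make it false.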

            Assignee: Unassigned
            Reporter: Fabian Grutschus (fabiang)
            Votes: 0
            Watchers: 2