[JENKINS-51145] PowerShell pipeline step does not seem to be durable

      While a PowerShell pipeline step was running, I triggered a graceful shutdown and monitored the powershell.exe process on the Jenkins host. The PowerShell process unexpectedly crashed, and the job failed with exit code -1.

      Job definition:

      node('Windows') {
          // Endless loop that keeps the step busy so a shutdown can be triggered mid-run
          powershell '''
          while ($true) {
              Write-Host "Testing"
              Start-Sleep -Seconds 1
          }
          '''
      }
      

      Console output:

      Started by user Gabriel Loewen
      Running in Durability level: MAX_SURVIVABILITY
      [Pipeline] node
      Running on WinHost in C:\Program Files (x86)\Jenkins\workspace\Test Durability
      [Pipeline] {
      [Pipeline] powershell
      [Test Durability] Running PowerShell script
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Resuming build at Fri May 04 17:40:07 UTC 2018 after Jenkins restart
      Waiting to resume part of Test Durability #5: ???
      Testing
      Waiting to resume part of Test Durability #5: WinHost is offline
      Waiting to resume part of Test Durability #5: WinHost is offline
      Waiting to resume part of Test Durability #5: WinHost is offline
      Ready to run at Fri May 04 17:40:19 UTC 2018
      Testing
      Testing
      Testing
      Testing
      Testing
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE

       

      I ran the same test again; this time the build hung, even though the PowerShell process appears to have crashed.

      At this point I do not know how to debug the issue or what can be done to make the powershell pipeline step durable.

      svanoort, teilo, let me know if you need more details and whether you can assist in this investigation.


          Sam Van Oort added a comment -

          gabloe Do you have the durability setting for the Pipeline and the versions of workflow-support, workflow-cps, and workflow-job handy? I want to make sure this isn't caused by a bug in one of those three plugins (I have been doing a lot of fixes there recently).


          Gabriel Loewen added a comment - edited

          Durability is set to maximum durability (MAX_SURVIVABILITY). I've updated the environment details to include those plugin versions. Thanks!

          Gabriel Loewen added a comment -

          Actually, I see the same behavior with the Windows batch step. The Jenkins logs show the following, which looks like the likely culprit.

          After the slave agent is disconnected, the powershell process keeps running normally, but as soon as the slave agent reconnects, the first thing that seems to happen is that it kills the powershell process. Is this expected, or could I have something misconfigured?

          2018-05-08 09:45:12,931 DEBUG - Starting ServiceWrapper in the CLI mode
          2018-05-08 09:45:13,042 INFO - Restarting the service with id 'jenkinsslave-C__Program Files (x86)_Jenkins'
          2018-05-08 09:45:13,061 INFO - Stopping jenkinsslave-C__Program Files (x86)_Jenkins
          2018-05-08 09:45:13,066 DEBUG - ProcessKill 14096
          2018-05-08 09:45:13,185 INFO - Found child process: 8368 Name: conhost.exe
          2018-05-08 09:45:13,191 INFO - Found child process: 13712 Name: powershell.exe
          2018-05-08 09:45:13,273 INFO - Stopping process 8368
          2018-05-08 09:45:13,283 INFO - Send SIGINT 8368
          2018-05-08 09:45:13,290 WARN - SIGINT to 8368 failed - Killing as fallback
          2018-05-08 09:45:13,374 INFO - Found child process: 9432 Name: conhost.exe
          2018-05-08 09:45:13,458 INFO - Stopping process 9432
          2018-05-08 09:45:13,466 INFO - Send SIGINT 9432
          2018-05-08 09:45:13,471 WARN - SIGINT to 9432 failed - Killing as fallback
          2018-05-08 09:45:13,476 INFO - Stopping process 13712
          2018-05-08 09:45:13,484 INFO - Send SIGINT 13712
          2018-05-08 09:45:13,490 WARN - SIGINT to 13712 failed - Killing as fallback
          2018-05-08 09:45:13,496 INFO - Stopping process 14096
          2018-05-08 09:45:13,504 INFO - Send SIGINT 14096
          2018-05-08 09:45:13,510 WARN - SIGINT to 14096 failed - Killing as fallback

          Gabriel Loewen added a comment -

          This seems to be the same issue as the one described in https://issues.jenkins-ci.org/browse/JENKINS-27617

          Am I to understand that durable tasks are not actually durable on Windows when Jenkins is running as a Windows service?

          Gabriel Loewen added a comment -

          vivek You've assigned this bug to me, but I am not actually working on it. The underlying issue is that when Jenkins is running as a Windows service and the agent goes offline, the entire process tree under slave.jar, including the durable tasks, is killed once the agent reconnects. I believe the fix requires process isolation for any durable tasks running on Windows agents.


          Vivek Pandey added a comment -

          gabloe No worries. Thanks for the update and details.


          Carroll Chiou added a comment -

          Closing as a duplicate of JENKINS-27617.


            Assignee: Unassigned
            Reporter: Gabriel Loewen (gabloe)
            Votes: 0
            Watchers: 6