JENKINS-51145

PowerShell pipeline step does not seem to be durable

      Description

      While running a PowerShell pipeline step, I triggered a graceful shutdown of Jenkins while monitoring the powershell.exe process on the Jenkins host. The PowerShell process crashed unexpectedly, and the job failed with exit code -1.

      Job definition:

      node ('Windows') {
          powershell '''
          while (1) {
              write-host "Testing"
              Start-Sleep 1
          }
          '''
      }
      

      Console output:

      Started by user Gabriel Loewen
      Running in Durability level: MAX_SURVIVABILITY
      [Pipeline] node
      Running on WinHost in C:\Program Files (x86)\Jenkins\workspace\Test Durability
      [Pipeline] {
      [Pipeline] powershell
      [Test Durability] Running PowerShell script
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Testing
      Resuming build at Fri May 04 17:40:07 UTC 2018 after Jenkins restart
      Waiting to resume part of Test Durability #5: ???
      Testing
      Waiting to resume part of Test Durability #5: WinHost is offline
      Waiting to resume part of Test Durability #5: WinHost is offline
      Waiting to resume part of Test Durability #5: WinHost is offline
      Ready to run at Fri May 04 17:40:19 UTC 2018
      Testing
      Testing
      Testing
      Testing
      Testing
      [Pipeline] }
      [Pipeline] // node
      [Pipeline] End of Pipeline
      ERROR: script returned exit code -1
      Finished: FAILURE

       

      I ran the same test again, and this time the build hung even though the PowerShell process seemed to have crashed.

      At this point I do not know how to debug the issue or what can be done to ensure durability in the powershell pipeline step.

      Sam Van Oort, James Nord, let me know if you need any more details and if you can assist in this investigation.

            Activity

            svanoort Sam Van Oort added a comment -

            Gabriel Loewen Do you have the durability setting for the Pipeline and the versions of workflow-support, workflow-cps, and workflow-job handy? I want to make sure that doesn't relate to a bug in one of those last 3 plugins (have been doing a lot of fixes there recently).

            gabloe Gabriel Loewen added a comment - - edited

            Durability is set to max durability (MAX_SURVIVABILITY).  I've updated the environment details to include those versions.  Thanks!

            gabloe Gabriel Loewen added a comment -

            Actually, I see the same behavior with the Windows batch step. Looking at the Jenkins logs, I see the following, which seems to be the likely culprit.

            After the slave agent is disconnected, the powershell process continues to work normally, but as soon as the agent reconnects, the first thing that seems to happen is that it kills the powershell process. Is this expected? Or could I have something misconfigured?

            2018-05-08 09:45:12,931 DEBUG - Starting ServiceWrapper in the CLI mode
            2018-05-08 09:45:13,042 INFO - Restarting the service with id 'jenkinsslave-C__Program Files (x86)_Jenkins'
            2018-05-08 09:45:13,061 INFO - Stopping jenkinsslave-C__Program Files (x86)_Jenkins
            2018-05-08 09:45:13,066 DEBUG - ProcessKill 14096
            2018-05-08 09:45:13,185 INFO - Found child process: 8368 Name: conhost.exe
            2018-05-08 09:45:13,191 INFO - Found child process: 13712 Name: powershell.exe
            2018-05-08 09:45:13,273 INFO - Stopping process 8368
            2018-05-08 09:45:13,283 INFO - Send SIGINT 8368
            2018-05-08 09:45:13,290 WARN - SIGINT to 8368 failed - Killing as fallback
            2018-05-08 09:45:13,374 INFO - Found child process: 9432 Name: conhost.exe
            2018-05-08 09:45:13,458 INFO - Stopping process 9432
            2018-05-08 09:45:13,466 INFO - Send SIGINT 9432
            2018-05-08 09:45:13,471 WARN - SIGINT to 9432 failed - Killing as fallback
            2018-05-08 09:45:13,476 INFO - Stopping process 13712
            2018-05-08 09:45:13,484 INFO - Send SIGINT 13712
            2018-05-08 09:45:13,490 WARN - SIGINT to 13712 failed - Killing as fallback
            2018-05-08 09:45:13,496 INFO - Stopping process 14096
            2018-05-08 09:45:13,504 INFO - Send SIGINT 14096
            2018-05-08 09:45:13,510 WARN - SIGINT to 14096 failed - Killing as fallback
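
For anyone reading a similar log, here is a minimal Python sketch that summarizes which processes the service wrapper stopped and whether each one was force-killed. The regular expressions are written against the message formats quoted above only; the `summarize` helper and the `"unknown"` placeholder are illustrative names, not part of any Jenkins or service wrapper tooling.

```python
import re

# A subset of the service wrapper log lines quoted in the comment above.
LOG = """\
2018-05-08 09:45:13,066 DEBUG - ProcessKill 14096
2018-05-08 09:45:13,185 INFO - Found child process: 8368 Name: conhost.exe
2018-05-08 09:45:13,191 INFO - Found child process: 13712 Name: powershell.exe
2018-05-08 09:45:13,273 INFO - Stopping process 8368
2018-05-08 09:45:13,290 WARN - SIGINT to 8368 failed - Killing as fallback
2018-05-08 09:45:13,374 INFO - Found child process: 9432 Name: conhost.exe
2018-05-08 09:45:13,458 INFO - Stopping process 9432
2018-05-08 09:45:13,471 WARN - SIGINT to 9432 failed - Killing as fallback
2018-05-08 09:45:13,476 INFO - Stopping process 13712
2018-05-08 09:45:13,490 WARN - SIGINT to 13712 failed - Killing as fallback
2018-05-08 09:45:13,496 INFO - Stopping process 14096
2018-05-08 09:45:13,510 WARN - SIGINT to 14096 failed - Killing as fallback
"""

# Patterns match only the message shapes seen in the log above.
CHILD = re.compile(r"Found child process: (\d+) Name: (\S+)")
STOP = re.compile(r"Stopping process (\d+)")
KILLED = re.compile(r"SIGINT to (\d+) failed - Killing as fallback")


def summarize(log_text):
    """Return (pid, name, force_killed) tuples in the order processes were stopped."""
    names = {}            # pid -> executable name, when the log reports one
    stopped = []          # pids in the order "Stopping process" appears
    force_killed = set()  # pids for which SIGINT failed and a kill was used
    for line in log_text.splitlines():
        match = CHILD.search(line)
        if match:
            names[int(match.group(1))] = match.group(2)
            continue
        match = STOP.search(line)
        if match:
            stopped.append(int(match.group(1)))
            continue
        match = KILLED.search(line)
        if match:
            force_killed.add(int(match.group(1)))
    # The root process is never listed as a child, so its name is not in the log.
    return [(pid, names.get(pid, "unknown"), pid in force_killed) for pid in stopped]


if __name__ == "__main__":
    for pid, name, killed in summarize(LOG):
        print(pid, name, "force-killed" if killed else "stopped")
```

Run against the excerpt, this lists powershell.exe (PID 13712) among the processes that were stopped and then killed as a fallback, which matches what I observed on the host.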

            gabloe Gabriel Loewen added a comment -

            This seems to be the same issue as described here: https://issues.jenkins-ci.org/browse/JENKINS-27617

            Am I to understand that durable tasks are not actually durable on Windows when Jenkins is running as a Windows service?

            gabloe Gabriel Loewen added a comment -

            Vivek Pandey You've assigned this bug to me, but I am not actually working on it. The underlying issue is that when Jenkins is running as a Windows service and the agent goes offline, the entire process tree under slave.jar, including the durable tasks, is killed once the agent reconnects. I believe the fix needs to be process isolation for any durable tasks running on Windows agents.
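
To illustrate why isolation would matter, here is a toy Python model of a recursive process-tree stop. It is not the service wrapper's actual code. The parent/child relationships are an assumption inferred from the order of the log messages earlier in this issue, PID 14096 is assumed to be the agent process, and `stop_tree` and PID 20000 are hypothetical.

```python
def stop_tree(pid, children, stopped=None):
    """Stop every descendant of pid before pid itself (post-order walk)."""
    if stopped is None:
        stopped = []
    for child in children.get(pid, []):
        stop_tree(child, children, stopped)
    stopped.append(pid)
    return stopped


# Assumed tree: 14096 (agent) -> 8368 (conhost.exe), 13712 (powershell.exe);
# 13712 -> 9432 (conhost.exe). Inferred from the log order, not confirmed.
AGENT_TREE = {14096: [8368, 13712], 13712: [9432]}

# Hypothetical isolated layout: the durable task (PID 20000) is not a
# descendant of the agent process, so a tree walk from 14096 never reaches it.
ISOLATED_TREE = {14096: [8368], 1: [20000]}

if __name__ == "__main__":
    print(stop_tree(14096, AGENT_TREE))
    print(stop_tree(14096, ISOLATED_TREE))
```

In this model the durable task survives only when it sits outside the tree rooted at the agent process, which is the property I mean by process isolation.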

            vivek Vivek Pandey added a comment -

            Gabriel Loewen No worries. Thanks for the update and details.


              People

              Assignee: Unassigned
              Reporter: Gabriel Loewen (gabloe)
              Votes: 0
              Watchers: 4
