
Jenkins agent windows service fails to restart with an unhandled COMException in WinSw's log

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Jenkins 2.9.84
      windows-slave-installer 1.9.2
      jenkins-slave.exe running on Windows 10 + jre 1.8.0.161 under local admin credentials.

      We have noticed that our Windows 10 nodes have been intermittently failing to come back online after Jenkins / master restarts.

       

      The following entries appear in WinSw's Jenkins-slave.wrapper.log file at the time of the failures. In particular, the unhandled COMException appears to prevent the service from restarting successfully.

       

      2018-03-16 12:26:38,011 DEBUG - Starting ServiceWrapper in the CLI mode
      2018-03-16 12:26:38,589 INFO  - Restarting the service with id 'jenkinsslave-c__jenkins'
      2018-03-16 12:26:38,682 DEBUG - Completed. Exit code is 0
      2018-03-16 12:26:38,870 DEBUG - Starting ServiceWrapper in the CLI mode
      2018-03-16 12:26:39,151 INFO  - Restarting the service with id 'jenkinsslave-c__jenkins'
      2018-03-16 12:26:39,276 INFO  - Stopping jenkinsslave-c__jenkins
      2018-03-16 12:26:39,292 DEBUG - ProcessKill 332
      2018-03-16 12:26:39,386 INFO  - Found child process: 420 Name: conhost.exe
      2018-03-16 12:26:39,433 INFO  - Stopping process 420
      2018-03-16 12:26:39,448 INFO  - Send SIGINT 420
      2018-03-16 12:26:39,464 WARN  - SIGINT to 420 failed - Killing as fallback
      2018-03-16 12:26:39,464 INFO  - Stopping process 332
      2018-03-16 12:26:39,479 INFO  - Send SIGINT 332
      2018-03-16 12:26:39,495 WARN  - SIGINT to 332 failed - Killing as fallback
      2018-03-16 12:26:39,511 INFO  - Finished jenkinsslave-c__jenkins
      2018-03-16 12:26:39,511 DEBUG - Completed. Exit code is 0
      2018-03-16 12:26:40,342 FATAL - Unhandled exception
      System.Runtime.InteropServices.COMException (0x80040150): Could not read key from registry (Exception from HRESULT: 0x80040150 (REGDB_E_READREGDB))
         at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
         at System.Management.ManagementObjectSearcher.Get()
         at WMI.WmiRoot.ClassHandler.Invoke(Object proxy, MethodInfo method, Object[] args)
         at winsw.WrapperService.Run(String[] _args, ServiceDescriptor descriptor)
         at winsw.WrapperService.Main(String[] args)
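
To check whether a node hit this failure, the wrapper log can be scanned for a FATAL entry followed by the 0x80040150 HRESULT. The following is a minimal Python sketch under assumptions: the function name `find_com_failures` is hypothetical, and the line format is inferred only from the excerpt above.

```python
import re

# Matches lines such as "2018-03-16 12:26:40,342 FATAL - Unhandled exception".
FATAL_LINE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) FATAL\s+- (.*)$"
)
HRESULT = "0x80040150"  # REGDB_E_READREGDB, as shown in the log excerpt


def find_com_failures(lines):
    """Return timestamps of FATAL entries whose next line mentions the HRESULT."""
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        match = FATAL_LINE.match(line)
        if match and i + 1 < len(lines) and HRESULT in lines[i + 1]:
            hits.append(match.group(1))
    return hits


sample = [
    "2018-03-16 12:26:39,511 DEBUG - Completed. Exit code is 0",
    "2018-03-16 12:26:40,342 FATAL - Unhandled exception",
    "System.Runtime.InteropServices.COMException (0x80040150): Could not read key from registry",
]
print(find_com_failures(sample))
```

In practice the lines would come from reading Jenkins-slave.wrapper.log on the affected node.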
      

      After inspecting WinSw's source code, we determined that the exception is thrown from the line "s = svc.Select(d.Id);" in the following snippet (found in the "Run" method in "src/Core/ServiceWrapper/Main.cs", lines 687-704). This happens when a secondary WinSw process is run with the "restart!" parameter by Jenkins.

       

      if (args[0] == "restart")
      {
          Log.Info("Restarting the service with id '" + d.Id + "'");
          if (s == null) 
              ThrowNoSuchService();
          if(s.Started)
              s.StopService();
          while (s.Started)
          {
              Thread.Sleep(1000);
              s = svc.Select(d.Id);
          }
          s.StartService();
          return;
      }
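
The polling loop above has no handling for a failed service query, so a single exception from the lookup ends the restart. The sketch below illustrates one possible mitigation, a bounded retry around the query. It is written in Python rather than WinSw's C#, and `ServiceQueryError`, `wait_until_stopped`, `query_service`, and the retry limits are all hypothetical names and values, not part of WinSw. Whether retrying would actually recover from the registry error reported here is not established.

```python
import time


class ServiceQueryError(Exception):
    """Hypothetical stand-in for the COMException raised by the service query."""


def wait_until_stopped(query_service, retries=5, delay=1.0, sleep=time.sleep):
    """Poll until the service reports it has stopped.

    query_service() returns True while the service is still started.
    Up to `retries` consecutive query failures are tolerated before the
    error is re-raised to the caller.
    """
    failures = 0
    while True:
        try:
            started = query_service()
            failures = 0  # a successful query resets the failure count
        except ServiceQueryError:
            failures += 1
            if failures > retries:
                raise
            sleep(delay)
            continue
        if not started:
            return
        sleep(delay)
```

Bounding the retries keeps the restart from hanging forever if the query never succeeds.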

      We are currently able to work around the problem by restarting the service manually on affected nodes.

       

       

          [JENKINS-50219] Jenkins agent windows service fails to restart with an unhandled COMException in WinSw's log

          Oleg Nenashev added a comment -

          tom_m_third_dimension Please also submit a ticket to https://github.com/kohsuke/winsw . That project is outside Jenkins governance


          Tom Manning added a comment -

          After some further investigation I believe I have found a more permanent workaround for the problem. I observed that the problem occurs when the master is restarted after logging out from an interactive session on the "Log On As" account that runs the Jenkins agent service.

          The fix is to enable the group policy setting "Local Computer Policy->Administrative Templates->System->User Profiles->Do not forcefully unload the users registry at user logoff".
           
          The following articles describe the cause:
          https://support.microsoft.com/en-gb/help/2287297/a-com-application-may-stop-working-on-windows-server-2008-when-a-user
          https://support.microsoft.com/en-gb/help/3114011/800703fa-illegal-operation-attempted-on-a-registry-key-that-has-been-m
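
To confirm whether the policy is in effect on a node, the corresponding registry value can be read. The sketch below assumes the policy is backed by the DWORD value `DisableForceUnload` under `HKLM\SOFTWARE\Policies\Microsoft\Windows\System`. That mapping is an assumption that should be verified against Microsoft's documentation. The function names are hypothetical.

```python
import sys

# Assumed registry location for the policy; verify before relying on it.
POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows\System"
POLICY_VALUE = "DisableForceUnload"


def read_policy_value():
    """Return the policy DWORD, or None if it is unset or this is not Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, POLICY_VALUE)
            return value
    except FileNotFoundError:
        # The key or the value does not exist, so the policy is not configured.
        return None


def policy_enabled(value):
    """Treat a value of 1 as 'enabled'; anything else as not enabled."""
    return value == 1


if __name__ == "__main__":
    print("Policy enabled:", policy_enabled(read_policy_value()))
```

This only reads the setting. Changing it is best done through the Group Policy editor as described above.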


          Mario Klebsch added a comment -

          I also see this error regularly after server reboots, but I am sure that there were no interactive sessions on the "Log On As" account of the client that failed to restart.

          I am changing the policy setting anyway and will see whether it helps.

           

          But I wonder whether the restart of the Jenkins client is really needed. In jenkins-slave.err.log I see the following entries:

           

          Dez 02, 2018 4:56:56 AM hudson.remoting.jnlp.Main$CuiListener status
          INFORMATION: Terminated
          Dez 02, 2018 4:57:11 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
          INFORMATION: Failed to connect to the master. Will try again: java.net.SocketTimeoutException connect timed out
          ...
          Dez 02, 2018 4:58:22 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
          INFORMATION: Failed to connect to the master. Will try again: java.net.ConnectException Connection refused: connect
          ...
          Dez 02, 2018 5:03:46 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
          INFORMATION: Failed to connect to the master. Will try again: java.net.SocketTimeoutException Read timed out
          ...
          Dez 02, 2018 5:04:56 AM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver waitForReady
          INFORMATION: Master isn't ready to talk to us on http://jenkins01:8080/tcpSlaveAgentListener/. Will try again: response code=503
          ...
          Dez 02, 2018 5:09:20 AM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1 onReconnect
          INFORMATION: Restarting agent via jenkins.slaves.restarter.WinswSlaveRestarter@63a1d34b
          

           

          None of the errors listed should (IMHO) be a reason to restart the client.

          The log does not say why the client decides to restart itself. It may be the duration of the downtime, but the time difference between 4:56:56 and 5:09:20 is 744 seconds. I have 65 "waitForReady" messages and 26 "Will try again: response code=503" messages.
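
The 744-second figure can be checked from the two log timestamps. This is a small Python sketch that uses only the times of day, since the month name in the log is locale-specific. The variable names are illustrative.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"

# "Terminated" entry and "Restarting agent" entry from the log above.
terminated = datetime.strptime("04:56:56", TIME_FORMAT)
restarting = datetime.strptime("05:09:20", TIME_FORMAT)

elapsed_seconds = (restarting - terminated).total_seconds()
print(elapsed_seconds)  # 744.0, i.e. 12 minutes 24 seconds
```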

          At least the "response code=503" error message should prevent the Jenkins client restart, especially since it is a known Jenkins problem that Jenkins reboots take ages.

           

          Mario

           

           


          Oleg Nenashev added a comment -

          I might be able to work on it at some point, but I will unassign it for now so that anybody else can pick it up and work on it


          Angelo Loria added a comment -

          tom_m_third_dimension added a comment - 2018-05-24 05:47

          After some further investigation I believe I have found a more permanent work around for the problem. After observing that the problem occurs when the master is restarted after logging out from an interactive session on the "Log On As" account that runs the Jenkins agent service.

          The fix is to enable the group policy setting "Local Computer Policy->Administrative Templates->System->User Profiles->Do not forcefully unload the users registry at user logoff".

          FYI this resolved the issue for us on a Windows Server 2019 VM.


          Niels Kristian Jensen added a comment - - edited

          I agree! The fix worked for me as well. This is a tail of the log: first a crash (at 7:18), then the correct start-up at 8:39. Thank you, tom_m_third_dimension.

          2020-11-13 07:18:48,535 INFO - Stopping process 7820
          2020-11-13 07:18:48,535 INFO - Send SIGINT 7820
          2020-11-13 07:18:48,550 WARN - SIGINT to 7820 failed - Killing as fallback
          2020-11-13 07:18:48,550 INFO - Stopping process 808
          2020-11-13 07:18:48,550 INFO - Send SIGINT 808
          2020-11-13 07:18:48,550 WARN - SIGINT to 808 failed - Killing as fallback
          2020-11-13 07:18:48,566 INFO - Finished Jenkins
          2020-11-13 07:18:48,566 DEBUG - Completed. Exit code is 0
          2020-11-13 07:18:49,394 FATAL - Unhandled exception
          System.Runtime.InteropServices.COMException (0x80040150): Could not read key from registry (Exception from HRESULT: 0x80040150 (REGDB_E_READREGDB))
          at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
          at System.Management.ManagementObjectSearcher.Get()
          at WMI.WmiRoot.ClassHandler.Invoke(Object proxy, MethodInfo method, Object[] arguments)
          at WMI.Win32ServicesProxy.Select(String )
          at winsw.WrapperService.<Run>g__Restart|33_5(<>c__DisplayClass33_0& )
          at winsw.WrapperService.Run(String[] _args, ServiceDescriptor descriptor)
          at winsw.WrapperService.Main(String[] args)
          2020-11-13 07:21:11,170 DEBUG - Starting WinSW in the service mode
          2020-11-13 07:21:11,201 INFO - Starting java -Xrs -Xmx2048m -Xss4m -Dhudson.slaves.WorkspaceList=_ -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -Dhudson.model.DirectoryBrowserSupport.CSP="default-src 'self'; style-src 'self' 'unsafe-inline'; script-src 'self' 'unsafe-inline'" -jar "D:\Jenkins\jenkins.war" --httpPort=8080 --webroot="D:\Jenkins\war"
          2020-11-13 07:21:11,201 INFO - Extension loaded: killOnStartup
          2020-11-13 07:21:11,217 DEBUG - Checking the potentially runaway process with PID=808
          2020-11-13 07:21:11,232 DEBUG - No runaway process with PID=808. The process has been already stopped.
          2020-11-13 07:21:11,264 INFO - Started process 4284
          2020-11-13 07:21:11,279 DEBUG - Forwarding logs of the process System.Diagnostics.Process (java) to winsw.SizeBasedRollingLogAppender
          2020-11-13 07:21:11,279 INFO - Recording PID of the started process:4284. PID file destination is D:\Jenkins\jenkins.pid
          2020-11-13 08:39:44,899 DEBUG - Starting WinSW in the CLI mode
          2020-11-13 08:39:45,134 INFO - Restarting the service with id 'Jenkins'
          2020-11-13 08:39:45,149 DEBUG - Completed. Exit code is 0
          2020-11-13 08:39:45,274 DEBUG - Starting WinSW in the CLI mode
          2020-11-13 08:39:45,493 INFO - Restarting the service with id 'Jenkins'
          2020-11-13 08:39:45,524 INFO - Stopping Jenkins
          2020-11-13 08:39:45,524 DEBUG - ProcessKill 4284
          2020-11-13 08:39:45,634 INFO - Found child process: 8952 Name: conhost.exe
          2020-11-13 08:39:45,681 INFO - Stopping process 8952
          2020-11-13 08:39:45,681 INFO - Send SIGINT 8952
          2020-11-13 08:39:45,681 WARN - SIGINT to 8952 failed - Killing as fallback
          2020-11-13 08:39:45,681 INFO - Stopping process 4284
          2020-11-13 08:39:45,696 INFO - Send SIGINT 4284
          2020-11-13 08:39:45,696 WARN - SIGINT to 4284 failed - Killing as fallback
          2020-11-13 08:39:45,696 INFO - Finished Jenkins
          2020-11-13 08:39:45,696 DEBUG - Completed. Exit code is 0
          2020-11-13 08:39:46,915 DEBUG - Starting WinSW in the service mode
          2020-11-13 08:39:46,946 DEBUG - Completed. Exit code is 0
          2020-11-13 08:39:46,978 INFO - Extension loaded: killOnStartup
          2020-11-13 08:39:46,978 DEBUG - Checking the potentially runaway process with PID=4284
          2020-11-13 08:39:46,993 DEBUG - No runaway process with PID=4284. The process has been already stopped.
          2020-11-13 08:39:47,024 INFO - Started process 8752
          2020-11-13 08:39:47,040 DEBUG - Forwarding logs of the process System.Diagnostics.Process (java) to winsw.SizeBasedRollingLogAppender
          2020-11-13 08:39:47,040 INFO - Recording PID of the started process:8752. PID file destination is D:\Jenkins\jenkins.pid

          oleg_nenashev - I don't know if this is the conclusion to both this issue and the linked issue? That's why I assigned it to you - I'm not in the inner circle of this software.


          Marco Hald added a comment -

          The workaround also worked for me.

          Is there a plan to change the behavior?


            oleg_nenashev Oleg Nenashev
            tom_m_third_dimension Tom Manning
            Votes: 14
            Watchers: 17