JENKINS-50219

Jenkins agent Windows service fails to restart with an unhandled COMException in WinSW's log

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Labels:
    • Environment:
      Jenkins 2.9.84
      windows-slave-installer 1.9.2
      jenkins-slave.exe running on Windows 10 + jre 1.8.0.161 under local admin credentials.
      Description

      We have noticed that our Windows 10 nodes have been intermittently failing to come back online after Jenkins / master restarts.

       

      The following entries appear in WinSW's Jenkins-slave.wrapper.log file at the time of the failures. In particular, it appears that the unhandled COMException is preventing the service from restarting successfully.

       

      2018-03-16 12:26:38,011 DEBUG - Starting ServiceWrapper in the CLI mode
      2018-03-16 12:26:38,589 INFO  - Restarting the service with id 'jenkinsslave-c__jenkins'
      2018-03-16 12:26:38,682 DEBUG - Completed. Exit code is 0
      2018-03-16 12:26:38,870 DEBUG - Starting ServiceWrapper in the CLI mode
      2018-03-16 12:26:39,151 INFO  - Restarting the service with id 'jenkinsslave-c__jenkins'
      2018-03-16 12:26:39,276 INFO  - Stopping jenkinsslave-c__jenkins
      2018-03-16 12:26:39,292 DEBUG - ProcessKill 332
      2018-03-16 12:26:39,386 INFO  - Found child process: 420 Name: conhost.exe
      2018-03-16 12:26:39,433 INFO  - Stopping process 420
      2018-03-16 12:26:39,448 INFO  - Send SIGINT 420
      2018-03-16 12:26:39,464 WARN  - SIGINT to 420 failed - Killing as fallback
      2018-03-16 12:26:39,464 INFO  - Stopping process 332
      2018-03-16 12:26:39,479 INFO  - Send SIGINT 332
      2018-03-16 12:26:39,495 WARN  - SIGINT to 332 failed - Killing as fallback
      2018-03-16 12:26:39,511 INFO  - Finished jenkinsslave-c__jenkins
      2018-03-16 12:26:39,511 DEBUG - Completed. Exit code is 0
      2018-03-16 12:26:40,342 FATAL - Unhandled exception
      System.Runtime.InteropServices.COMException (0x80040150): Could not read key from registry (Exception from HRESULT: 0x80040150 (REGDB_E_READREGDB))
         at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
         at System.Management.ManagementObjectSearcher.Get()
         at WMI.WmiRoot.ClassHandler.Invoke(Object proxy, MethodInfo method, Object[] args)
         at winsw.WrapperService.Run(String[] _args, ServiceDescriptor descriptor)
         at winsw.WrapperService.Main(String[] args)
      

      After inspecting WinSW's source code, we determined that the exception is thrown from the line "s = svc.Select(d.Id);" in the following snippet (found in the "Run" method in "src/Core/ServiceWrapper/Main.cs", lines 687-704). The exception occurs when a secondary WinSW process is run by Jenkins with the "restart!" parameter.

       

      if (args[0] == "restart")
      {
          Log.Info("Restarting the service with id '" + d.Id + "'");
          if (s == null) 
              ThrowNoSuchService();
          if(s.Started)
              s.StopService();
          while (s.Started)
          {
              Thread.Sleep(1000);
              s = svc.Select(d.Id);
          }
          s.StartService();
          return;
      }
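
To illustrate why the polling loop above is fragile, here is a minimal sketch of how a single transient COMException could be tolerated instead of terminating the process. It is not WinSW code and not a proposed patch: `WithRetry`, `maxAttempts`, and `delay` are hypothetical names, and the lambda is only a stand-in for the WMI-backed `svc.Select(d.Id)` call. It also assumes the REGDB_E_READREGDB failure is transient, which this report has not confirmed.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

static class RetrySketch
{
    // Hypothetical helper (not part of WinSW): runs a query and retries it
    // when it throws a COMException, up to maxAttempts attempts in total.
    public static T WithRetry<T>(Func<T> query, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return query();
            }
            catch (COMException) when (attempt < maxAttempts)
            {
                // Assumed transient failure: wait, then try again.
                // On the final attempt the filter is false and the exception propagates.
                Thread.Sleep(delay);
            }
        }
    }

    static void Main()
    {
        int calls = 0;

        // Stand-in for svc.Select(d.Id): fails twice with the HRESULT seen
        // in the log (0x80040150), then succeeds.
        string result = WithRetry(() =>
        {
            calls++;
            if (calls < 3)
                throw new COMException("Could not read key from registry",
                                       unchecked((int)0x80040150));
            return "service";
        }, maxAttempts: 5, delay: TimeSpan.FromMilliseconds(10));

        Console.WriteLine(result + " after " + calls + " calls");
        // prints "service after 3 calls"
    }
}
```

In the restart branch, the equivalent change would wrap only the query inside the `while (s.Started)` loop, so that a failed lookup leads to another poll rather than an unhandled exception that leaves the service stopped.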

      We are currently able to work around the problem by restarting the service manually on affected nodes.

       

       

        Attachments

          Issue Links

            Activity

            tom_m_third_dimension Tom Manning created issue -
            tom_m_third_dimension Tom Manning made changes -
            Field: Description (code snippet wrapped in a {code} block; text otherwise unchanged)
            oleg_nenashev Oleg Nenashev made changes -
            Labels newbie-friendly
            oleg_nenashev Oleg Nenashev made changes -
            Assignee Oleg Nenashev [ oleg_nenashev ]
            nkjensen Niels Kristian Jensen made changes -
            Link This issue relates to JENKINS-23147 [ JENKINS-23147 ]
            nkjensen Niels Kristian Jensen made changes -
            Assignee Oleg Nenashev [ oleg_nenashev ]

              People

              Assignee:
              oleg_nenashev Oleg Nenashev
              Reporter:
              tom_m_third_dimension Tom Manning
              Votes:
              12
              Watchers:
              15

                Dates

                Created:
                Updated: