
Agent running as Windows service kills all running jobs on reconnect

    • Released As: durable-task 1.38

      We run several JNLP agents on Windows as Windows services using the WinSW wrapper. On some machines, when an agent loses its connection to the controller, all running processes are killed and the jobs never complete.

      This happens because the agent tries to restart itself when it loses the connection. There are two possible outcomes:

      • If the agent runs as a user that is a local admin (sadly the usual case, since services run as the SYSTEM user by default), WinSW restarts the service. On restart, both WinSW and Windows kill all processes that belong to the service, including the processes of currently running jobs.
      • If the agent runs as an unprivileged user, the agent fails to restart itself and logs a confusing error message. However, it then reconnects without issue and the jobs keep running.
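
The two outcomes above can be modelled with a small sketch. This is only an illustration of the behaviour described in this report, not Jenkins or WinSW code: the class `ReconnectSketch`, the method `onConnectionLoss`, and the job names are hypothetical, and the assumption is that the only thing that matters is whether the service account is allowed to restart the service.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the two cases in this report; not Jenkins or WinSW code. */
public class ReconnectSketch {

    /** Result of one simulated connection loss. */
    static final class Outcome {
        final boolean serviceRestarted;
        final List<String> survivingJobs;

        Outcome(boolean serviceRestarted, List<String> survivingJobs) {
            this.serviceRestarted = serviceRestarted;
            this.survivingJobs = survivingJobs;
        }
    }

    /**
     * Simulates what the report describes when an agent loses its connection.
     *
     * @param canRestartService true if the service account may restart the service
     *                          (for example a local admin or SYSTEM)
     * @param runningJobs       names of the jobs running on the agent (made up here)
     */
    static Outcome onConnectionLoss(boolean canRestartService, List<String> runningJobs) {
        if (canRestartService) {
            // The service restart succeeds, and every process that belongs
            // to the service is killed, including the job processes.
            return new Outcome(true, new ArrayList<>());
        }
        // The restart attempt fails (an error is logged), the agent simply
        // reconnects, and the jobs keep running.
        return new Outcome(false, new ArrayList<>(runningJobs));
    }

    public static void main(String[] args) {
        List<String> jobs = Arrays.asList("build-1", "build-2");
        System.out.println("admin: " + onConnectionLoss(true, jobs).survivingJobs);
        System.out.println("unprivileged: " + onConnectionLoss(false, jobs).survivingJobs);
    }
}
```

In this model the privileged case is the only one that loses work, which is the asymmetry the report complains about.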

      Frankly, I don't see any reason why an agent should restart itself on connection loss. For an agent running as a Windows service, the self-restart can never work properly and is thus entirely useless.

      A solution would be to remove jenkins.slaves.restarter.WinswSlaveRestarter entirely.

          [JENKINS-47657] Agent running as Windows service kills all running jobs on reconnect

          Thomas Bächler created issue
          Thomas Bächler made changes
          Assignee: New: Kohsuke Kawaguchi [ kohsuke ]
          Oleg Nenashev made changes
          Labels: New: winsw
          Oleg Nenashev made changes
          Component/s: New: windows-slave-installer-module [ 21834 ]
          Jesse Glick made changes
          Assignee: Original: Kohsuke Kawaguchi [ kohsuke ]
          Jesse Glick made changes
          Link: New: This issue relates to JENKINS-27617 [ JENKINS-27617 ]
          Alex Earl made changes
          Description: Original and New differ only in terminology ("JNLP slaves" became "JNLP agents", "master" became "controller"); the rest of the text is unchanged.
          Carroll Chiou made changes
          Released As: New: durable-task 1.38
          Resolution: New: Fixed [ 1 ]
          Status: Original: Open [ 1 ]; New: Fixed but Unreleased [ 10203 ]
          Carroll Chiou made changes
          Status: Original: Fixed but Unreleased [ 10203 ]; New: Resolved [ 5 ]

            Assignee: Unassigned
            Reporter: Thomas Bächler (procom_bl)
            Votes: 3
            Watchers: 12
