JENKINS-1705

Hudson starts hundreds of ssh sessions for offline slaves

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: remoting
    • Labels:
      None
    • Environment:
      Platform: All, OS: Windows XP

      Description

      I'm using Hudson on a Windows XP machine with Cygwin.
      The slaves are connected using Cygwin ssh.

      In the last 2 releases of Hudson,
      over a few hours, for some reason heaps of ssh.exe sessions are started.
      I have a few slaves which are offline. Whenever Hudson tries to connect to them
      it starts a new ssh.exe session but doesn't kill it when the connection fails.

      I noticed the following error caused an extra 4 ssh sessions to be started.


      SEVERE: Unable to launch the slave agent for linux02
      java.io.EOFException: unexpected stream termination
      at hudson.remoting.Channel.<init>(Channel.java:258)
      at hudson.model.Slave$ComputerImpl.setChannel(Slave.java:390)
      at hudson.model.Slave$ComputerImpl$1.run(Slave.java:342)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
      at java.lang.Thread.run(Thread.java:619)
      -------------------

          Activity

          rajp rajp created issue -
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          trunk/hudson/main/core/src/main/java/hudson/slaves/CommandLauncher.java
          trunk/www/changelog.html
          http://fisheye4.cenqua.com/changelog/hudson/?cs=9434
          Log:
          [FIXED JENKINS-1705] In case of abnormal termination, kill the process
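The idea in this log message, killing the launched process when connection setup ends abnormally, can be sketched roughly as below. This is not the code from changeset 9434: `LaunchGuard`, `Handshake`, and `launch` are hypothetical names used only for illustration, and the handshake stands in for whatever setup step fails (in the report, the `Channel` constructor throwing `EOFException`).

```java
import java.io.IOException;
import java.util.List;

public class LaunchGuard {
    /** Hypothetical stand-in for the connection setup done against the child process. */
    interface Handshake {
        void run(Process proc) throws IOException;
    }

    /**
     * Starts the launch command and runs the handshake against it.
     * If the handshake throws, the child is destroyed before the exception
     * propagates, so a failed attempt does not leave the process running.
     */
    static Process launch(List<String> command, Handshake handshake) throws IOException {
        Process proc = new ProcessBuilder(command).start();
        try {
            handshake.run(proc);
            return proc; // success: the caller now owns the process
        } catch (IOException | RuntimeException e) {
            proc.destroy(); // abnormal termination: kill the child
            throw e;
        }
    }
}
```

Note that `Process.destroy()` only targets the process that was started directly. If that process is a shell script that spawns ssh itself, the grandchildren may survive.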

          scm_issue_link SCM/JIRA link daemon made changes -
          Field Original Value New Value
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]
          lloydchang Lloyd Chang added a comment -

          Using Hudson 1.224, I still see the issue. Hudson is calling a Cygwin shell
          script, which launches SSH, then sleeps a long time (or else Hudson continues to
          re-launch the shell script).

          In the Windows and Cygwin process trees, I see bash.exe, ssh.exe, and sleep.exe
          processes left open by Hudson, even after I stop the Hudson server from the UI,
          then kill the Hudson Java process. I double-checked via Process Explorer -
          http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx , bash.exe,
          ssh.exe, and sleep.exe all end up with Parent IDs of 1

          To try working around the issue, I tried having the SSH script sleep for a long
          time, to avoid respawning, but the root cause remains. I think Hudson forks off
          sub-processes that become parent-less zombies somehow.

          It is similar to the behavior described in
          http://www.cygwin.com/ml/cygwin-xfree/2006-12/msg00082.html , and someone
          proposed a background monitoring process workaround in
          http://lists.samba.org/archive/rsync/2003-November/007736.html

          Due to the above issues, I switched to a Hudson *nix server, and started using the
          SSH plugin to avoid the root cause.
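For readers on a current JVM, one way to avoid leaving bash/ssh/sleep children behind is to terminate the whole process tree instead of only the direct child. The sketch below uses `Process.descendants()` from Java 9, which did not exist when Hudson 1.224 was current. `ProcessTreeKiller` and `killTree` are hypothetical names, and whether the JVM can see children spawned through Cygwin is an assumption that has not been checked here.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ProcessTreeKiller {
    /**
     * Forcibly terminates the given process and every descendant the OS reports.
     * The descendant list is a snapshot taken before the root is killed, while
     * the parent-child links still exist.
     *
     * @return the number of descendants found in the snapshot
     */
    static int killTree(Process root) {
        List<ProcessHandle> descendants = root.descendants().collect(Collectors.toList());
        for (ProcessHandle handle : descendants) {
            handle.destroyForcibly();
        }
        root.destroyForcibly();
        return descendants.size();
    }
}
```

The snapshot is inherently racy: a child spawned after it is taken will be missed, so this reduces orphaned processes but does not guarantee there are none.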

          lloydchang Lloyd Chang made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in hudson
          User: kohsuke
          Path:
          trunk/hudson/main/core/src/main/java/hudson/slaves/SlaveComputer.java
          trunk/www/changelog.html
          http://fisheye4.cenqua.com/changelog/hudson/?cs=10955
          Log:
          [FIXED JENKINS-1705]
          Troubleshooting another incident revealed this problem. I believe this was the cause of 1705. See the comment in the code for the analysis.

          scm_issue_link SCM/JIRA link daemon made changes -
          Resolution Fixed [ 1 ]
          Status Reopened [ 4 ] Resolved [ 5 ]
          abayer Andrew Bayer made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          rtyler R. Tyler Croy made changes -
          Workflow JNJira [ 131778 ] JNJira + In-Review [ 200920 ]

            People

            Assignee:
            Unassigned
            Reporter:
            rajp rajp
            Votes:
            0
            Watchers:
            0
