JENKINS-31688

Jenkins safe-restart hangs forever waiting to lock an ssh-slave connection for unregistration, because the wait has no timeout

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • Component: ssh-slaves-plugin
    • Labels: None

      safe-restart is waiting to close a registered connection,
      but that connection is waiting forever to be notified, while the notifier thread is either dead or hung. The wait has no timeout,
      so safe-restart hangs forever.

      Our environment is Jenkins 1.538 with ssh-slaves plugin 1.9,
      but the issue I described seems to still hold true in the latest code base.
      https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/src/main/java/hudson/plugins/sshslaves/PluginImpl.java
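      /**
       * Closes all the registered connections.
       */
      private static synchronized void closeRegisteredConnections() {
          for (Connection connection : activeConnections) {
              // The rest of this loop body was lost to a Jira "Unknown macro" rendering error.
              // Per the stack trace in this report, PluginImpl.java:70 calls connection.getHostname().
              LOGGER.log(Level.INFO, "Forcing connection to {0}...
          }
          activeConnections.clear();
      }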
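      One way to keep a shutdown path from blocking forever on a single connection is to run each close on a worker thread and give up after a deadline. The following is only a sketch under stated assumptions, not the plugin's code: the class BoundedClose, the helper runWithTimeout, and the thread name are hypothetical, and the Runnable stands in for whatever closes a connection.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedClose {
    /**
     * Runs the task on a daemon worker thread (hypothetical helper).
     * Returns true if the task finished within timeoutMillis, false if it
     * timed out or threw.
     */
    static boolean runWithTimeout(Runnable task, long timeoutMillis) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connection-closer");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(task);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: interrupts the worker
            return false;
        } catch (ExecutionException e) {
            return false; // the task itself failed
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a close() call that never returns in time.
        boolean finished = runWithTimeout(() -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 200);
        System.out.println("close finished in time: " + finished);
    }
}
```

      Note that a thread blocked on monitor entry does not respond to interruption, so in the situation from this report the worker would stay blocked; making it a daemon thread only ensures that the caller, and the JVM exit, are not held up by it.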

      https://github.com/jenkinsci/trilead-ssh2/blob/master/src/com/trilead/ssh2/channel/ChannelManager.java
      private void waitUntilChannelOpen(Channel c) throws IOException
      {
          synchronized (c)
          {
              while (c.state == Channel.STATE_OPENING)
              {
                  try
                  {
                      c.wait();
                  }
                  catch (InterruptedException ignore)
                  {
                      throw new InterruptedIOException();
                  }
              }

              if (c.state != Channel.STATE_OPEN)
              {
                  removeChannel(c.localID);
                  throw ioException("Could not open channel (state:" + c.state + ")", c);
              }
          }
      }
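      For comparison, here is a minimal sketch of the same wait loop with a deadline, so that a notifier that never arrives produces an exception instead of an endless wait. This is an illustration only, not a patch to trilead-ssh2: DemoChannel is a hypothetical stand-in for com.trilead.ssh2.channel.Channel, and the timeoutMillis parameter does not exist in the code quoted above.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical stand-in for the real Channel class; only the state field matters here. */
class DemoChannel {
    static final int STATE_OPENING = 1;
    static final int STATE_OPEN = 2;
    int state = STATE_OPENING;
}

public class BoundedWait {
    /**
     * Waits until the channel leaves STATE_OPENING, or fails once timeoutMillis has elapsed.
     */
    static void waitUntilChannelOpen(DemoChannel c, long timeoutMillis) throws IOException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (c) {
            while (c.state == DemoChannel.STATE_OPENING) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    // Never call wait(0): that would wait without a timeout again.
                    throw new IOException("Timed out waiting for channel to open");
                }
                try {
                    c.wait(remainingMillis); // the loop re-checks state after every wakeup
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException();
                }
            }
            if (c.state != DemoChannel.STATE_OPEN) {
                throw new IOException("Could not open channel (state:" + c.state + ")");
            }
        }
    }

    public static void main(String[] args) {
        DemoChannel c = new DemoChannel(); // nobody ever notifies this channel
        try {
            waitUntilChannelOpen(c, 200);
        } catch (IOException e) {
            System.out.println("Gave up: " + e.getMessage());
        }
    }
}
```

      Computing the remaining time from a fixed deadline, rather than passing the full timeout to every wait call, keeps the total wait bounded even when the thread wakes up spuriously.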

      Stack trace showing the "deadlock":

      "safe-restart thread" prio=10 tid=0x00007fd87e4ab800 nid=0x5157 waiting for monitor entry [0x00007fd9aa8e7000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at com.trilead.ssh2.Connection.getHostname(Connection.java:961)
              - waiting to lock <0x0000000682669c88> (a com.trilead.ssh2.Connection)
              at hudson.plugins.sshslaves.PluginImpl.closeRegisteredConnections(PluginImpl.java:70)
              - locked <0x0000000674712e38> (a java.lang.Class for hudson.plugins.sshslaves.PluginImpl)
              at hudson.plugins.sshslaves.PluginImpl.stop(PluginImpl.java:61)
              at hudson.PluginWrapper.stop(PluginWrapper.java:376)
              at hudson.PluginManager.stop(PluginManager.java:734)
              at jenkins.model.Jenkins.cleanUp(Jenkins.java:2797)
              at hudson.lifecycle.UnixLifecycle.restart(UnixLifecycle.java:71)
              at jenkins.model.Jenkins$23.run(Jenkins.java:3400)

      "Channel reader thread: xyz.com" prio=10 tid=0x00007fd870014800 nid=0xbaec in Object.wait() [0x00007fd80ceb8000]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              at java.lang.Object.wait(Object.java:503)
              at com.trilead.ssh2.channel.ChannelManager.waitUntilChannelOpen(ChannelManager.java:110)
              - locked <0x0000000685743cc0> (a com.trilead.ssh2.channel.Channel)
              at com.trilead.ssh2.channel.ChannelManager.openSessionChannel(ChannelManager.java:584)
              at com.trilead.ssh2.Session.<init>(Session.java:42)
              at com.trilead.ssh2.Connection.openSession(Connection.java:1129)
              - locked <0x0000000682669c88> (a com.trilead.ssh2.Connection)
              at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:99)
              at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
              at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:1213)
              - locked <0x0000000672021fc8> (a hudson.plugins.sshslaves.SSHLauncher)
              at hudson.slaves.SlaveComputer$2.onClosed(SlaveComputer.java:456)
              at hudson.remoting.Channel.terminate(Channel.java:831)
              at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:76)
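      The pattern in the two traces above (one thread BLOCKED on a monitor that another thread holds while it sits in Object.wait()) can be reproduced and inspected in a small standalone program. This is a sketch for illustration only: the class BlockedOwnerReport, its methods, and the thread names "reader" and "restarter" are hypothetical, and plain Object instances stand in for the Connection and Channel monitors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class BlockedOwnerReport {
    /** Returns "X is blocked by Y" if t is waiting for a lock owned by another thread, else null. */
    static String describeBlocked(Thread t) {
        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(t.getId());
        if (info == null || info.getLockOwnerName() == null) {
            return null;
        }
        return info.getThreadName() + " is blocked by " + info.getLockOwnerName();
    }

    /** Recreates the shape of the hang and reports who blocks whom. */
    static String reproduce() throws InterruptedException {
        final Object connection = new Object(); // stands in for the Connection monitor
        final Object channel = new Object();    // stands in for the Channel monitor

        // Holds the connection monitor while waiting, without timeout, on the channel.
        Thread reader = new Thread(() -> {
            synchronized (connection) {
                synchronized (channel) {
                    try {
                        while (true) {
                            channel.wait(); // nobody ever notifies
                        }
                    } catch (InterruptedException e) {
                        // interrupted: fall through and release both monitors
                    }
                }
            }
        }, "reader");

        // Needs the connection monitor, like getHostname() in the first trace.
        Thread restarter = new Thread(() -> {
            synchronized (connection) {
                // would close the connection here
            }
        }, "restarter");

        reader.setDaemon(true);
        restarter.setDaemon(true);
        reader.start();
        while (reader.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        restarter.start();
        while (restarter.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        String report = describeBlocked(restarter);
        reader.interrupt(); // unblock both threads so the demo can end cleanly
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(reproduce());
    }
}
```

      Strictly speaking this is not a lock cycle, since the waiting thread is not trying to acquire anything the blocked thread holds, so ThreadMXBean.findDeadlockedThreads() would not be expected to flag it. Looking at the lock owner of the blocked thread, as above, is what points to the thread stuck in the untimed wait.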

            Assignee: Ivan Fernandez Calvo (ifernandezcalvo)
            Reporter: Ellen shen (ellenshen)
            Votes: 1
            Watchers: 4
