Jenkins / JENKINS-44414

Lost ssh connection is not recognized

Details

    Description

      We've been using version 2.12.0 of the plugin for quite a while without any issues. After upgrading to 2.23.0, we frequently (every few days) run into the problem that the ssh connection is lost without the underlying ssh implementation recognizing it.

      I created a slightly modified version to get more log output and found that the read method of the Reader in the loop
      while (reader.read(cb) != -1)
      returns 0, i.e. nothing is read.
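
To illustrate why a 0 return matters: a loop that only tests for -1 keeps spinning when read keeps returning 0. The following is a minimal sketch, not the plugin's code. The class name ZeroReadGuard, the method drain and the limit MAX_EMPTY_READS are hypothetical names chosen for illustration. One case in which the JDK's Reader.read(CharBuffer) returns 0 is a buffer with no remaining space; whether that is what happens in the plugin is not established here.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.CharBuffer;

class ZeroReadGuard {
    // Hypothetical limit, not taken from the plugin.
    static final int MAX_EMPTY_READS = 3;

    /**
     * Drains the reader into a String. Unlike a loop that only checks
     * for -1, it fails once read() has returned 0 several times in a row.
     */
    static String drain(Reader reader, CharBuffer cb) throws IOException {
        StringBuilder out = new StringBuilder();
        int emptyReads = 0;
        int n;
        while ((n = reader.read(cb)) != -1) {
            if (n == 0) {
                // No progress: neither data nor end of stream.
                if (++emptyReads >= MAX_EMPTY_READS) {
                    throw new IOException(
                        "no progress after " + emptyReads + " empty reads");
                }
                continue;
            }
            emptyReads = 0;
            cb.flip();      // switch the buffer from filling to draining
            out.append(cb); // appends the characters between position and limit
            cb.clear();     // make room for the next read
        }
        return out.toString();
    }
}
```

Calling drain with a StringReader and CharBuffer.allocate(2) returns the full input, while CharBuffer.allocate(0) leaves no room in the buffer, so every read returns 0 and the guard throws instead of looping forever.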

      Could this be related to https://github.com/sonyxperiadev/gerrit-events/pull/58?

      If so, the real problem lies in the underlying ssh implementation.

          Activity

            mawinter69 Markus Winter added a comment -

            fixed

            thadguidry Thad Guidry added a comment -

            At Ericsson, we have also experienced this issue and see a CLOSED_WAITING on the Gerrit Trigger, sometimes after 1 or 2 days. We have taken a binary heap dump that can be opened in Java VisualVM for analysis. Here is one of the thread snippets that I found in the heap dump:

            "Thread-10" prio=5 tid=69 TIMED_WAITING
            at java.lang.Thread.sleep(Native Method)
            at com.sonymobile.tools.gerrit.gerritevents.GerritConnection.run(GerritConnection.java:427)
            Local Variable: com.jcraft.jsch.ChannelExec#1
            Local Variable: java.io.InputStreamReader#340
            Local Variable: java.nio.HeapCharBuffer#4836
            Local Variable: com.sonymobile.tools.gerrit.gerritevents.dto.attr.Provider#1

            Download heap dump:

            https://drive.google.com/file/d/0B533WzlrxWraVHZEU01MSnpXcW8/view?usp=sharing

            mawinter69 Markus Winter added a comment -

            This is basically the same root cause.

            mawinter69 Markus Winter added a comment -

            We upgraded to 2.23.3 on a test instance but we still see the same problem.

            The connection is lost without the plugin recognizing this. We don't currently have a watchdog configured, but I would expect the plugin to restart the connection anyway.
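
For readers unfamiliar with the watchdog idea mentioned here, the following is a minimal sketch of an idle watchdog. It is not the plugin's watchdog, and the class IdleWatchdog and its methods are hypothetical names. The assumption is that the event-reading code calls touch whenever data arrives, and a separate periodic check calls isStale to decide whether to tear down and re-establish the connection.

```java
import java.util.concurrent.atomic.AtomicLong;

class IdleWatchdog {
    private final long timeoutMillis;
    private final AtomicLong lastActivityMillis;

    // nowMillis is passed in so the logic can be exercised without real time.
    IdleWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = new AtomicLong(nowMillis);
    }

    /** Record that data was received at the given time. */
    void touch(long nowMillis) {
        lastActivityMillis.set(nowMillis);
    }

    /** True if nothing has been received for longer than the timeout. */
    boolean isStale(long nowMillis) {
        return nowMillis - lastActivityMillis.get() > timeoutMillis;
    }
}
```

In real use the caller would pass System.currentTimeMillis(). A check like this only helps when the server is expected to send something regularly, since a quiet but healthy connection looks the same as a dead one.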

            mawinter69 Markus Winter added a comment -

            Probably this is the same root cause


            People

              rsandell rsandell
              mawinter69 Markus Winter
              Votes: 4
              Watchers: 4
