  1. Jenkins
  2. JENKINS-6965

GT seems to lose its connection to Gerrit without re-connecting

Details

    Description

      Haven't managed to catch it in the act yet, but it works really well for a while, then I notice it stops getting triggered. I go into hudson/manage/gerrittrigger and click restart and it seems to connect ok (watching gerrit logs) and it works fine again.

      This has happened consistently since setting it up. Appears to work for just a few hours, before needing a kick up the pants...

      Great work!

        Activity

          rsandell rsandell added a comment -

          This is a bit strange, it would be really good if you could "catch it in the act" with some logs.
          Are you using the 2.0 release or an earlier snapshot version?

          For me it has been running smoothly for days (restarted main server for other reasons),

          but we have noticed some thread leakage that will be fixed in the 2.1 release.

          antonystubbs antonystubbs added a comment -

          I have a feeling it's being caused by another build causing the infamous too many files open bug/issue/feature in Hudson, which is causing congestion. Restarting Tomcat fixed it this evening.

          I have disabled all other builds, and will report back soon.

          I am using the latest release available through Hudson's plugin system.

          antonystubbs antonystubbs added a comment -

          Doesn't seem to "stay working" for me for longer than a few hours...

          Noticed it hadn't been working today, so did a little test.

          Went into hudson, manage gerrit-trigger. Tested connection, saw a successful connect and disconnect in gerrit logs.

          clicked stop. nothing in gerrit logs from hudson. - implies it wasn't running? I think the manage page really needs some sort of "Status" field so we can see if gerrit-trigger thinks it's connected or not.
          clicked start. got a connection in gerrit logs from hudson.

          If gerrit-trigger is d/c, doesn't it try to reconnect? From looking at GerritHandler#run, it doesn't seem that it does?

          I see there is logging in gerrit-trigger, how do I get the debug log? </lazy>

          antonystubbs antonystubbs added a comment -

          Ok, I'm seeing these appearing in the Gerrit logs:

          ==> gerrit/review_site/logs/error_log <==
          [2010-07-18 04:10:25,980] WARN org.apache.sshd.server.session.ServerSession : Exception caught
          java.io.IOException: Connection timed out
          at sun.nio.ch.FileDispatcher.read0(Native Method)
          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
          at sun.nio.ch.IOUtil.read(IOUtil.java:206)
          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
          at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:202)
          at org.apache.mina.transport.socket.nio.NioProcessor.read(NioProcessor.java:42)
          at org.apache.mina.core.polling.AbstractPollingIoProcessor.read(AbstractPollingIoProcessor.java:620)
          at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:598)
          at org.apache.mina.core.polling.AbstractPollingIoProcessor.process(AbstractPollingIoProcessor.java:587)
          at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$400(AbstractPollingIoProcessor.java:61)
          at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:969)
          at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:619)

          ==> gerrit/review_site/logs/sshd_log <==
          [2010-07-18 04:10:26,024 +0000] eb5e7197 hudson a/1000081 LOGOUT

          With no following "hudson LOGIN" events... Hudson user being the Hudson Trigger module...

          rsandell rsandell added a comment -

          This is starting to bug me a lot; I cannot reproduce it. However I try to kill the connection, the reconnect loop reconnects once the connection is back.
          It must be ending up in some semi connected state in your environment that I can't reproduce (or don't know how to reproduce).

          I will change logging api to java.util.logging, so it's more easily filtered in Hudson's UI for the 2.1 release and try to put some clever debug logs on the lower levels.
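
For readers unfamiliar with the pattern, the "reconnect loop" mentioned above can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the plugin's actual code: the class name `ReconnectLoop`, the `connectAttempt` callback, and the doubling delay are all hypothetical and chosen for illustration.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of gerrit-trigger): retries a connect
 * attempt, doubling the wait between attempts up to a maximum.
 */
final class ReconnectLoop {
    private final Callable<Boolean> connectAttempt; // returns true once connected
    private final long initialDelayMillis;
    private final long maxDelayMillis;

    ReconnectLoop(Callable<Boolean> connectAttempt, long initialDelayMillis, long maxDelayMillis) {
        this.connectAttempt = connectAttempt;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /**
     * Tries to connect up to maxAttempts times.
     * Returns the number of attempts used, or -1 if every attempt failed
     * or the thread was interrupted.
     */
    int connect(int maxAttempts) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean connected;
            try {
                connected = connectAttempt.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            } catch (Exception e) {
                connected = false; // any other failure counts as "not connected yet"
            }
            if (connected) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
                delay = Math.min(delay * 2, maxDelayMillis);
            }
        }
        return -1;
    }
}
```

A loop like this only helps if something notices that the connection has gone away in the first place; a connection stuck in a semi-connected state never triggers it.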

          antonystubbs antonystubbs added a comment -

          Clicking "restart" as our builds aren't triggering, I'm getting:

          ==> gerrit/review_site/logs/sshd_log <==
          [2010-07-22 06:42:16,334 +0000] 0e887335 hudson a/1000081 'gerrit stream-events' 0ms 850392ms killed
          
          ==> gerrit/review_site/logs/error_log <==
          [2010-07-22 06:42:16,346] WARN  org.apache.sshd.server.session.ServerSession : Exception caught
          org.apache.mina.core.write.WriteToClosedSessionException
                  at org.apache.mina.core.polling.AbstractPollingIoProcessor.clearWriteRequestQueue(AbstractPollingIoProcessor.java:573)
                  at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeNow(AbstractPollingIoProcessor.java:525)
                  at org.apache.mina.core.polling.AbstractPollingIoProcessor.removeSessions(AbstractPollingIoProcessor.java:497)
                  at org.apache.mina.core.polling.AbstractPollingIoProcessor.access$600(AbstractPollingIoProcessor.java:61)
                  at org.apache.mina.core.polling.AbstractPollingIoProcessor$Processor.run(AbstractPollingIoProcessor.java:974)
                  at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                  at java.lang.Thread.run(Thread.java:619)
          
          ==> gerrit/review_site/logs/sshd_log <==
          [2010-07-22 06:42:16,518 +0000] 0e887335 hudson a/1000081 LOGOUT
          [2010-07-22 06:42:17,718 +0000] aee847ea hudson a/1000081 LOGIN FROM x.x.x.x
          
          ccutrer Cody Cutrer added a comment -

          We're also seeing this issue - I have to restart gerrit trigger every morning. We're running the version located here: http://ci.hudson-labs.org/view/Plugins/job/plugins_gerrit-trigger-plugin/13/com.sonyericsson.hudson.plugins.gerrit$gerrit-trigger/ (needed the merge commit fix). What logging can I enable to help?

          rsandell rsandell added a comment -

          We are running our Hudson instance for weeks at a time without losing the connection (we are restarting it for other reasons), so I have problems reproducing it. Could it be an OS or JVM issue?
          You could try to increase the TCP or SSH timeouts on the OS level both on the server running Hudson and on the server running Gerrit as a workaround.

          By design, the trigger should of course notice the timeout and reconnect automatically, but for some reason the ssh library that we are using does not notice the dropped connection.
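
One generic way to catch a dropped connection that the transport layer never reports is an application-level idle check. The sketch below is an illustration only and makes several assumptions: `IdleWatchdog` is a hypothetical class, not something in gerrit-trigger or its ssh library, and the approach presumes that some traffic (events or keepalive messages) normally arrives within `maxIdleMillis`, since an event stream can also be quiet for legitimate reasons.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical idle detector: remembers when data was last seen and
 * reports the connection as stale once it has been silent for too long.
 */
final class IdleWatchdog {
    private final LongSupplier clockMillis; // injected clock, e.g. System::currentTimeMillis
    private final long maxIdleMillis;
    private volatile long lastActivityMillis;

    IdleWatchdog(LongSupplier clockMillis, long maxIdleMillis) {
        this.clockMillis = clockMillis;
        this.maxIdleMillis = maxIdleMillis;
        this.lastActivityMillis = clockMillis.getAsLong();
    }

    /** Call whenever an event or keepalive reply is read from the connection. */
    void markActivity() {
        lastActivityMillis = clockMillis.getAsLong();
    }

    /** True when nothing has been seen for longer than maxIdleMillis. */
    boolean isStale() {
        return clockMillis.getAsLong() - lastActivityMillis > maxIdleMillis;
    }
}
```

A periodic task could call `isStale()` and, when it returns true, close the old session and start a fresh connection instead of waiting for a read that may never return.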

          rsandell rsandell added a comment -

          Are there any brave users that can test if this commit helped?
          https://github.com/jenkinsci/gerrit-trigger-plugin/commit/9adbcef689d8feac63e8780648bd417991dc0292
          Since I can't reproduce it in our environments.

          ccutrer Cody Cutrer added a comment -

          Installed, and I have disabled the script that restarted the gerrit connection every hour. I'll let you know on Monday if it's still connected.

          ccutrer Cody Cutrer added a comment -

          Yay, still working today, and jenkins has not been restarted, nor has gerrit connection been restarted. I would say the issue has been fixed.

          evernat evernat added a comment -

          This issue was fixed a year ago according to the last comment.

          sosandstrom Ola Sandstrom added a comment -

          Yesterday, we stopped and started gerrit/git.
          From that point, gerrit never triggered Jenkins, until we restarted the gerrit-trigger. We're using v 2.5.1 of the plugin, 1.456 of Jenkins.

          glundh glundh added a comment -

          You guys that still see this issue, would you mind building & testing this release?

          https://github.com/jenkinsci/gerrit-trigger-plugin/commit/9d6e7db02dea2e79ec87a9bb02d8c0cbeaa4fbbd

          The jsch dependency has been updated, and it has a few fixes which could potentially help solve this issue.

          Please know that this is an unreleased version of Gerrit-Trigger (do not yet bring it into your 99.999% uptime production system).


          anushreeag Anushree Ganjam added a comment -

          How do we merge https://github.com/jenkinsci/gerrit-trigger-plugin/commit/9adbcef689d8feac63e8780648bd417991dc0292 into the gerrit plugin installed on a server?
          Can anyone guide?

          People

            rsandell rsandell
            antonystubbs antonystubbs