  1. Jenkins
  2. JENKINS-39498

Missed Events Playback Feature fails at Jenkins restart

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • gerrit-trigger-plugin
    • None
    • Jenkins 2.27
      Gerrit Trigger plugin 2.22.0
      Gerrit 2.12.2
      Jenkins and Gerrit on operating system SLES 11.3

    Description

      Our Jenkins has some 2500 jobs, the majority of them connected to Gerrit projects with Gerrit Trigger plugin.
      When restarting Jenkins, the Jenkins job queue is filled with some 500 jobs that have already been run before. In other words, the Gerrit Trigger plugin seems to forget that Jenkins has already run these jobs.
      We experienced the same behavior in the past when we stopped the Gerrit server in the Jenkins Gerrit Trigger configuration and then restarted it.

          Activity

            skipot Sergii Kipot added a comment -

            I have a similar issue with

            Jenkins 2.60.1
            Gerrit Trigger plugin 2.24.0
            Gerrit 2.13.6

            Here are my observations of the issue:

            1. For some time (up to a few days) everything works OK and the event timestamp is updated constantly in gerrit-server-event-data/gerrit.mgm.sipwise.com/gerrit-trigger-server-timestamps.xml
            2. Then at some point the Gerrit plugin starts showing a warning on the page 'Manage Jenkins -> Gerrit Trigger -> Edit':
                     "Gerrit Missed Events Playback is not supported. Verify if the connection has the REST API enabled and that the Gerrit Events-log plugin is installed and configured on the Gerrit Server."
               and the event info in gerrit-trigger-server-timestamps.xml stops being updated. New events are still processed, though.
            3. As a result, during a Jenkins restart (or a manual Gerrit Trigger reconnect) all events since the last written timestamp are pulled into the queue even though they have already been processed once.
            4. Nothing related to connection loss or the like is written to the Jenkins or Gerrit logs.

            Basically, it looks as if the plugin didn't try, or failed, to reconnect to the Gerrit API after a keepalive timeout.

            Any ideas what could be the root of the issue and how to debug it further?

            rsandell rsandell added a comment -

            skipot If it is like you say, then we somehow need to figure out how it loses the information about events playback being supported.

            My guess is that, as you say, at some point when Jenkins reconnects it can't determine that the feature is supported and so by design stops recording the last event that came in. 

            But then when you restart Jenkins it detects the feature again just fine and picks the last timestamp recorded which is by then quite old.

            So to properly fix it we need to find out how the detection fails. I can't really do that since I don't have access to a live instance where this happens, but I can try to add some more logging around that area to perhaps help you.

             

            I can also do a somewhat dirty fix for when this happens. There are two options; both are a bit risky, but one is riskier than the other.

            The less risky one would be to remove the timestamp file whenever it is detected that the feature is not supported. That way when Jenkins restarts or reconnects it won't replay any events at all since there is no timestamp to replay from. It would be "as if" you connected to the gerrit server for the first time.

            Another approach could be to make it so that if the feature is detected as not supported but there is a timestamp file available then continue to record the timestamps as if the feature is there even though we don't know if it is. So when Jenkins reconnects and detects the feature anew it will have the correct timestamp to query from.
            There is, though, a risk of unforeseen errors here, as well as a problem with legitimate scenarios where the plugin has actually been removed from Gerrit, so that Jenkins fails spectacularly to talk to Gerrit about it when reconnecting.
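
            The two workarounds described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not the plugin's code: the class `TimestampWorkaroundSketch`, the `Strategy` enum, and the method `onFeatureCheck` are hypothetical names, and the timestamp file is simply a path handed in by the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the two workarounds; none of these names come
// from the gerrit-trigger plugin itself.
class TimestampWorkaroundSketch {

    enum Strategy { DELETE_TIMESTAMPS, KEEP_RECORDING }

    /**
     * Applies the chosen workaround after a feature check.
     *
     * @return true if event timestamps should continue to be persisted
     */
    static boolean onFeatureCheck(boolean supported, Path timestampFile, Strategy strategy)
            throws IOException {
        if (supported) {
            // Feature detected: record timestamps as usual.
            return true;
        }
        if (strategy == Strategy.DELETE_TIMESTAMPS) {
            // Less risky option: with no timestamp file there is nothing to
            // replay from, as if connecting to Gerrit for the first time.
            Files.deleteIfExists(timestampFile);
            return false;
        }
        // Riskier option: keep recording, but only if a timestamp file already
        // exists, so a later successful detection has a fresh point to query from.
        return Files.exists(timestampFile);
    }
}
```

            The trade-off is visible in the return values: the first strategy gives up replay entirely after a failed detection, while the second keeps the replay position at the cost of trusting a feature that may really be gone.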

            skipot Sergii Kipot added a comment -

            Hi rsandell,

            I would say we should try to find the root of the issue before applying any workarounds. Could you please add the logging you mentioned so I can try it and collect more information about the issue?

            rsandell rsandell added a comment -

            @scoheb maybe you can gain some more insight on this than me?

            skipot Sergii Kipot added a comment -

            Hi,

            Any ideas how to progress here? I will be glad to help with further debugging.

            skipot Sergii Kipot added a comment -

            Any feedback? Please tell me how to continue with debugging.

            scoheb Scott Hebert added a comment -

            Hi,

            Are you able to tell me if you saw any of the following WARNING log messages when the events stopped being recorded:

            • Playback of missed events not supported for server
            • Event CreatedOn is null...Gerrit Server might not support attribute eventCreatedOn. Will NOT persist this event and Missed Events will be disabled!
            • Event CreatedOn is 0...Gerrit Server does not support attribute eventCreatedOn. Will NOT persist this event and Missed Events will be disabled!
            • Not able to verify if Gerrit plugin events-log is installed
            • connectionDown for server:

            I tend to prefer rsandell's possible fix: remove the timestamp file whenever it is detected that the feature is not supported (a Gerrit glitch, or a Gerrit restart after which events-log is no longer installed). The reason is that if the gerrit-trigger plugin IS processing events AND the playback manager CANNOT persist them (for reasons yet unknown), then we cannot track what has to be played back.
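
            One way to look for the WARNING messages listed in this comment is to scan the Jenkins log line by line. The helper below is a hypothetical sketch (the class `MissedEventsLogScan` and its method are illustrative names, and the log is assumed to be available as a list of lines); the search strings are the message fragments quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the plugin.
class MissedEventsLogScan {

    // Message fragments taken from the list in the comment above.
    static final List<String> MARKERS = Arrays.asList(
            "Playback of missed events not supported for server",
            "Event CreatedOn is null",
            "Event CreatedOn is 0",
            "Not able to verify if Gerrit plugin events-log is installed",
            "connectionDown for server");

    /** Returns every log line that contains at least one of the markers. */
    static List<String> findWarnings(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // one match is enough for this line
                }
            }
        }
        return hits;
    }
}
```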

             

            skipot Sergii Kipot added a comment -

            There is the following message in the log:
                 Event CreatedOn is null...Gerrit Server might not support attribute eventCreatedOn. Will NOT persist this event and Missed Events will be disabled!

            and on the Gerrit Trigger settings page there is a warning at the top:

                "Gerrit Missed Events Playback is not supported. Verify if the connection has the REST API enabled and that the Gerrit Events-log plugin is installed and configured on the Gerrit Server."

            The connection to the Gerrit events-log is never restored after that until someone manually disables and re-enables the Gerrit trigger from its status page (Manage Jenkins -> Gerrit Trigger -> press the status circle icon) or Jenkins is restarted.

            So even though removing the timestamp file is a reasonable solution, it doesn't fix the root of the issue: the inability to reconnect to the Gerrit Events-log plugin.
             
             

            scoheb Scott Hebert added a comment -

            I am working on a PR that will periodically attempt to verify if a connection can be made to Gerrit to make use of the events-log plugin. See https://github.com/jenkinsci/gerrit-trigger-plugin/pull/335

            It is a work in progress...some tests are failing.

            I will let you know when I have a snapshot you can test with.
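
            The idea of periodically re-verifying support can be illustrated with a small sketch. It is not the code from the pull request: `PeriodicSupportCheck` and its methods are hypothetical names, and the actual check against Gerrit is abstracted as a `BooleanSupplier` supplied by the caller.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a periodic support check; not the plugin's implementation.
class PeriodicSupportCheck {

    private final BooleanSupplier probe; // stands in for the real check against Gerrit
    private volatile boolean supported;

    PeriodicSupportCheck(BooleanSupplier probe, boolean initiallySupported) {
        this.probe = probe;
        this.supported = initiallySupported;
    }

    /** Runs one check and returns true if the supported state changed. */
    boolean runOnce() {
        boolean now = probe.getAsBoolean();
        boolean changed = now != supported;
        supported = now;
        return changed;
    }

    boolean isSupported() {
        return supported;
    }

    /** Schedules runOnce() repeatedly with a fixed delay between runs. */
    ScheduledFuture<?> start(ScheduledExecutorService scheduler, long periodSeconds) {
        return scheduler.scheduleWithFixedDelay(
                () -> runOnce(), periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

            With such a check in place, a temporary detection failure is corrected at the next run instead of persisting until a manual reconnect or a Jenkins restart.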

            scoheb Scott Hebert added a comment -

            skipot Please give this a try on a staging/dev system: https://ci.jenkins.io/job/Plugins/job/gerrit-trigger-plugin/view/change-requests/job/PR-335/lastSuccessfulBuild/artifact/target/gerrit-trigger.hpi
            skipot Sergii Kipot added a comment -

            Thank you. I installed the plugin you provided. Will inform you on results after running it for a few days.

            skipot Sergii Kipot added a comment -

            Tested even in production. Things look OK now. Every time I see a connection loss in the logs:
                 Event CreatedOn is null...Gerrit Server might not support attribute eventCreatedOn

            there is also a reconnection within the same second or two:

                 com.sonyericsson.hudson.plugins.gerrit.trigger.gerritnotifier.ToGerritRunListener onStarted

                 Missed Events Playback used to be NOT supported. now it IS!

            So the PR looks fine for merging and for releasing a new gerrit-trigger version.

             


            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Scott Hebert
            Path:
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/GerritServer.java
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManager.java
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/utils/GerritPluginChecker.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/mock/Setup.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsFunctionalTest.java
            http://jenkins-ci.org/commit/gerrit-trigger-plugin/8478576ea555c1e9b588b5e23030aca43e148e76
            Log:
            JENKINS-39498 Fix invalid data

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Robert Sandell
            Path:
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackEnabledChecker.java
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManager.java
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/utils/GerritPluginChecker.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/mock/Setup.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsFunctionalTest.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsLoadPersistTest.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManagerTest.java
            http://jenkins-ci.org/commit/gerrit-trigger-plugin/fde68b35f54f73982458f97e4a631ddd894b6408
            Log:
            Merge pull request #335 from scoheb/missed-events

            JENKINS-39498 Fix invalid data

            Compare: https://github.com/jenkinsci/gerrit-trigger-plugin/compare/7a5f54bf4263...fde68b35f54f

            tekkamanendless Douglas Manley added a comment -

            I'm seeing something similar. I have a job that gets triggered when a new patch set is submitted, and it all works perfectly fine until it "misses" some. I'll see these error messages:

            Jan 19, 2018 3:04:18 PM WARNING com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager persist
            Event CreatedOn is null...Gerrit Server might not support attribute eventCreatedOn. Will NOT persist this event and Missed Events will be disabled!
            Jan 19, 2018 3:04:18 PM WARNING com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck
            Missed Events Playback used to be NOT supported. now it IS!

            Subsequent jobs then trigger normally, but the ones that happened at that moment must be triggered manually.

            This message seems to correlate with patch sets being submitted to Gerrit. I suspect that the plugin receives the event but then has some issue processing it and skips it (and all temporally proximate events) until the "now it IS!" message.

            If I subscribe to the events stream, I don't see any events with an "eventCreatedOn" attribute of null; they all have it set appropriately.

            I haven't been able to pin down where in the source this gets lost; might it be the event-log code from "com.sonymobile.tools.gerrit.gerritevents.dto.events.GerritTriggeredEvent"?

            tekkamanendless Douglas Manley added a comment -

            From what I can tell, a Gerrit event comes in, but the plugin can't establish a connection due to a timeout:

            Jan 19, 2018 3:33:36 PM com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer startConnection
            WARNING: Already started!
            Jan 19, 2018 3:33:46 PM com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer doWakeup
            SEVERE: Could not start connection.
            java.lang.InterruptedException: time out.

            Shortly after that, "checkIfEventsLogPluginSupported" is called:

            public void checkIfEventsLogPluginSupported() {
               GerritServer server = PluginImpl.getServer_(serverName);
               if (server != null && server.getConfig() != null) {
                  isSupported = GerritPluginChecker.isPluginEnabled(
                     server.getConfig(), EVENTS_LOG_PLUGIN_NAME, true);
               }
            }

            The key thing here is that "isSupported" is always altered if we have a Gerrit server configured (and we do, because why else would we be here talking about this?). And "GerritPluginChecker.isPluginEnabled" will return "false" if there is any kind of error in the connection:

            } catch (IOException e) {
               logger.warn(Messages.PluginHttpConnectionGeneralError(pluginName,
                  e.getMessage()), e);
               return false;
            } finally {

            I am still trying to verify that those log messages are written; I don't see anything in my logs for "GerritPluginChecker", but it must be getting called because "performCheck" is definitely registering a change in "isSupported", and that only happens via "GerritPluginChecker.isPluginEnabled".

            After that, the timestamps are purged:

            Jan 19, 2018 3:34:16 PM com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck
            WARNING: Deleting /var/lib/jenkins/gerrit-server-event-data/gerrit.extreme-scale.com/gerrit-trigger-server-timestamps.xml

            And then almost immediately:

            Jan 19, 2018 3:34:35 PM com.sonyericsson.hudson.plugins.gerrit.trigger.playback.GerritMissedEventsPlaybackManager performCheck
            WARNING: Missed Events Playback used to be NOT supported. now it IS!

            At this point, my old timestamps are lost, so the original event that triggered this whole workflow is lost, and it is never built.

            What I think we should do is update "isPluginEnabled" to throw an exception on I/O failure and then not make any changes to "isSupported" when this occurs. "GerritPluginChecker.isPluginEnabled" is only called in "GerritMissedEventsPlaybackManager.java", so the amount of work would be minimal. I may submit a pull request for this...
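
            The change proposed at the end of this comment could look roughly like the sketch below. It is an assumption-laden illustration, not the plugin's code: `SupportStateSketch` and the `PluginProbe` interface are hypothetical stand-ins for the playback manager and for `GerritPluginChecker.isPluginEnabled`, which is modelled here as throwing `IOException` instead of returning false.

```java
import java.io.IOException;

// Hypothetical sketch: keep the previous isSupported value on I/O failure.
class SupportStateSketch {

    /** Stand-in for the plugin check; throws on connection errors instead of returning false. */
    interface PluginProbe {
        boolean isPluginEnabled() throws IOException;
    }

    private boolean isSupported;

    SupportStateSketch(boolean initiallySupported) {
        this.isSupported = initiallySupported;
    }

    void checkIfEventsLogPluginSupported(PluginProbe probe) {
        try {
            // Only a definite answer from the server changes the state.
            isSupported = probe.isPluginEnabled();
        } catch (IOException e) {
            // A timeout or reset says nothing about whether events-log is
            // installed, so the previous value is deliberately left untouched.
        }
    }

    boolean isSupported() {
        return isSupported;
    }
}
```

            Under this design a transient timeout no longer flips the flag to false, so the timestamp file would not be deleted as a side effect of a connection glitch.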

            tekkamanendless Douglas Manley added a comment -

            On my Gerrit server at this time, I see:
            [2018-01-19 15:26:03,468] [sshd-SshServer[3e89f5bc]-nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[svc_jenkins@/52.203.209.121:32372])[state=Opened] IOException: Connection reset by peer
            [2018-01-19 15:26:10,333] [sshd-SshServer[3e89f5bc]-nio2-thread-2] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[svc_jenkins@/52.203.209.121:32787])[state=Opened] IOException: Connection reset by peer
            [2018-01-19 15:26:11,224] [sshd-SshServer[3e89f5bc]-nio2-thread-1] WARN  org.apache.sshd.server.session.ServerSessionImpl : exceptionCaught(ServerSessionImpl[svc_jenkins@/52.203.209.121:28949])[state=Opened] IOException: Connection reset by peer

            This corresponds to the timeouts that I was seeing in Jenkins.

            tekkamanendless Douglas Manley added a comment -

            Ah okay, there's one more place where "isSupported" is switched to false, and that's in the "persist" method, which attempts to add a Gerrit event to some list. If the event lacks "eventCreatedOn" (or if it is set to zero), then that function unilaterally disables event logging.

            That seems a bit strange; I would understand if that function merely ignored events that it didn't support (namely those without an "eventCreatedOn" value), but disabling the whole thing based on a single event is a bit overboard. In addition, there are numerous places where the "checkIfEventsLogPluginSupported" method is called, so there's ample opportunity for the state to be changed normally.

            This whole thing looks like it was created to avoid some unspeakable disaster, as if something terrible were to happen if "persist" were called and the "events-log" plugin wasn't there. My reading of the code is that it wouldn't really do anything; "persist" wouldn't store the event, and the playback stuff wouldn't have anything to play back. Most of the important logic around "isSupported" happens when the connection is started, which makes sense. Changing that value midway based on a missing field from a streaming bus doesn't really help make the plugin any more robust.
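
            The alternative behavior argued for in this comment, where "persist" ignores an event without "eventCreatedOn" instead of disabling the feature, might be sketched like this. The class `PersistSketch` is hypothetical and reduces an event to its "eventCreatedOn" value; it does not reproduce the plugin's actual data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip unusable events without touching isSupported.
class PersistSketch {

    private final List<Long> persistedTimestamps = new ArrayList<>();
    private boolean isSupported = true;

    /**
     * @param eventCreatedOn creation time of the event, or null if the attribute is absent
     * @return true if the event was recorded
     */
    boolean persist(Long eventCreatedOn) {
        if (!isSupported) {
            return false;
        }
        if (eventCreatedOn == null || eventCreatedOn == 0L) {
            // Ignore just this event; one bad event does not disable playback.
            return false;
        }
        persistedTimestamps.add(eventCreatedOn);
        return true;
    }

    boolean isSupported() {
        return isSupported;
    }

    int persistedCount() {
        return persistedTimestamps.size();
    }
}
```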

            scm_issue_link SCM/JIRA link daemon added a comment -

            Code changed in jenkins
            User: Robert Sandell
            Path:
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManager.java
            src/main/java/com/sonyericsson/hudson/plugins/gerrit/trigger/utils/GerritPluginChecker.java
            src/test/java/com/sonyericsson/hudson/plugins/gerrit/trigger/playback/GerritMissedEventsPlaybackManagerTest.java
            http://jenkins-ci.org/commit/gerrit-trigger-plugin/e48c9b239c62d1c29204af528ae8ea55e24c7f13
            Log:
            Merge pull request #346 from tekkamanendless/master

            JENKINS-39498 Stop panicking about "eventCreatedOn" and losing my position

            Compare: https://github.com/jenkinsci/gerrit-trigger-plugin/compare/f6cdb2fcb3e6...e48c9b239c62

            People

              scoheb Scott Hebert
              andreas_pelzer Andreas Pelzer
              Votes: 7
              Watchers: 11
