Jenkins / JENKINS-20272

Disconnected nodes don't need to be checked for response time

      Description

      From the Jenkins log of a non-productive instance I set up to test something:

      Okt 25, 2013 11:26:43 AM hudson.WebAppMain$3 run
      INFO: Jenkins is fully up and running
      Okt 25, 2013 3:26:36 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      WARNING: Making ffoooo offline because it’s not responding
      Okt 25, 2013 4:26:36 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      WARNING: Making ffoooo offline because it’s not responding
      Okt 25, 2013 5:26:36 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      WARNING: Making ffoooo offline because it’s not responding
      Okt 25, 2013 6:26:36 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      WARNING: Making ffoooo offline because it’s not responding
      Okt 25, 2013 7:26:36 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      WARNING: Making ffoooo offline because it’s not responding
      Okt 25, 2013 8:26:36 PM hudson.node_monitors.ResponseTimeMonitor$1 monitor
      WARNING: Making ffoooo offline because it’s not responding
      

      The node in question is a dumb slave configured to connect via JNLP. It never was connected.

      There should be no warnings about responsiveness for nodes that aren't connected.

          Activity

          peter_schuetze peter_schuetze added a comment - - edited

          I have lots of Nodes that are only used for deployments. They will only be taken online when needed and disconnect on 15 min inactivity. Looks very silly to have all of them marked as unresponsive just because they are offline. In addition, I am a fan of "only report what's an issue".

          davida2009 David Aldrich added a comment -

          If I mark a node as offline, I notice that Jenkins still tries to connect to it, fails and gives a warning. Is that the same point as stated above? I agree that Jenkins shouldn't try to connect to nodes marked offline; it fills the log with pointless warning messages.

          danielbeck Daniel Beck added a comment -

          David Aldrich: Your slave retention strategy is "Keep online as much as possible", and that is what Jenkins does. "Temporarily mark offline" just makes it unavailable for building jobs. See also JENKINS-13140.

          Workaround: Build and install https://github.com/daniel-beck/jenkins-keep-slave-disconnected-plugin and configure the retention strategy provided by it.

          davida2009 David Aldrich added a comment -

          Hi Daniel,
          Thanks for your reply, I understand now. I would like to try your plugin. Are there instructions for how to build it?

          Could you be persuaded to publish a pre-built version please?

          Best regards
          David

          davida2009 David Aldrich added a comment -

          Hi Daniel

          Thanks for making the 'Keep Offline Slaves Disconnected Retention Strategy Plugin' available through plugin manager. I have installed the plugin, restarted and set the Availability for nodes I have taken offline to:

          Keep this slave online as much as possible, but don't reconnect if temporarily marked offline by the user

          I no longer see messages in the log of type:

          Attempting to reconnect xxxx

          But I do still see messages of type:

          Making xxxx offline because it’s not responding

          Is that what you would expect?

          Best regards

          David

          danielbeck Daniel Beck added a comment -

          Hi David,

          that was all Nicolas – thank him.

          Making xxxx offline because it’s not responding

          That's a known issue in the response time monitor which doesn't care that a node is disconnected: JENKINS-20272

          davida2009 David Aldrich added a comment -

          Well, thank you Nicolas, if you are watching this issue.

          Thanks for your reply Daniel. I guess you don't mean JENKINS-20272 because that is this issue?

          danielbeck Daniel Beck added a comment -

          Uh, no, given your response I thought this was the reconnecting issue. This issue is still unresolved and it happens for all disconnected nodes, so yes, I'd expect this to happen.

          davida2009 David Aldrich added a comment -

          Hi Daniel

          It's a few years since we discussed this. It seems that the ResponseTimeMonitor still warns that it is marking disconnected nodes offline. We often have a number of nodes that are intentionally taken offline, so our system log is crowded with these warning messages. Could they be disabled?

          Best regards

          David

          abayer Andrew Bayer added a comment -

          PR up at https://github.com/jenkinsci/jenkins/pull/2911

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Andrew Bayer
          Path:
          core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java
          test/src/test/java/hudson/node_monitors/ResponseTimeMonitorTest.java
          http://jenkins-ci.org/commit/jenkins/5f125d110eb9f65ec6bd6030466df846c4c96f34
          Log:
          [FIXED JENKINS-20272] Don't monitor response on offline agents (#2911)

          • [FIXED JENKINS-20272] Don't monitor response on offline agents
          • Updating to only not check if channel is null.
          • Fix broken test.
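
The idea in the commit log above (skip the response check only when the node has no channel) can be illustrated with a minimal, self-contained sketch. This is not the code from the commit: the `Node` class, its `channel` field, and `nodesToProbe` are hypothetical stand-ins, assuming only that a disconnected node is one whose channel is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResponseCheckSketch {
    /** Hypothetical stand-in for a node; not a Jenkins class. */
    static final class Node {
        final String name;
        final Object channel; // assumption: null means the node is not connected

        Node(String name, Object channel) {
            this.name = name;
            this.channel = channel;
        }
    }

    /** Returns the names of the nodes that should be probed for response time. */
    static List<String> nodesToProbe(List<Node> nodes) {
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            if (n.channel == null) {
                continue; // disconnected: nothing to ping, so nothing to warn about
            }
            result.add(n.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("connected-agent", new Object()),
                new Node("ffoooo", null)); // never connected, as in the report
        System.out.println(nodesToProbe(nodes)); // prints [connected-agent]
    }
}
```

With a filter like this, the never-connected node from the description is simply left out of the round, so no "not responding" warning would be produced for it.
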
          aheritier Arnaud Héritier added a comment -

          Andrew Bayer, Daniel Beck: should it be marked as solved? Merged in 2.72 but not backported to 2.60 LTS AFAIK.

          olivergondza Oliver Gondža added a comment -

          I suspect the fix is insufficient. The test attached there was passing before the fix, I still see the machine disconnected in an improved test [1] and, most important of all, I see provisioned computers in the openstack-cloud plugin turned offline before the launcher kicks in.

          Hence I am reopening and self-assigning this.

          [1] https://github.com/olivergondza/jenkins/commit/a19980b577129d707106b6650305dd49aac5ca8a

          olivergondza Oliver Gondža added a comment -

          Fix proposed: https://github.com/jenkinsci/jenkins/pull/3453

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oliver Gondža
          Path:
          core/src/main/java/hudson/node_monitors/AbstractAsyncNodeMonitorDescriptor.java
          core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java
          test/src/test/java/hudson/node_monitors/ResponseTimeMonitorTest.java
          http://jenkins-ci.org/commit/jenkins/8872b44cc8823c2106dfc5ba9a344fd1d82e4823
          Log:
          JENKINS-20272 - Disconnected nodes should not be disconnected repeatedly (#3453)

          • JENKINS-20272 Reproduce in test
          • [FIX JENKINS-20272] Do not disconnect skipped computers
          • JENKINS-20272 Clean the API a bit
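
The distinction this commit log draws between a node that was skipped and a node that failed to respond can be sketched as follows. This is an illustration under stated assumptions, not the code from the pull request: `RoundResult`, `responseMillis`, `skipped`, and `nodesToMarkOffline` are hypothetical names, and a null measurement is assumed to mean "no reply was recorded".

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class SkippedNodesSketch {
    /** Outcome of one monitoring round; every name here is hypothetical. */
    static final class RoundResult {
        // Measured response time per node; a null value means no reply was recorded.
        final Map<String, Long> responseMillis = new LinkedHashMap<>();
        // Nodes that were never probed in this round (for example, disconnected ones).
        final Set<String> skipped = new LinkedHashSet<>();
    }

    /** Only nodes that were actually probed and gave no reply are marked offline. */
    static Set<String> nodesToMarkOffline(RoundResult r) {
        Set<String> offline = new LinkedHashSet<>();
        for (Map.Entry<String, Long> e : r.responseMillis.entrySet()) {
            boolean wasSkipped = r.skipped.contains(e.getKey());
            if (e.getValue() == null && !wasSkipped) {
                offline.add(e.getKey());
            }
        }
        return offline;
    }

    public static void main(String[] args) {
        RoundResult r = new RoundResult();
        r.responseMillis.put("fast", 12L);
        r.responseMillis.put("hung", null);   // probed, no reply
        r.responseMillis.put("ffoooo", null); // no data because it was skipped
        r.skipped.add("ffoooo");
        System.out.println(nodesToMarkOffline(r)); // prints [hung]
    }
}
```

Without the `skipped` set, a missing measurement for a skipped node would be indistinguishable from a timeout, which is one plausible way a disconnected node could end up being "disconnected" again on every round.
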

          NOTE: This service has been marked for deprecation: https://developer.github.com/changes/2018-04-25-github-services-deprecation/

          Functionality will be removed from GitHub.com on January 31st, 2019.

          oleg_nenashev Oleg Nenashev added a comment -

          Extra fix has been applied in 2.126

          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oliver Gondža
          Path:
          core/src/main/java/hudson/node_monitors/AbstractAsyncNodeMonitorDescriptor.java
          core/src/main/java/hudson/node_monitors/ResponseTimeMonitor.java
          test/src/test/java/hudson/node_monitors/ResponseTimeMonitorTest.java
          http://jenkins-ci.org/commit/jenkins/527141e5e8dafe09e4d1c68f10f582243c53c456
          Log:
          JENKINS-20272 - Disconnected nodes should not be disconnected repeatedly (#3453)

          • JENKINS-20272 Reproduce in test
          • [FIX JENKINS-20272] Do not disconnect skipped computers
          • JENKINS-20272 Clean the API a bit

          (cherry picked from commit 8872b44cc8823c2106dfc5ba9a344fd1d82e4823)


            People

            Assignee:
            olivergondza Oliver Gondža
            Reporter:
            danielbeck Daniel Beck
            Votes:
            7
            Watchers:
            11
