
[JENKINS-34213] Under high throughput the Unexporter can fail to clean up fast enough

      The Unexporter does not try to drain the queue. Instead it unexports at most one object every 200ms, which can create GC pressure when the remoting layer is under stress.
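
      To illustrate the difference, here is a minimal sketch, not the actual RemoteInvocationHandler code. A plain `java.util.Queue` stands in for the queue of references awaiting unexport, and the names `DrainSketch`, `sweepOne` and `sweepAll` are hypothetical. At one object per 200ms sweep the ceiling is 5 objects per second, so a backlog of 1,000 objects needs at least 200 seconds to clear, whereas a draining sweep clears whatever is queued in a single pass.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the two sweep strategies; not the real Unexporter.
class DrainSketch {

    /** Cleans up at most one pending item per sweep (the behaviour described in the issue). */
    static <T> int sweepOne(Queue<T> pending, List<T> cleaned) {
        T item = pending.poll();
        if (item == null) {
            return 0;
        }
        cleaned.add(item);
        return 1;
    }

    /** Cleans up everything that is pending at the time of the sweep. */
    static <T> int sweepAll(Queue<T> pending, List<T> cleaned) {
        int count = 0;
        for (T item = pending.poll(); item != null; item = pending.poll()) {
            cleaned.add(item);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        final long sweepIntervalMs = 200; // interval quoted in the issue description

        Queue<Integer> backlog = new ArrayDeque<>();
        for (int i = 0; i < 1000; i++) {
            backlog.add(i);
        }

        // Strategy 1: one item per sweep, so the sweep count equals the backlog size.
        Queue<Integer> copy = new ArrayDeque<>(backlog);
        List<Integer> cleaned = new ArrayList<>();
        long sweeps = 0;
        while (sweepOne(copy, cleaned) > 0) {
            sweeps++;
        }
        System.out.println("one per sweep: " + sweeps + " sweeps, at least "
                + (sweeps * sweepIntervalMs / 1000) + " s");

        // Strategy 2: a single sweep drains the whole backlog.
        cleaned.clear();
        int drained = sweepAll(backlog, cleaned);
        System.out.println("drain per sweep: " + drained + " items in 1 sweep");
    }
}
```

      The draining variant does more work per sweep, but the backlog no longer grows without bound when objects arrive faster than five per second.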


          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/hudson/remoting/RemoteInvocationHandler.java
          http://jenkins-ci.org/commit/remoting/98a7e3a837853ab1e8bf8b008862ed5738378bfc
          Log:
          JENKINS-34213 Collect and report meaningful stats, batch the reference collection for better stability

          • We want to report on these things only if they are an issue. Logging of the actual stats should be below the radar of the users using a default logger level of `INFO` provided that the Unexporter is not doing much
          • When the Unexporter is busy (i.e. the m1 rate is > 100/sec) then we should start reporting at `INFO`
          • In the event that there is sustained high levels of work, we should alert the user and recommend turning off the stack traces to reduce GC pressure
          • My stress testing revealed that under very heavy load it is better to batch the removal and then batch the clean-up even if this batching means that the sweeps are not as frequent.
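
          The reporting rule in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the class name `UnexportStatsSketch`, the 5 second tick, and the choice of `Level.FINE` as the quiet level are all hypothetical. Only the 100/sec threshold and the use of `INFO` when busy come from the commit message. The "m1 rate" is modelled here as a one-minute exponentially weighted moving average of events per second.

```java
import java.util.logging.Level;

// Hypothetical sketch of rate-dependent log levels; not the real remoting code.
class UnexportStatsSketch {
    /** Threshold from the commit message: busy means the m1 rate is above 100/sec. */
    static final double BUSY_RATE_PER_SEC = 100.0;
    /** Assumed sampling interval between calls to tick(). */
    static final double TICK_SECONDS = 5.0;
    /** Smoothing factor for a one-minute exponentially weighted moving average. */
    static final double ALPHA = 1.0 - Math.exp(-TICK_SECONDS / 60.0);

    private double m1Rate;
    private boolean initialized;

    /** Folds the number of objects unexported during the last tick into the average. */
    void tick(long eventsInTick) {
        double instantRate = eventsInTick / TICK_SECONDS;
        if (!initialized) {
            m1Rate = instantRate;
            initialized = true;
        } else {
            m1Rate += ALPHA * (instantRate - m1Rate);
        }
    }

    double m1Rate() {
        return m1Rate;
    }

    /** INFO when busy; otherwise a level below the default INFO threshold (FINE is an assumption). */
    Level reportLevel() {
        return m1Rate > BUSY_RATE_PER_SEC ? Level.INFO : Level.FINE;
    }

    public static void main(String[] args) {
        UnexportStatsSketch stats = new UnexportStatsSketch();
        stats.tick(50); // a quiet tick: 10 events/sec
        System.out.println(stats.m1Rate() + " -> " + stats.reportLevel());
        for (int i = 0; i < 20; i++) {
            stats.tick(5000); // sustained load: 1000 events/sec per tick
        }
        System.out.println(stats.m1Rate() + " -> " + stats.reportLevel());
    }
}
```

          Because the average is smoothed, a single burst does not immediately raise the level; only load that persists across several ticks pushes the rate over the threshold.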


          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/hudson/remoting/RemoteInvocationHandler.java
          http://jenkins-ci.org/commit/remoting/d9654829cade2bbedae787a1b5db5c2a66005b03
          Log:
          JENKINS-34213 Oops typo in the stats measurement


          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/hudson/remoting/RemoteInvocationHandler.java
          http://jenkins-ci.org/commit/remoting/4dbf0ea480defd9bf98fca6d8a50f499df4ff08d
          Log:
          JENKINS-34213 Copy and paste error

          The three most common programming errors are:

          • Off by one errors
          • Copy and paste errors


          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/hudson/remoting/RemoteInvocationHandler.java
          http://jenkins-ci.org/commit/remoting/c9c077b7508fd715315a59a0423b7faec0b1a143
          Log:
          JENKINS-34213 Just when you were least expecting it, findbugs finds an actual real bug


          Code changed in jenkins
          User: Stephen Connolly
          Path:
          src/main/java/hudson/remoting/RemoteInvocationHandler.java
          http://jenkins-ci.org/commit/remoting/f4d287652e09da627f51779bec1c3ca7f1d46e70
          Log:
          [FIXED JENKINS-34213] Ensure that the unexporter cleans up whatever it can each sweep (#81)

          • [FIXED JENKINS-34213] Ensure that the unexporter cleans up whatever it can each sweep
          • JENKINS-34213 Collect and report meaningful stats, batch the reference collection for better stability
          • We want to report on these things only if they are an issue. Logging of the actual stats should be below the radar of the users using a default logger level of `INFO` provided that the Unexporter is not doing much
          • When the Unexporter is busy (i.e. the m1 rate is > 100/sec) then we should start reporting at `INFO`
          • In the event that there is sustained high levels of work, we should alert the user and recommend turning off the stack traces to reduce GC pressure
          • My stress testing revealed that under very heavy load it is better to batch the removal and then batch the clean-up even if this batching means that the sweeps are not as frequent.
          • JENKINS-34213 Oops typo in the stats measurement
          • Ignore when the channel is already closed due to a race between testing for close and close

          Spotted in https://jenkins.ci.cloudbees.com/job/libraries/job/remoting/177/console

          • JENKINS-34213 Copy and paste error

          The three most common programming errors are:

          • Off by one errors
          • Copy and paste errors

          • JENKINS-34213 Just when you were least expecting it, findbugs finds an actual real bug

          Arnaud Héritier added a comment - What do we do, stephenconnolly? Do we try to provide a stable version of remoting so that this can be included in 1.651.3?

          Stephen Connolly added a comment - We need ci_jenkinsci_org to actually cut a new release of remoting. Once that release has been out long enough, the fix will be eligible for the LTS release line.

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          pom.xml
          http://jenkins-ci.org/commit/jenkins/409438f36dc80f20964fb16f8d88041e11ba4ed4
          Log:
          [JENKINS-19445, JENKINS-34213, JENKINS-34808, JENKINS-34121] Bump remoting to 2.59. (#2344)

          • [JENKINS-19445, JENKINS-34213, JENKINS-34808] Bump remoting to 2.58.

          Changes:

          • JENKINS-34213 ( https://issues.jenkins-ci.org/browse/JENKINS-34213 ) - Ensure that the unexporter cleans up whatever it can each sweep ( https://github.com/jenkinsci/remoting/pull/81 )
          • JENKINS-19445 ( https://issues.jenkins-ci.org/browse/JENKINS-19445 ) Force class load on UserRequest in order to prevent deadlock on windows nodes when using JNA and Subversion ( https://github.com/jenkinsci/remoting/pull/81 )
          • JENKINS-34808 ( https://issues.jenkins-ci.org/browse/JENKINS-34808 ) - Allow user to adjust socket timeout ( https://github.com/jenkinsci/remoting/pull/68 )
          • JENKINS-34121 - Upgrade remoting to 2.59

          Oleg Nenashev added a comment -

          It has been released as remoting-2.58 and jenkins-2.4

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          pom.xml
          http://jenkins-ci.org/commit/jenkins/12e79963cca5122351943ee107f65c3ad91a2e25
          Log:
          [JENKINS-19445, JENKINS-34213, JENKINS-34808, JENKINS-34121] Bump remoting to 2.59. (#2344)

          • [JENKINS-19445, JENKINS-34213, JENKINS-34808] Bump remoting to 2.58.

          Changes:

          • JENKINS-34213 ( https://issues.jenkins-ci.org/browse/JENKINS-34213 ) - Ensure that the unexporter cleans up whatever it can each sweep ( https://github.com/jenkinsci/remoting/pull/81 )
          • JENKINS-19445 ( https://issues.jenkins-ci.org/browse/JENKINS-19445 ) Force class load on UserRequest in order to prevent deadlock on windows nodes when using JNA and Subversion ( https://github.com/jenkinsci/remoting/pull/81 )
          • JENKINS-34808 ( https://issues.jenkins-ci.org/browse/JENKINS-34808 ) - Allow user to adjust socket timeout ( https://github.com/jenkinsci/remoting/pull/68 )
          • JENKINS-34121 - Upgrade remoting to 2.59

          (cherry picked from commit 409438f36dc80f20964fb16f8d88041e11ba4ed4)

            stephenconnolly Stephen Connolly