JENKINS-68321

SCMEvent threads waiting on rate limit when rate limit isn't close to being hit

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • GitHub Branch Source Plugin version 2.11.4

      Our organization is experiencing large delays when Jenkins processes webhooks (on the order of minutes, up to hours). In a thread dump, every SCMEvent thread shows the following:

      "class org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber$SCMHeadEventImpl Wed Apr 20 14:18:16 EDT 2022 / SCMEvent [#4]" Id=2608 Group=main TIMED_WAITING
      	at java.lang.Thread.sleep(Native Method)
      	at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker$LocalChecker.waitUntilRateLimit(ApiRateLimitChecker.java:325)
      	at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker$LocalChecker.checkRateLimit(ApiRateLimitChecker.java:261)
      	at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker$RateLimitCheckerAdapter.checkRateLimit(ApiRateLimitChecker.java:242)
      	at org.kohsuke.github.GitHubRateLimitChecker.checkRateLimit(GitHubRateLimitChecker.java:128)
      	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:383)
      	at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:355)
      	at org.kohsuke.github.Requester.fetch(Requester.java:76)
      	at org.kohsuke.github.GHRepository.read(GHRepository.java:132)
      	at org.kohsuke.github.GHPerson.getRepository(GHPerson.java:146)
      	at org.jenkinsci.plugins.github_branch_source.GitHubSCMNavigator.visitSource(GitHubSCMNavigator.java:1389)
      	at org.jenkinsci.plugins.github_branch_source.GitHubSCMNavigator.visitSources(GitHubSCMNavigator.java:926)
      	at jenkins.scm.api.SCMNavigator.visitSources(SCMNavigator.java:221)
      	at jenkins.branch.OrganizationFolder$SCMEventListenerImpl.onSCMHeadEvent(OrganizationFolder.java:1049)
      	at jenkins.scm.api.SCMHeadEvent$DispatcherImpl.fire(SCMHeadEvent.java:246)
      	at jenkins.scm.api.SCMHeadEvent$DispatcherImpl.fire(SCMHeadEvent.java:229)
      	at jenkins.scm.api.SCMEvent$Dispatcher.run(SCMEvent.java:505)
      	at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:750)
      
      	Number of locked synchronizers = 1
      	- java.util.concurrent.ThreadPoolExecutor$Worker@594bc15a
      

      However, the rate limit for the service account has not come close to 0. The minimum observed is 3000 out of 5000 requests remaining. We see this on our dashboards as well as when testing the connection from the Jenkins UI.

      We are using public GitHub. The rate limiting strategy is set to "Throttle at/near rate limit". It was previously set to "Normalize API requests", but that setting made the problem worse.
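To illustrate why these numbers look inconsistent with a wait, here is a minimal sketch of a throttle-near-the-limit decision. It is an assumption about how such a strategy could behave, not the plugin's actual code: the class `ThrottleSketch`, its method names, and the 5% buffer are all hypothetical. Under this assumption, a thread that sees 3000 of 5000 requests remaining would not sleep at all.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of an "at/near rate limit" throttle decision; not the plugin's code. */
public final class ThrottleSketch {
    private ThrottleSketch() {}

    /** Assumed safety buffer: 5% of the limit, with a small floor. Illustrative numbers only. */
    static int buffer(int limit) {
        return Math.max(15, limit / 20);
    }

    /** Returns zero while remaining is at or above the buffer, otherwise the time until reset. */
    static Duration waitFor(int remaining, int limit, Instant reset, Instant now) {
        if (remaining >= buffer(limit)) {
            return Duration.ZERO;
        }
        Duration untilReset = Duration.between(now, reset);
        // A reset time in the past means there is nothing left to wait for.
        return untilReset.isNegative() ? Duration.ZERO : untilReset;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-04-20T20:09:00Z");
        Instant reset = now.plusSeconds(1800);
        // The situation reported here: plenty of requests left.
        System.out.println("3000/5000 remaining: wait " + waitFor(3000, 5000, reset, now));
        // A case that really is near the limit.
        System.out.println("100/5000 remaining: wait " + waitFor(100, 5000, reset, now));
    }
}
```

If the real check behaves anything like this, a sleeping thread would imply that the checker saw a much lower remaining count than the dashboards do, or that it was waiting for some other reason.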

      Notably, the following appears in the GitHub Branch Source logs:

      2022-04-20 20:09:02.440+0000 [id=247607]        INFO    o.j.p.g.ApiRateLimitChecker$RateLimitCheckerAdapter#checkRateLimit: LocalChecker for rate limit was not set for this thread. Configured using system settings.
      2022-04-20 20:09:02.512+0000 [id=247609]        INFO    o.j.p.g.ApiRateLimitChecker$RateLimitCheckerAdapter#checkRateLimit: LocalChecker for rate limit was not set for this thread. Configured using system settings.
      2022-04-20 20:09:02.512+0000 [id=247608]        INFO    o.j.p.g.ApiRateLimitChecker$RateLimitCheckerAdapter#checkRateLimit: LocalChecker for rate limit was not set for this thread. Configured using system settings. 
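The log message suggests that the rate limit checker is held per thread, with a fallback to the global configuration when a thread has none. Below is a minimal sketch of that pattern, assuming a `ThreadLocal` holder. All names are hypothetical and not taken from the plugin; it only shows how a thread that was never configured ends up on the system-wide setting.

```java
/** Hypothetical sketch of a per-thread checker with a system-wide fallback. */
public final class ThreadLocalCheckerSketch {
    private ThreadLocalCheckerSketch() {}

    /** Stand-in for a rate limit checker; only its name matters in this sketch. */
    interface Checker {
        String name();
    }

    private static final ThreadLocal<Checker> LOCAL = new ThreadLocal<>();
    private static final Checker SYSTEM_DEFAULT = () -> "system-default";

    /** Explicitly configures the checker for the calling thread. */
    static void configureForCurrentThread(Checker checker) {
        LOCAL.set(checker);
    }

    /** Returns the calling thread's checker, falling back to the system default if unset. */
    static Checker current() {
        Checker checker = LOCAL.get();
        if (checker == null) {
            // This is the point where a message like the one in the log could be emitted.
            checker = SYSTEM_DEFAULT;
            LOCAL.set(checker);
        }
        return checker;
    }

    /** Reports which checker a brand-new, unconfigured thread would see. */
    static String nameSeenByNewThread() throws InterruptedException {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = current().name());
        worker.start();
        worker.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        configureForCurrentThread(() -> "per-job");
        System.out.println("configured thread sees: " + current().name());
        System.out.println("new thread sees: " + nameSeenByNewThread());
    }
}
```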

      The following also appears every few seconds in the GitHub Branch Source logs:

      2022-04-20 20:09:21.187+0000 [id=247232]        FINE    jenkins.scm.api.SCMSource#defaultListener: Connecting to https://api.github.com using REDACTED 

      Let me know if any other information would be helpful.

          Glenn Duffy created issue -

          Glenn Duffy added a comment -

          I haven't heard any response on this ticket and it was created almost a month ago. Can someone please comment on it?

          Rob Hamilton added a comment -

          This is affecting us as well. All of our SCM threads get blocked, and we are then reduced to polling for changes.

          Rob Hamilton added a comment -

          Just sharing our mitigation for this issue, which is to have a job that interrupts threads in a TIMED_WAITING state:

           

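          # Write the Groovy script; \$ keeps the shell from expanding Groovy's ${...}.
          cat << EOF > script.groovy
          println "Script started"
          // Interrupt every SCMHeadEvent thread that is in the TIMED_WAITING state.
          Thread.allStackTraces.keySet().each {
            if (it.name.contains("SCMHeadEvent") && it.state == Thread.State.TIMED_WAITING) {
              println "Interrupting thread \${it.id} \${it.name} \${it.state}"
              it.interrupt()
            }
          }
          println "Script finished"
          EOF
          # Run the script on the controller through the scriptText endpoint.
          curl -sSv "${JENKINS_URL}/scriptText" -u "${USERNAME}:${JENKINS_API_KEY}" --data-urlencode "script=$(cat script.groovy)"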
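For readers who want to try the same idea outside the script console, here is a self-contained Java sketch of the thread scan in that script. The class name, the helper that starts a demo sleeper thread, and the demo thread name are assumptions for illustration. On a real controller the matching threads would be Jenkins' own SCMEvent workers.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the mitigation: interrupt TIMED_WAITING threads whose name contains a fragment. */
public final class InterruptWaitingThreads {
    private InterruptWaitingThreads() {}

    /** Interrupts every live thread that matches and returns the names of those interrupted. */
    static List<String> interruptMatching(String nameFragment) {
        List<String> interrupted = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().contains(nameFragment) && t.getState() == Thread.State.TIMED_WAITING) {
                t.interrupt();
                interrupted.add(t.getName());
            }
        }
        return interrupted;
    }

    /** Demo helper (hypothetical): starts a daemon thread that sleeps, like a throttled worker. */
    static Thread startSleeper(String name) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        }, name);
        t.setDaemon(true);
        t.start();
        // Give the thread up to five seconds to reach its sleep.
        long deadline = System.currentTimeMillis() + 5_000;
        while (t.getState() != Thread.State.TIMED_WAITING && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = startSleeper("demo SCMHeadEvent sleeper");
        System.out.println("interrupted: " + interruptMatching("SCMHeadEvent"));
        sleeper.join(5_000);
        System.out.println("still alive: " + sleeper.isAlive());
    }
}
```

Note that an interrupt only ends the current sleep. Whether the interrupted work is retried or dropped depends on how the sleeping code handles `InterruptedException`.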

           

           

          Glenn Duffy added a comment -

          robhamilton Thanks for the info. That's a clever workaround. How often is this run? Manually when the issue is seen or periodically?

          Rob Hamilton added a comment -

          Our job runs every 5 minutes, and we actually execute the curl part in a for loop with a sleep, so it runs multiple times within those 5 minutes (it's a very busy system).

          Glenn Duffy added a comment -

          Thanks robhamilton 

          Jesse Glick added a comment -

          I tried "Never check rate limit", but alas:

          GitHub throttling is disabled, which is not allowed for public GitHub usage, so ThrottleOnOver will be used instead. To configure a different rate limiting strategy, go to "GitHub API usage" under "Configure System" in the Jenkins settings.

          I am not sure why this would be “not allowed”. If you are using App authentication, you automatically get a high enough rate limit that it is likely pointless for Jenkins to even be paying attention. (I think the same is true for GitHub Enterprise.) Of course, Jenkins would still need to retry requests after a delay when GitHub returns an error saying that the rate limit has been exceeded, but I suspect this already happens. As far as I am aware, the rate limiting feature dates to the old days of bot accounts with a “personal” access token and a stringent rate limit.
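To make the retry-with-delay idea concrete, here is a minimal sketch. `RateLimitExceededException` and `callWithRetry` are hypothetical names, and nothing here is claimed to match what the plugin or the GitHub client library does. It only shows the shape of reacting to a rate-limit error after the fact instead of checking the limit before every request.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry a request after a delay when a rate-limit error is reported. */
public final class RetryOnRateLimit {
    private RetryOnRateLimit() {}

    /** Stand-in for a "rate limit exceeded" API error carrying a suggested delay. */
    static final class RateLimitExceededException extends Exception {
        final long retryAfterMillis;

        RateLimitExceededException(long retryAfterMillis) {
            super("rate limit exceeded");
            this.retryAfterMillis = retryAfterMillis;
        }
    }

    /** Calls the request, sleeping and retrying on rate-limit errors up to maxAttempts times. */
    static <T> T callWithRetry(Callable<T> request, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (RateLimitExceededException e) {
                if (attempt >= maxAttempts) {
                    throw e; // Give up and surface the error to the caller.
                }
                Thread.sleep(e.retryAfterMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: fails twice with a rate-limit error, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RateLimitExceededException(50);
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```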

          Jesse Glick added a comment -

          FWIW, in https://github.com/jenkinsci/github-branch-source-plugin/pull/313#discussion_r456647733 bitwiseman says:

          There is literally never a reason to not check rate limits when interacting with github.com.

            Assignee: Unassigned
            Reporter: Glenn Duffy (gduffy)
            Votes: 2
            Watchers: 7