Jenkins / JENKINS-65354

One mis-configured job can prevent all webhook-triggered jobs from triggering

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

      We're seeing huge delays in processing events from GitHub (at least 2 hours from PR open to build scheduled).

      PushGHEventSubscriber is where this begins.

      SCMEvent defines a thread pool with a max size of 10.

      I've attached a set of thread dumps, but you can see in this excerpt that all 10 threads are sleeping, waiting for rate limiting.
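
      To make the failure mode concrete, here is a minimal, self-contained Java sketch of what the thread dump suggests. It is an illustration only, not the plugin's actual code: a fixed pool of 10 dispatcher threads (standing in for the SCMEvent executor) where each handler blocks in a long rate-limit sleep, so every later webhook event just queues behind them. The class name and sleep duration are assumptions for the example.

      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.TimeUnit;

      // Illustrative only: mimics a dispatcher pool capped at 10 threads, like SCMEvent's.
      // Once every worker is sleeping for a rate limit, later events sit in the queue.
      public class EventPoolStarvationDemo {
          public static void main(String[] args) throws Exception {
              ExecutorService scmEventPool = Executors.newFixedThreadPool(10);

              // 10 events from the rate-limited source occupy every worker thread.
              for (int i = 0; i < 10; i++) {
                  final int id = i;
                  scmEventPool.submit(() -> {
                      System.out.println("event " + id + " sleeping, waiting for rate limit");
                      try {
                          TimeUnit.MINUTES.sleep(13); // a multi-minute rate-limit sleep
                      } catch (InterruptedException e) {
                          Thread.currentThread().interrupt();
                      }
                  });
              }

              // A webhook event for a perfectly healthy job is now stuck behind the sleepers.
              scmEventPool.submit(() -> System.out.println("healthy job event finally processed"));

              scmEventPool.shutdown();
          }
      }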

      This does not seem correct, as we are running with GitHub Apps and approximately 15,000 API calls are allowed.

      I've checked a few of our GitHub apps and all of them have over 12,000 calls remaining.

      cc bitwiseman carroll dnusbaum

      I believe this behaviour only started after we upgraded to https://github.com/jenkinsci/github-branch-source-plugin/releases/tag/github-branch-source-2.10.1.

      I'm going to try to either downgrade or revert the specific changes tomorrow to get us past this.

       class org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:50:33 UTC 2021 / SCMEvent [#7]sleeping , holding [ 0x00000004b8a01310 ]
       class org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:51:10 UTC 2021 / SCMEvent [#6]sleeping , holding [ 0x00000004aa801628 ]
       class org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:51:33 UTC 2021 / SCMEvent [#1]sleeping , holding [ 0x0000000489521128 ]
       class org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:51:48 UTC 2021 / SCMEvent [#9]sleeping , holding [ 0x00000004aa2001d0 ]
       class org.jenkinsci.plugins.github_branch_source.PullRequestGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:52:38 UTC 2021 / SCMEvent [#5]sleeping , holding [ 0x00000004a79cbb88 ]
       class org.jenkinsci.plugins.github_branch_source.PushGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:50:49 UTC 2021 / SCMEvent [#3]sleeping , holding [ 0x0000000488800178 ]
       class org.jenkinsci.plugins.github_branch_source.PushGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:51:21 UTC 2021 / SCMEvent [#2]sleeping , holding [ 0x0000000494c00220 ]
       class org.jenkinsci.plugins.github_branch_source.PushGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:51:32 UTC 2021 / SCMEvent [#8]sleeping , holding [ 0x00000004b8a02290 ]
       class org.jenkinsci.plugins.github_branch_source.PushGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:52:15 UTC 2021 / SCMEvent [#10]sleeping , holding [ 0x0000000488a02170 ]
       class org.jenkinsci.plugins.github_branch_source.PushGHEventSubscriber$SCMHeadEventImpl Thu Apr 08 12:52:37 UTC 2021 / SCMEvent [#4]sleeping , holding [ 0x00000004a7961600 ]
      at java.lang.Thread.sleep(java.base@11.0.10/Native Method)
      at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker$LocalChecker.waitUntilRateLimit(ApiRateLimitChecker.java:323)
      at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker$LocalChecker.checkRateLimit(ApiRateLimitChecker.java:259)
      at org.jenkinsci.plugins.github_branch_source.ApiRateLimitChecker$RateLimitCheckerAdapter.checkRateLimit(ApiRateLimitChecker.java:240)
      at org.kohsuke.github.GitHubRateLimitChecker.checkRateLimit(GitHubRateLimitChecker.java:126)
      at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:392)
      at org.kohsuke.github.GitHubClient.sendRequest(GitHubClient.java:358)
      at org.kohsuke.github.Requester.fetch(Requester.java:76)
      at org.kohsuke.github.GHRepository.read(GHRepository.java:119)
      at org.kohsuke.github.GHPerson.getRepository(GHPerson.java:156)
      at org.jenkinsci.plugins.github_branch_source.GitHubSCMNavigator.visitSource(GitHubSCMNavigator.java:1313)
      at org.jenkinsci.plugins.github_branch_source.GitHubSCMNavigator.visitSources(GitHubSCMNavigator.java:915)
      at jenkins.scm.api.SCMNavigator.visitSources(SCMNavigator.java:221)
      at jenkins.branch.OrganizationFolder$SCMEventListenerImpl.onSCMHeadEvent(OrganizationFolder.java:1165)
      at jenkins.scm.api.SCMHeadEvent$DispatcherImpl.fire(SCMHeadEvent.java:246)
      at jenkins.scm.api.SCMHeadEvent$DispatcherImpl.fire(SCMHeadEvent.java:229)
      at jenkins.scm.api.SCMEvent$Dispatcher.run(SCMEvent.java:505)
      at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
      at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.10/Executors.java:515)
      at java.lang.Thread.run(java.base@11.0.10/Thread.java:834)
      

          [JENKINS-65354] One mis-configured job can prevent all webhook-triggered jobs from triggering

          Tim Jacomb added a comment -

          Hmm

          So I think we found it by grepping the file system

          root@jenkins-0:/var/jenkins_home# grep -r 'Jenkins-Imposed API Limiter' *
          
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4PSO97WekQVZuYnDz1BFrZWMWwsfPVlx/uO423t6Y19ZAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3e4T/gCYTyJPjgAAAA==Jenkins-Imposed API Limiter: Current quota for Github API usage has 28 remaining (7 over budget). Next quota of 60 in 31 min. Sleeping for 13 min.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4Ai10OW0TR4LObcx40IUV9qfb1GF0lukpIHvmoUl8kRKAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3b7ubABkdb6ujgAAAA==Jenkins-Imposed API Limiter: Still sleeping, now only 10 min remaining.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4BNYgc96YNWhnLy8zY2epxlWS6/5lf2oN3TpCMcdvpEcAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3UG/XgBPPvcOjgAAAA==Jenkins-Imposed API Limiter: Still sleeping, now only 7 min 29 sec remaining.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4P8zs5pUiqWOfRekOK4hdEFSWtC87Byh3d5P67wTqlXCAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3RHBXQAwGKKNjgAAAA==Jenkins-Imposed API Limiter: Still sleeping, now only 4 min 28 sec remaining.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4II5xGsdAyaFSsxti3YaEEJegoqLSSZFnFiVh0Q5NyK0AAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3bGrAgDilLMXjgAAAA==Jenkins-Imposed API Limiter: Still sleeping, now only 1 min 26 sec remaining.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4JhKay05MBQHlida1ObRtwyJRirr7FSv1AIwDVzuvpTIAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3QmdhwCWIekSjgAAAA==Jenkins-Imposed API Limiter: Current quota for Github API usage has 24 remaining (2 over budget). Next quota of 60 in 17 min. Sleeping for 6 min 6 sec.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4MsnpQAVhT/NGYTP9n2woZ/baKZ6AWNR29tpQo2W8HZcAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3amPxQDCvPo7jgAAAA==Jenkins-Imposed API Limiter: Still sleeping, now only 3 min 3 sec remaining.
          jobs/HMCTS_Nightly_ECM/jobs/ethos-repl-docmosis-service/indexing/events.log:ha:////4O5RGkWAOb2QkwbhJCiEyxgst1ZJsuLFNRuIoHZX9g0nAAAAhB+LCAAAAAAAAP9b85aBtbiIwSa/KF0vKzUvOzOvODlTryCnNB3I0kvPLMkoTYpPKkrMS86IL84vLUpO1XPPLPEoTXLOzyvOz0n1yy9JZYAARiYGRi8GzpLM3NTiksTcgooiBqmM0pTi/Dy9ZIhiPayaGCoKgHRF3dltRQATQ4r2jgAAAA==Jenkins-Imposed API Limiter: Still sleeping, now only 2.3 sec remaining.
          

          One folder had broken configuration.

          Would be nice for this to not break everything; any thoughts?
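
          For context on why a 60-call quota turns into multi-minute sleeps, below is a simplified "spread the quota evenly over the window" throttle of the kind these log lines describe. It is an assumption-laden sketch, not the plugin's actual ApiRateLimitChecker: with only 60 calls per hour and a consumer already ahead of the even pace, the checker has little choice but to sleep until the pace catches up or the quota resets.

          import java.time.Duration;
          import java.time.Instant;

          // Simplified illustration, NOT the plugin's actual ApiRateLimitChecker logic:
          // spread `limit` calls evenly across a one-hour window and sleep when ahead of pace.
          public class NormalizingThrottleSketch {

              /** Rough delay before the next call is back "on budget". */
              static Duration delayBeforeNextCall(int limit, int remaining, Instant reset, Instant now) {
                  Duration untilReset = Duration.between(now, reset);
                  if (remaining <= 0) {
                      return untilReset; // nothing left at all: wait for the quota to reset
                  }
                  // Ideal number of calls still in hand if usage were spread evenly over the hour.
                  double windowFraction = untilReset.toMillis() / (double) Duration.ofHours(1).toMillis();
                  double idealRemaining = limit * windowFraction;
                  if (remaining >= idealRemaining) {
                      return Duration.ZERO; // under budget, no need to sleep
                  }
                  // Over budget: sleep a slice of the time until reset, proportional to the overshoot.
                  double overBudget = idealRemaining - remaining;
                  return Duration.ofMillis((long) (untilReset.toMillis() * overBudget / idealRemaining));
              }

              public static void main(String[] args) {
                  Instant now = Instant.now();
                  // Roughly the situation in the events.log excerpt: quota 60, 28 left, reset in 31 min.
                  Duration wait = delayBeforeNextCall(60, 28, now.plus(Duration.ofMinutes(31)), now);
                  System.out.println("would sleep for about " + wait.toMinutes() + " min");
              }
          }

          The exact numbers the plugin computes differ (the log above sleeps for 13 min), but the shape of the problem is the same: a tiny quota plus an even-pace budget means long forced sleeps on the shared event threads.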


          Tim Jacomb added a comment - - edited

          I've renamed the issue to reflect the cause and downgraded the priority.

          Not sure if there's a fix here, but this sort of thing should be more obvious somehow, and ideally not break as badly as it did.


          Brett added a comment - - edited

          timja Your issue here helped with mine, although it's not quite identical to the issue you report. Writing below in case what we did helps, and also following this issue in case a future plugin release has a fix for this.

          Our symptoms

          • Last week we upgraded the github-branch-source plugin from 2.9.0 to 2.9.7, along with Jenkins core from 2.235.5 to 2.263.4. Many other plugins were upgraded at the same time.
          • Ever since, all of our multi-branch pipeline scans have stopped working properly. Not a single one will complete a scan anymore. Some simply get stuck on branch indexing and spin against api.github.com. Others slowly make progress, but at a rate of a single branch every 2 hours, which is unusable at that point.
          • All other Jenkins calls to api.github.com / github.com seem to be working; it's just the multi-branch pipeline scans that all stall suddenly.
          • Like your grep, I did see some references that our GitHub API usage quota got near the max; however, ours settled down and the issue continued with no further references to getting close to the limit, so I ruled that out as a cause for our issue.
          • The lack of a clear error in the logs made this hard to troubleshoot.
          • Reboot of the Jenkins service and host VM had no effect.
          • I'm also not sure what your comment about broken configuration refers to. In our case, no changes to job/folder configuration occurred, and the fix outlined below worked without making other changes to our jobs/instance.

           

          Our fix

          • A few days ago we rolled back github-branch-source to 2.9.0 (keeping Jenkins at 2.263.4) and it's working as before. No other plugins were rolled back, just this single one. Scanning multi-branch pipelines takes seconds instead of hours and no longer stalls.

           

          Possible Cause?

          • So what I found as a possible cause is this code addition for using a RateLimitChecker in github-branch-source, which I believe you have pointed out here too: https://github.com/jenkinsci/github-branch-source-plugin/pull/384
          • Sounds like many users worried about or found excessive calls to GitHub. Not sure which version(s) of the plugin that code change is in, but it was a driving factor in rolling back that single plugin for us. And in our case, it worked as of 2.9.0.


          Tim Jacomb added a comment -

          It would be worth taking a thread dump to try to see what is going on; there are some unreleased fixes on master which might help.
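
          One low-friction way to capture that: the JVM's standard ThreadMXBean can dump every thread's state and stack. A minimal sketch is below; the same two calls can be run from the Jenkins script console, and `jstack <pid>` or the controller's /threadDump page (if available to you) are alternatives.

          import java.lang.management.ManagementFactory;
          import java.lang.management.ThreadInfo;
          import java.lang.management.ThreadMXBean;

          // Prints the state, held locks and (truncated) stack of every live thread in this JVM.
          public class ThreadDumpSketch {
              public static void main(String[] args) {
                  ThreadMXBean threads = ManagementFactory.getThreadMXBean();
                  for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                      System.out.print(info); // ThreadInfo.toString() includes the stack frames
                  }
              }
          }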


          Allan BURDAJEWICZ added a comment -

          If the index log shows Next quota of 60 in 31 min, doesn't it suggest that this is an unauthenticated request? A quota of 60 is the default quota for unauthenticated users according to https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting. That is odd.
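
          One way to test that theory is to ask the API which quota a given credential actually sees, for example with the org.kohsuke.github client that the plugin itself uses. The sketch below is illustrative; the token is a placeholder for whatever credential the affected folder is configured with. If the "authenticated" call still reports a limit of 60, the credential is effectively not being applied.

          import org.kohsuke.github.GHRateLimit;
          import org.kohsuke.github.GitHub;
          import org.kohsuke.github.GitHubBuilder;

          // Compares the rate limit seen anonymously vs. with a token.
          public class RateLimitCheck {
              public static void main(String[] args) throws Exception {
                  GHRateLimit anon = GitHub.connectAnonymously().getRateLimit();
                  System.out.println("anonymous:     " + anon.getRemaining() + " / " + anon.getLimit());

                  // Placeholder: substitute the PAT or GitHub App installation token the job uses.
                  GitHub authed = new GitHubBuilder().withOAuthToken("<token>").build();
                  GHRateLimit limit = authed.getRateLimit();
                  System.out.println("authenticated: " + limit.getRemaining() + " / " + limit.getLimit());
              }
          }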


            Assignee: Unassigned
            Reporter: Tim Jacomb (timja)
            Votes: 0
            Watchers: 5
