[JENKINS-64016] Add repo filtering by GraphQL query using GitHub search

Type: New Feature
Resolution: Unresolved
Priority: Minor
Component/s: github-branch-source-plugin
Labels: None

Add support for providing a search query through the GraphQL search endpoint.

Example query:

query findByGitHubSearch($queryString: String!) {
  search(query: $queryString, type: REPOSITORY, first: 100) {
    repositoryCount
    edges {
      node {
        ... on Repository {
          name
        }
      }
    }
  }
}

where the query variables are:

{"queryString": "user:my-user keyword" }

The plugin should add the user: part of the query automatically.
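A minimal sketch of how the variables for the query above could be assembled so that the user: qualifier is prepended automatically. The function name build_search_variables and its parameters are hypothetical, chosen only for illustration; nothing here is the plugin's actual API.

```python
import json


def build_search_variables(owner: str, keywords: str) -> dict:
    """Build the GraphQL variables, prepending the user: qualifier.

    `owner` and `keywords` are hypothetical parameter names; the plugin
    would take the owner from its own configuration.
    """
    parts = [f"user:{owner}"]
    extra = keywords.strip()
    if extra:
        # Only append the free-form part when the caller supplied one.
        parts.append(extra)
    return {"queryString": " ".join(parts)}


if __name__ == "__main__":
    # Prints the same variables shown in the issue description.
    print(json.dumps(build_search_variables("my-user", "keyword")))
```

Keeping the qualifier in one helper means the person configuring the job types only the free-form part of the search.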

This PR is related: https://github.com/jenkinsci/github-branch-source-plugin/pull/344

However, it would be good to have general-purpose search as well, not just topics.

links to

github-api #521

Sam Gleske added a comment - 2023-06-30 02:49 - edited 2023-06-30 03:14

Typical environment

A typical environment for multibranch pipelines scans pull requests, branches, and tags. Some plugins attempt to provide optimizations to prevent a build storm, but this plugin in particular suffers from using GitHub API v3 REST.

I have hundreds of repositories with thousands of refs (open PRs, branches, and tags) each. We used to hit API limits pretty regularly and had to switch to GitHub App auth (raising the limit from 5k req/hr to 15k req/hr). Even now, it is typical for us to be well above 10k daily while developers contribute to projects.

Consider increasing priority?

If I add jenkinsci/jenkins as a multibranch pipeline, its scan eats up over 4000 API requests using GitHub API v3 REST.

I can get the same metadata using GitHub API v4 GraphQL in 17 API requests.
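As a rough worked comparison, the figures quoted in this comment can be put side by side. This is only a sketch: it treats "over 4000" as exactly 4000 (a lower bound) and counts plain requests, whereas GitHub accounts for GraphQL usage differently from REST, so the numbers are indicative rather than exact. All names below are illustrative.

```python
# Figures taken from the comment; 4000 is a lower bound ("over 4000").
REST_REQUESTS_PER_SCAN = 4000
GRAPHQL_REQUESTS_PER_SCAN = 17


def scans_per_hour(hourly_limit: int, requests_per_scan: int) -> int:
    """Whole scans that fit inside an hourly request limit."""
    return hourly_limit // requests_per_scan


if __name__ == "__main__":
    for limit in (5000, 15000):  # the two limits quoted in the comment
        print(limit, scans_per_hour(limit, REST_REQUESTS_PER_SCAN))
    # Reduction in request count for one jenkinsci/jenkins scan.
    ratio = REST_REQUESTS_PER_SCAN / GRAPHQL_REQUESTS_PER_SCAN
    print(f"about {ratio:.0f}x fewer requests")
```

Under these assumptions, at most one full REST scan of this repository fits in a 5k req/hr limit and at most three fit in 15k req/hr.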

Timing

It takes hours for a jenkinsci/jenkins scan to complete with GitHub API v3 REST.

It takes less than 10 seconds to pull the same metadata over GitHub API v4 GraphQL.

GraphQL example

https://github.com/samrocketman/jervis/issues/133#issuecomment-1614036278

You can run this by cloning jervis and running "./gradlew console". You can use personal access tokens, but in the above example I'm using GitHub App auth.

Example console output from that script:

Discover PRs, Branches, and Tags on: 
jenkinsci/jenkins
Query count: 17
Total pull requests: 65
Total branches: 38
Total tags: 1673
First pull request: 
  name: 8210
  author: 
    login: mawinter69
  baseRef: 
    prefix: refs/heads/
    name: master
    target: 
      author: 
        date: '2023-06-29T20:39:53+02:00'
        email: mc.cache@web.de
        name: Alexander Brandes
        user: 
          login: NotMyFault
      committer: 
        date: '2023-06-29T20:39:53+02:00'
        email: noreply@github.com
        name: GitHub
        user: null
      sha: 9e65c05201c4bd6e6a203556564b4b606735d931
  headRef: 
    prefix: refs/heads/
    name: legacy-token-revoke-button
    target: 
      author: 
        date: '2023-06-29T21:08:44+02:00'
        email: m.winter@sap.com
        name: Markus Winter
        user: 
          login: mawinter69
      committer: 
        date: '2023-06-29T21:08:44+02:00'
        email: m.winter@sap.com
        name: Markus Winter
        user: 
          login: mawinter69
      sha: bb2e40e08ec8706199efffc4b5772a5510abbbc6
First branch: 
  prefix: refs/heads/
  name: JENKINS-69620
  target: 
    author: 
      date: '2023-04-17T21:55:24+02:00'
      email: 1831569+daniel-beck@users.noreply.github.com
      name: Daniel Beck
      user: 
        login: daniel-beck
    committer: 
      date: '2023-04-17T21:55:24+02:00'
      email: noreply@github.com
      name: GitHub
      user: null
    sha: d536a43ec12dd8cc2130f7a7c39202d5bfbae138
First tag: 
  prefix: refs/tags/
  name: '1.312'
  target: 
    author: 
      date: '2009-06-23T20:32:43Z'
      email: kohsuke@71c3de6d-444a-0410-be80-ed276b4c234a
      name: kohsuke
      user: null
    committer: 
      date: '2009-06-23T20:32:43Z'
      email: kohsuke@71c3de6d-444a-0410-be80-ed276b4c234a
      name: kohsuke
      user: null
    sha: b72322675eb0114363a9a86e9ad5a170d1d07ac0
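The query count in this output is consistent with one plausible reading: pull requests, branches, and tags are paged together in each request, 100 items at a time, so the largest collection (tags) decides how many requests are needed. The page size and the combined paging are assumptions, not something the output states; the sketch below only checks the arithmetic.

```python
import math

PAGE_SIZE = 100  # assumption: 100 items per page

# Totals copied from the console output above.
COUNTS = {"pull_requests": 65, "branches": 38, "tags": 1673}


def pages_needed(total: int, page_size: int = PAGE_SIZE) -> int:
    """Pages required to list `total` items; at least one request is made."""
    return max(1, math.ceil(total / page_size))


def queries_needed(counts: dict) -> int:
    """Requests needed if every collection advances one page per request."""
    return max(pages_needed(n) for n in counts.values())


if __name__ == "__main__":
    print(queries_needed(COUNTS))  # 17, matching "Query count: 17"
```

If each collection were paged in separate requests instead, the same totals would need 1 + 1 + 17 = 19 requests, which does not match the reported count.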

Allan BURDAJEWICZ added a comment - 2023-07-04 11:01

Right, using Graph QL would have great benefits for sure. Linking https://github.com/hub4j/github-api/issues/521 here.
