
Add repo filtering by graphql query using GitHub search

      Add support for filtering repositories with a search query sent to the GitHub GraphQL search endpoint.

       

      Example query:

      query findByGitHubSearch($queryString: String!) {
        search(query: $queryString, type: REPOSITORY, first: 100) {
          repositoryCount
          edges {
            node {
              ... on Repository {
                name
              }
            }
          }
        }
      }
      
      

      where the query variables are:

      {"queryString": "user:my-user keyword" } 

      The "user:" part of the query string should be added automatically by the plugin.
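A minimal sketch of how the request body might be assembled so that the owner qualifier is always prepended to the search terms a user types. The helper name `build_search_payload` and its parameters are hypothetical and are not part of the plugin; only the query text and the `user:` qualifier come from the description above.

```python
import json

# GraphQL document taken from the issue description.
SEARCH_QUERY = """\
query findByGitHubSearch($queryString: String!) {
  search(query: $queryString, type: REPOSITORY, first: 100) {
    repositoryCount
    edges {
      node {
        ... on Repository {
          name
        }
      }
    }
  }
}
"""


def build_search_payload(owner: str, search_terms: str) -> dict:
    """Return the JSON body for a GraphQL search scoped to one owner.

    Hypothetical helper: the owner qualifier is always prepended, so the
    user only supplies the keyword part of the search.
    """
    query_string = f"user:{owner} {search_terms.strip()}".strip()
    return {"query": SEARCH_QUERY, "variables": {"queryString": query_string}}


payload = build_search_payload("my-user", "keyword")
print(json.dumps(payload["variables"]))  # {"queryString": "user:my-user keyword"}
```

Keeping the qualifier out of the user-supplied field means a search cannot accidentally be widened beyond the configured owner just by editing the keyword text.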

       

      This PR is related: https://github.com/jenkinsci/github-branch-source-plugin/pull/344

      However, it would be good to have general-purpose search as well, not just filtering by topics.

          [JENKINS-64016] Add repo filtering by graphql query using GitHub search

          Sam Gleske added a comment (edited)

          Typical environment

          A typical environment for multibranch pipelines scans pull requests, branches, and tags. Some plugins attempt to provide optimizations to prevent a build storm, but this plugin in particular suffers from using the GitHub API v3 (REST).

          I have hundreds of repositories, each with thousands of refs (open PRs, branches, and tags). We used to hit API limits regularly and had to switch to GitHub app auth, which raised the limit from 5k req/hr to 15k req/hr. Even now it is typical for us to be well above 10k requests daily while developers contribute to projects.

          Consider increasing priority?

          If I add jenkinsci/jenkins as a multibranch pipeline, a single scan consumes over 4000 API requests using GitHub API v3 REST.

          I can get the same metadata using GitHub API v4 GraphQL in 17 API requests.
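As a back-of-envelope illustration of what those two figures mean against the hourly limits mentioned earlier, the sketch below counts how many full scans fit in one hour. It assumes every request costs one unit of the limit, which is a simplification: GitHub scores GraphQL usage in points rather than raw request counts, so the GraphQL numbers are indicative only. The function name is hypothetical.

```python
def scans_per_hour(hourly_limit: int, requests_per_scan: int) -> int:
    """Whole number of repository scans that fit in one hour's request budget."""
    return hourly_limit // requests_per_scan


REST_REQUESTS_PER_SCAN = 4000    # "over 4000" in the comment, so treated as a lower bound
GRAPHQL_REQUESTS_PER_SCAN = 17   # figure reported in the comment

for limit in (5_000, 15_000):
    rest = scans_per_hour(limit, REST_REQUESTS_PER_SCAN)
    graphql = scans_per_hour(limit, GRAPHQL_REQUESTS_PER_SCAN)
    print(f"limit {limit}/hr: at most {rest} REST scans vs {graphql} GraphQL scans")
```

Under these assumptions a 5k limit covers at most one REST scan of a repository this size, while the same budget would cover a few hundred scans at 17 requests each.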

          Timing

          It takes hours for a jenkinsci/jenkins scan to complete with GitHub API v3 REST.

          It takes less than 10 seconds to pull the same metadata over GitHub API v4 GraphQL.

          GraphQL example

          https://github.com/samrocketman/jervis/issues/133#issuecomment-1614036278

          You can run this by cloning jervis and running "./gradlew console". You can use a personal access token, but in the linked example I'm using GitHub app auth.

          Example console output from that script:

          Discover PRs, Branches, and Tags on: 
          jenkinsci/jenkins
          Query count: 17
          Total pull requests: 65
          Total branches: 38
          Total tags: 1673
          First pull request: 
            name: 8210
            author: 
              login: mawinter69
            baseRef: 
              prefix: refs/heads/
              name: master
              target: 
                author: 
                  date: '2023-06-29T20:39:53+02:00'
                  email: mc.cache@web.de
                  name: Alexander Brandes
                  user: 
                    login: NotMyFault
                committer: 
                  date: '2023-06-29T20:39:53+02:00'
                  email: noreply@github.com
                  name: GitHub
                  user: null
                sha: 9e65c05201c4bd6e6a203556564b4b606735d931
            headRef: 
              prefix: refs/heads/
              name: legacy-token-revoke-button
              target: 
                author: 
                  date: '2023-06-29T21:08:44+02:00'
                  email: m.winter@sap.com
                  name: Markus Winter
                  user: 
                    login: mawinter69
                committer: 
                  date: '2023-06-29T21:08:44+02:00'
                  email: m.winter@sap.com
                  name: Markus Winter
                  user: 
                    login: mawinter69
                sha: bb2e40e08ec8706199efffc4b5772a5510abbbc6
          First branch: 
            prefix: refs/heads/
            name: JENKINS-69620
            target: 
              author: 
                date: '2023-04-17T21:55:24+02:00'
                email: 1831569+daniel-beck@users.noreply.github.com
                name: Daniel Beck
                user: 
                  login: daniel-beck
              committer: 
                date: '2023-04-17T21:55:24+02:00'
                email: noreply@github.com
                name: GitHub
                user: null
              sha: d536a43ec12dd8cc2130f7a7c39202d5bfbae138
          First tag: 
            prefix: refs/tags/
            name: '1.312'
            target: 
              author: 
                date: '2009-06-23T20:32:43Z'
                email: kohsuke@71c3de6d-444a-0410-be80-ed276b4c234a
                name: kohsuke
                user: null
              committer: 
                date: '2009-06-23T20:32:43Z'
                email: kohsuke@71c3de6d-444a-0410-be80-ed276b4c234a
                name: kohsuke
                user: null
              sha: b72322675eb0114363a9a86e9ad5a170d1d07ac0
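The query count of 17 in the output above would be consistent with cursor pagination at 100 items per page: 1673 tags need 17 pages, while the 65 pull requests and 38 branches each fit in a single page. This is an inference rather than something the comment states. The sketch below simulates such a pagination loop against an in-memory list; `fetch_page` is a hypothetical stand-in for one API call, and the page size of 100 is an assumption.

```python
PAGE_SIZE = 100  # assumed page size; not stated in the comment


def fetch_page(items, cursor, page_size):
    """Stand-in for one paginated API call; returns (page, next_cursor)."""
    start = cursor or 0
    page = items[start:start + page_size]
    end = start + len(page)
    # A next_cursor of None signals that there are no more pages.
    next_cursor = end if end < len(items) else None
    return page, next_cursor


def fetch_all(items, page_size=PAGE_SIZE):
    """Follow cursors until exhausted; returns (all_items, number_of_queries)."""
    collected, cursor, queries = [], None, 0
    while True:
        page, cursor = fetch_page(items, cursor, page_size)
        queries += 1
        collected.extend(page)
        if cursor is None:
            break
    return collected, queries


tags = [f"tag-{i}" for i in range(1673)]  # same count as "Total tags" above
fetched, query_count = fetch_all(tags)
print(len(fetched), query_count)  # 1673 17
```

If pull requests, branches, and tags are requested together in each query, the largest collection alone would determine the number of round trips.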
          


          Allan BURDAJEWICZ added a comment

          Right, using GraphQL would have great benefits for sure. Linking https://github.com/hub4j/github-api/issues/521 here.

            Assignee: Unassigned
            Reporter: Tim Jacomb (timja)