
Better handling of GitHub Organization folder scan to avoid API quota

      In a large GitHub organization with hundreds of repos, each with hundreds of branches and tags, refreshing the whole organization is not possible (or takes ages) because the GitHub API quota is hit.

      This is particularly bad when adding a new repo: it can take days, which is completely impractical.

      There are several solutions to this issue that I can think of:

      • Use the GitHub GraphQL API to query the whole thing in one (or very few) request(s)
      • Make a "shallow scan" that only discovers repos. Each repo can then be refreshed separately, which (1) enables quick addition of new repos and (2) spreads the refresh API bursts over time, making it less likely to hit the API quota
      • Add a separate function to only discover one repo specified by the user
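
The "shallow scan" option can be sketched roughly as follows. This is only an illustration of the scheduling idea, not how the GitHub Branch Source plugin works: `shallow_scan`, `stagger_refreshes`, `plan_scan`, and the injected `list_repo_names` callable are all hypothetical names, and no real GitHub or Jenkins API is called.

```python
from datetime import datetime, timedelta


def shallow_scan(list_repo_names):
    """Discover repository names only; no branch, tag, or PR requests.

    `list_repo_names` is a hypothetical callable that returns repo names.
    """
    return sorted(set(list_repo_names()))


def stagger_refreshes(repo_names, start, window):
    """Assign each repo a refresh time, spread evenly across `window`."""
    if not repo_names:
        return {}
    step = window / len(repo_names)
    return {name: start + i * step for i, name in enumerate(repo_names)}


def plan_scan(list_repo_names, known, start, window):
    """Refresh newly discovered repos right away; spread the rest over time."""
    discovered = shallow_scan(list_repo_names)
    new = [r for r in discovered if r not in known]
    old = [r for r in discovered if r in known]
    plan = {name: start for name in new}
    plan.update(stagger_refreshes(old, start, window))
    return plan
```

With this shape, a newly added repo is scheduled at `start` regardless of how many repos already exist, while the per-repo refreshes of known repos are distributed across the window instead of arriving as one burst.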

          [JENKINS-54495] Better handling of GitHub Organization folder scan to avoid API quota

          Liam Newman added a comment -

          All of these seem like interesting options.

          Targeted scan - This would involve some work with Jelly and the Jenkins UI, but it might be easier to implement due to the narrow target.
          Shallow scan - At the very least, do a breadth-first scan that then requests scans of the child repos over time.
          GraphQL - This would be a massive undertaking, but it might be feasible if you switched only the top-level repo scan, or some other targeted scenario, to GraphQL.

          They're all viable in different ways. Perhaps you could file a separate issue for each one and then work on them separately?


          Eugene G added a comment -

          This is very important. If an organization has tons of old repos and more than 50 repos with Jenkinsfiles, it becomes crazy to maintain the list of repos in this small filter field.


          Sam Gleske added a comment (edited) -

          I think the GitHub API v4 (GraphQL) would be a better solution. For example, if you add jenkinsci/jenkins today, scanning for branches, tags, and PRs takes over 4000 requests.

          With GraphQL that number is reduced to 17.

          I posted details in JENKINS-64016.

          It would also be trivial to build a super template which compounds the request for every repository all at once.

          I'm able to query an org with over 2000 repositories within around 100 GraphQL requests. If you need examples I can give them, or I may just roll up my sleeves and add a new GraphQL-based GitHub branch source using a client other than the kohsuke GH client.

          I'm tempted to write a GraphQL query for the jenkinsci org which has 2.6k repositories.  That might help push this in that direction.
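
As a rough illustration of the paginated GraphQL approach described in this comment, the sketch below lists an organization's repositories and branch names 100 repositories at a time. It is not the query from JENKINS-64016. The field names follow GitHub's public GraphQL schema as I understand it, while `iter_org_repos`, `min_requests`, and the injected `post` callable are hypothetical helpers; `post` stands in for whatever authenticated HTTP client sends the payload to the GraphQL endpoint and returns the decoded JSON.

```python
import math

# Assumed query shape: one page of repositories plus up to 100 branch names each.
ORG_REPOS_QUERY = """
query($login: String!, $cursor: String) {
  organization(login: $login) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name
        refs(refPrefix: "refs/heads/", first: 100) { nodes { name } }
      }
    }
  }
}
"""


def iter_org_repos(post, login):
    """Yield repository nodes, following the cursor until the last page.

    `post` is a hypothetical callable: it takes the request payload (a dict)
    and returns the decoded JSON response as a dict.
    """
    cursor = None
    while True:
        payload = {
            "query": ORG_REPOS_QUERY,
            "variables": {"login": login, "cursor": cursor},
        }
        page = post(payload)["data"]["organization"]["repositories"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            return
        cursor = page["pageInfo"]["endCursor"]


def min_requests(repo_count, page_size=100):
    """Lower bound on requests needed just to list `repo_count` repositories."""
    return math.ceil(repo_count / page_size)
```

`min_requests` only counts the repository listing; repositories with more branches, tags, or PRs than fit in one nested page need follow-up requests, so real totals are higher than this bound.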


            Assignee: Unassigned
            Reporter: Leandro Lucarella (lucasocio)
            Votes: 4
            Watchers: 7
