Jenkins / JENKINS-71749

Scan Organization incorrectly removes folders due to unstable GitHub repository search results



      • GH organization with ~600 repositories
      • Org Folder job for this organization
      • 10+ minutes to fully scan the org



      We observed that some folders (repositories) were removed after an Org scan, even though they met the criteria. The missing repositories appeared to be random: in the next scan, different repositories were missing.


      Root cause:

      We investigated the problem by adding logging to the plugin and deploying the patched plugin to our Jenkins. We found that the GitHub repository search returns an incorrect list of repositories in our org: some were missing (and were therefore removed from the Org folder) and some were duplicated. The search returns a lazy iterator that is evaluated page by page as the loop walks through all repositories found, so each page is fetched by a separate request.
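To illustrate the lazy, page-by-page evaluation, here is a minimal Java sketch. LazyPagedIterator, LazyPagingDemo and the in-memory fetchPage "server" are hypothetical stand-ins, not classes from the plugin or the GitHub API library, and the sketch assumes an empty page marks the end of the results. It only shows that a page is requested when the loop reaches it, so the pages of one scan are fetched at different moments.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical lazy iterator: a page is requested only when the consumer
// has used up the previous one (an empty page is assumed to mark the end).
class LazyPagedIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> fetchPage; // page index -> items on that page
    private int nextPage = 0;
    private Iterator<T> current = null;
    private boolean exhausted = false;
    int requestsMade = 0; // exposed only for the illustration

    LazyPagedIterator(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one is used up.
        while (!exhausted && (current == null || !current.hasNext())) {
            List<T> page = fetchPage.apply(nextPage++);
            requestsMade++;
            if (page.isEmpty()) {
                exhausted = true;
            } else {
                current = page.iterator();
            }
        }
        return !exhausted;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}

class LazyPagingDemo {
    static final int TOTAL = 250;     // hypothetical number of results
    static final int PAGE_SIZE = 100; // same maximum page size as in the report

    // Hypothetical in-memory "server" returning one page of result indices.
    static List<Integer> fetchPage(int page) {
        List<Integer> items = new ArrayList<>();
        for (int i = page * PAGE_SIZE; i < Math.min(TOTAL, (page + 1) * PAGE_SIZE); i++) {
            items.add(i);
        }
        return items;
    }

    // Consumes up to `count` results and reports how many page requests were made.
    static int requestsAfterConsuming(int count) {
        LazyPagedIterator<Integer> it = new LazyPagedIterator<>(LazyPagingDemo::fetchPage);
        for (int i = 0; i < count && it.hasNext(); i++) {
            it.next();
        }
        return it.requestsMade;
    }

    public static void main(String[] args) {
        System.out.println(requestsAfterConsuming(150));  // 2: pages 0 and 1
        System.out.println(requestsAfterConsuming(1000)); // 4: pages 0-2 plus the empty end page
    }
}
```

Consuming only the first 150 results triggers just two requests; the remaining pages are fetched later, if at all, which leaves room for the server-side ordering to change between them.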

      And here is the real root cause: GitHub search results are not stable across pages. Paging applies in our case: the maximum page size is 100 and we have ~600 repositories, so 6 pages must be fetched. The plugin applies no explicit sorting to the repository search, so the default "best-match" sorting is used (according to the GitHub documentation), and that ordering appears to be unstable between requests.
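The following toy model sketches why an unstable order breaks paging. It assumes, as a worst case, that every request ranks all 600 repositories from scratch and breaks ties in a new arbitrary order (simulated with a seeded shuffle); this is an assumption for illustration, not GitHub's documented ranking, and PagingStabilityDemo with its helpers is hypothetical. Because each page is sliced from a different ordering, some repositories land on two pages and others on none. Sorting on a unique key before slicing makes the six pages partition the result set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model (an assumption, not GitHub's actual ranking): every page request
// re-ranks the whole result set before slicing out the requested page.
class PagingStabilityDemo {
    static final int PAGE_SIZE = 100;

    static List<String> repos(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(String.format("repo-%03d", i));
        }
        return names;
    }

    // Simulates one search request returning the given page.
    static List<String> fetchPage(List<String> all, int page, boolean stableSort, Random rng) {
        List<String> ranked = new ArrayList<>(all);
        if (stableSort) {
            Collections.sort(ranked); // deterministic total order on a unique key
        } else {
            Collections.shuffle(ranked, rng); // arbitrary tie-breaking, different per request
        }
        int from = Math.min(page * PAGE_SIZE, ranked.size());
        int to = Math.min(from + PAGE_SIZE, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }

    // Walks all pages; returns {itemsFetched, distinctItems}.
    static int[] scan(int repoCount, boolean stableSort, long seed) {
        List<String> all = repos(repoCount);
        Random rng = new Random(seed);
        List<String> fetched = new ArrayList<>();
        int pages = (repoCount + PAGE_SIZE - 1) / PAGE_SIZE; // 600 repos -> 6 pages
        for (int page = 0; page < pages; page++) {
            fetched.addAll(fetchPage(all, page, stableSort, rng));
        }
        Set<String> distinct = new HashSet<>(fetched);
        return new int[] {fetched.size(), distinct.size()};
    }

    public static void main(String[] args) {
        for (boolean stable : new boolean[] {false, true}) {
            int[] r = scan(600, stable, 42L);
            System.out.printf("stableSort=%b fetched=%d distinct=%d missing=%d%n",
                    stable, r[0], r[1], 600 - r[1]);
        }
    }
}
```

In this model both modes fetch 600 entries in total, but only the stable-sort mode returns 600 distinct repositories; in the unstable mode every duplicate stands in for a missing repository.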



      We applied a quick fix in the method at https://github.com/jenkinsci/github-branch-source-plugin/blob/master/src/main/java/org/jenkinsci/plugins/github_branch_source/GitHubSCMNavigator.java#L1212 by adding explicit, stable sorting when searching for repositories:


      We also think that evaluating the search result eagerly should improve the stability of the results. This can be done by returning an already evaluated List instead of a lazy Iterator:

      return ghRepositorySearchBuilder.list().withPageSize(100).asList(); 
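A simplified model of the eager variant follows. It assumes the scan removes every existing folder whose repository is absent from the search result, which is a simplification of the plugin's behavior and not its actual code; EagerScanSketch, fetchAll and orphanedFolders are hypothetical helpers. Note that draining the iterator still issues one request per page: it shortens the interval between page requests but does not by itself guarantee a consistent snapshot, so the explicit sorting is what addresses the ordering itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model (not the plugin's real code) of an org scan that evaluates
// the search result eagerly and then decides which existing folders are orphaned.
class EagerScanSketch {
    // Drains the lazy iterator: every page is fetched before any folder is touched.
    static <T> List<T> fetchAll(Iterator<T> lazyResults) {
        List<T> all = new ArrayList<>();
        lazyResults.forEachRemaining(all::add);
        return all;
    }

    // Folders whose repository did not appear in the search result would be removed.
    static Set<String> orphanedFolders(Set<String> existingFolders, List<String> searchResult) {
        Set<String> orphans = new LinkedHashSet<>(existingFolders);
        orphans.removeAll(searchResult);
        return orphans;
    }
}
```

For example, if the search returns repo-a, repo-b, repo-b (a duplicate in place of repo-c), repo-c is reported as orphaned, which matches the symptom described above.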


            Assignee: Unassigned
            Reporter: Krzysztof Wolny (vanta)