  1. Jenkins
  2. JENKINS-65794

Redesign discussion to increase plugin performance


      Hi everyone. After using the Jenkins Gitea plugin for years, I'd like to share my experiences in the hope of starting a discussion about the future design of the plugin.
      While reading this, please keep in mind that I do not want to blame anyone.


      Why have I opened this issue?

      In my environment there are some larger repositories (at least they are large in my opinion). They have 400+ branches and ~800 tags each. Only the branches/tags that match a configured naming pattern are handled by the plugin. The rest are never built, but they are still part of the data retrieval process, which brings me to the next question.


      What have I experienced so far?

      Due to an open issue regarding API pagination, not all desired branches are fetched (as I am writing this in June 2021). There is an open PR that will fix this, and I have been running its modifications in my environment for a while now. Unfortunately, looping over all those branches takes a while due to the API response time.

      When receiving a webhook event, the plugin looks up the changes and the current HEADs of the branches/PRs. In my particular environment there are multiple Gitea organization folders in Jenkins pointing to the same organization (and in part even the same repositories) in Gitea itself. One is for build-on-push of branches/PRs (let's call this A), another is for building tags on demand (B), and others are for deployments (C). Maybe this is not an ideal setup, but it exposed a pretty nasty behavior that I want to describe here:
      Since all three organization folders refer to the same entity in Gitea, they share the same webhook. On a push webhook event for A, all other organization folders matching the same repositories (B and C) are triggered as well to scan for changes and check whether they should trigger jobs. In small repositories this is hardly noticeable. For larger repositories it leads to a flood of API requests fetching all branches/tags/PRs of a repository, even if none of them trigger any jobs. In my environment, those API requests take ~2s per page of branches (that is 8 x 2s for 400 branches at 50 items per page) and ~30s for fetching all tags. This happens on every push that triggers the webhook.
      To be clear, this behavior occurs with the changes from the mentioned PR applied. Without them, only the first page of branches is used right now, so not all desired branches get built. So this is not the plugin's current behavior, but it will be once pagination is fixed and the plugin keeps using the current endpoints.
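
      As a rough back-of-the-envelope check of the figures above, the cost per push can be estimated as follows. This is only a sketch: it assumes that each of the three folders (A, B, C) fetches both all branch pages and all tags on every push, and it ignores PR requests because no timing for them is given.

```python
import math

def scan_seconds(branches, page_size, secs_per_branch_page, secs_for_tags):
    """Estimated API time for one folder to re-scan one repository."""
    branch_pages = math.ceil(branches / page_size)
    return branch_pages * secs_per_branch_page + secs_for_tags

# Figures quoted above: 400 branches, 50 per page, ~2 s per page, ~30 s for tags.
per_folder = scan_seconds(400, 50, 2.0, 30.0)

# Assumption: folders A, B and C each run a full scan on the shared webhook.
per_push = 3 * per_folder

print(f"per folder: ~{per_folder:.0f}s, per push: ~{per_push:.0f}s")
```

      Under these assumptions a single push costs well over two minutes of API time, most of it for refs that never trigger a job.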


      What is my proposed redesign?

      As far as my code investigation and analysis went, I think there are other API endpoints that return the same results with fewer API calls. I am not sure how long they have existed.

      The current use of /api/v1/repos/{org}/{repo}/branches and /api/v1/repos/{org}/{repo}/tags for loading existing branches and tags does not seem ideal, since most parts of the response are never used.
      Alternatively, with /repos/{owner}/{repo}/git/refs/{ref} (with {ref}=heads or {ref}=tags), all existing branches/tags could be loaded in one request. The response contains details of the HEAD commit of each ref, which can be used for further data loading (if necessary). In my environment this request takes ~100ms for >400 branches and ~120ms for >800 tags.
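
      To illustrate what the plugin would get back, here is a minimal sketch that builds the request URL and turns a refs response into a name-to-SHA map. The host name and the sample payload are made up, and the response shape (a JSON array of objects with "ref" and "object.sha") is my assumption of what this endpoint returns, so please check it against your Gitea version.

```python
import json

def refs_url(base_url, owner, repo, kind):
    """kind is "heads" or "tags", matching the {ref} placeholder above."""
    return f"{base_url}/repos/{owner}/{repo}/git/refs/{kind}"

def heads_by_name(payload, kind):
    """Map short ref name -> object SHA from a refs response body."""
    prefix = f"refs/{kind}/"
    return {
        item["ref"][len(prefix):]: item["object"]["sha"]
        for item in json.loads(payload)
        if item["ref"].startswith(prefix)
    }

# Made-up sample payload in the assumed response shape.
sample = json.dumps([
    {"ref": "refs/heads/main", "object": {"type": "commit", "sha": "a1"}},
    {"ref": "refs/heads/feature/login", "object": {"type": "commit", "sha": "b2"}},
])

# Hypothetical host; /api/v1 is the prefix used by the endpoints above.
print(refs_url("https://gitea.example.com/api/v1", "org", "repo", "heads"))
print(heads_by_name(sample, "heads"))
```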

      With this endpoint change, the plugin could first sort out all branches/tags that do not match the configured pattern (if any). For the remaining branches/tags, it would then load the missing details using the provided HEAD commit details. I think that last part already happens, but for far more items than the configuration requires.
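
      The filter-first idea could look roughly like this. It is a sketch, not the plugin's code: Python's fnmatch stands in for whatever wildcard matching the plugin really uses, and select_refs, load_details and fetch_commit are hypothetical names.

```python
from fnmatch import fnmatchcase

def select_refs(heads, include_patterns):
    """Keep only refs whose name matches at least one wildcard pattern."""
    return {
        name: sha
        for name, sha in heads.items()
        if any(fnmatchcase(name, pattern) for pattern in include_patterns)
    }

def load_details(selected, fetch_commit):
    """Call the (expensive) detail lookup only for the refs that survived."""
    return {name: fetch_commit(sha) for name, sha in selected.items()}

# Name -> HEAD SHA, as it could come from a single refs request.
heads = {"main": "a1", "feature/login": "b2", "old/experiment": "c3"}

calls = []
def fake_fetch_commit(sha):
    # Stand-in for a per-commit API request; records how often it is called.
    calls.append(sha)
    return {"sha": sha}

details = load_details(select_refs(heads, ["main", "feature/*"]), fake_fetch_commit)
print(sorted(details), len(calls))
```

      The point is that the number of detail requests depends on the number of matching refs, not on the total number of refs in the repository.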

      I am not sure why the plugin loads properties for annotated tags during indexing or webhook processing. Maybe this could be made optional?

      One last possible change would be to have a different webhook endpoint for each organization folder in Jenkins. Right now, all events are registered on https://<your-jenkins>/gitea-webhook/post, which causes collisions when multiple Jenkins-side organizations refer to the same Gitea-side organization. If my observations are correct, webhooks are recreated on each indexing run. As a result, the same webhook is recreated by multiple organizations, which (potentially) affects the delivery of the webhook itself. If a push happens while another Jenkins-side organization is indexing, the webhook might not exist at that moment, causing the plugin to miss the event and not trigger a job.
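
      A toy model of that race, assuming my observation holds that re-indexing removes a folder's hook before registering it again. The per-folder URL scheme (folder name as an extra path segment) is purely hypothetical and only there to show the difference.

```python
SHARED = "https://<your-jenkins>/gitea-webhook/post"

def hook_url(folder, per_folder):
    # Hypothetical URL scheme: the folder name as an extra path segment.
    return f"{SHARED}/{folder}" if per_folder else SHARED

def push_seen_by_a_during_b_reindex(per_folder):
    """Does folder A still have a registered hook while B is re-indexing?"""
    hooks = {hook_url(folder, per_folder) for folder in ("A", "B", "C")}
    # Assumption from the text: re-indexing removes the folder's hook
    # before registering it again, and the push arrives in between.
    hooks.discard(hook_url("B", per_folder))
    return hook_url("A", per_folder) in hooks

print(push_seen_by_a_during_b_reindex(per_folder=False))
print(push_seen_by_a_during_b_reindex(per_folder=True))
```

      With the shared URL, B's re-indexing takes away the only hook and A misses the push. With one URL per folder, B only touches its own hook.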


      I'd be happy to get your feedback on this, especially from stephenconnolly as plugin maintainer.


      PS: If it's happening, I am willing to invest time to participate in that sort of redesign.

            justusbunsi Steven