INFRA-2918: Set up ci.jenkins.io Kubernetes agents along with ACI containers

      Description

      The goal is to give ci.jenkins.io the same workload capability that ACI agents provide, but with Kubernetes agents.

      • Ref. INFRA-2919: Kubernetes is configured with an AWS EKS cluster
      • Why? ACI costs a kidney and is a liability. Merging workloads onto a static Kubernetes cluster provides a static build capacity that decreases the infra cost, while ACI is kept for "peaks" in the build queue

      This change leads to the deprecation of the `useAci` attribute of the pipeline library function `buildPlugin()` (see the PR on the library at https://github.com/jenkins-infra/pipeline-library/pull/220).

      If you land on this page because of a deprecation message: please edit the `Jenkinsfile` at the root of your repository and replace the attribute `useAci` with `useContainerAgent`.
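
For repositories where the rename has to be applied in bulk, the replacement described above can be sketched as a small Python helper. This is only an illustration: the helper name `migrate_jenkinsfile` and the sample `Jenkinsfile` content are assumptions, not part of the pipeline library. In a single repository, editing the `Jenkinsfile` by hand is just as good.

```python
import re

# Matches the deprecated `useAci` attribute as a whole word,
# so longer identifiers such as `useAciFoo` are left alone.
USE_ACI = re.compile(r"\buseAci\b")


def migrate_jenkinsfile(text: str) -> str:
    """Return the Jenkinsfile text with `useAci` renamed to `useContainerAgent`."""
    return USE_ACI.sub("useContainerAgent", text)


# Hypothetical Jenkinsfile content, for illustration only.
before = "buildPlugin(useAci: true)\n"
after = migrate_jenkinsfile(before)
print(after, end="")
```

The value passed to the attribute is left untouched; only the attribute name changes.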

            Activity

            dduportal Damien Duportal added a comment -

            First PR to add ruby template: https://github.com/jenkins-infra/jenkins-infra/pull/1820

            dduportal Damien Duportal added a comment -

            Today a lot of PRs:

            • Adding maven and maven-11 templates: https://github.com/jenkins-infra/jenkins-infra/pull/1821
            • Fix Kubernetes memory definition: https://github.com/jenkins-infra/jenkins-infra/pull/1822
            • Add instanceCap for Kubernetes agents: https://github.com/jenkins-infra/jenkins-infra/pull/1823
            • Adding node and alpine templates + synchronize agent setups for both ACI and Kubernetes: https://github.com/jenkins-infra/jenkins-infra/pull/1824

            dduportal Damien Duportal added a comment - - edited

            After the first tries, we made the following changes:

            • ACI agents were disabled because they were too unstable: https://github.com/jenkins-infra/jenkins-infra/pull/1827
            • Due to DockerHub API rate limiting, we quickly hit issues. As a temporary measure, a Docker image pull secret was added:
              • https://github.com/jenkins-infra/jenkins-infra/pull/1825
              • It uses a custom DockerHub account `cijenkinsiok8s` to ensure there is NO "write" authorization on the images of `jenkins`, `jenkinsci` or `jenkinsciinfra` (as we do not have a Docker Enterprise Account, we cannot create read-only tokens).

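
For readers unfamiliar with image pull secrets, here is a minimal sketch of what such a secret looks like, built in Python as a plain dictionary. It assumes the standard `kubernetes.io/dockerconfigjson` secret type; the secret name `dockerhub-pull` and the token value are placeholders, and the actual definition used for ci.jenkins.io is the one in the PR linked above.

```python
import base64
import json


def docker_pull_secret(name: str, username: str, token: str,
                       registry: str = "https://index.docker.io/v1/") -> dict:
    """Build a Kubernetes Secret manifest holding registry credentials."""
    # The "auth" field is base64("username:token"), as in ~/.docker/config.json.
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    docker_config = {
        "auths": {registry: {"username": username, "password": token, "auth": auth}}
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under "data" must themselves be base64-encoded.
        "data": {
            ".dockerconfigjson": base64.b64encode(
                json.dumps(docker_config).encode()
            ).decode()
        },
    }


# Placeholder secret name and token, for illustration only.
secret = docker_pull_secret("dockerhub-pull", "cijenkinsiok8s", "example-token")
print(json.dumps(secret, indent=2))
```

Agent pods then reference the secret by name through `imagePullSecrets`, so image pulls are authenticated instead of anonymous.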
            dduportal Damien Duportal added a comment - - edited

            After the first tries on the BOM, the Kubernetes agents are working pretty well, but a single build of the BOM went from 1h with ACI (when it was working...) to ~2h30.

            We enabled the EKS autoscaling capability to ensure that more resources are added on the go, allowing the BOM build to succeed in under ~1h20 and demonstrating that it still works (to validate the fact that we can add more resources and more clusters in the future).

            • Created the AWS Cloud resources to allow auto-scaling through service account: https://github.com/jenkins-infra/aws/pull/25
            • Install the cluster-autoscaler helm chart + set it up for EKS: https://github.com/jenkins-infra/charts/pull/1389
            • Increased the pool size of the autoscaling VM group used within the EKS cluster from 15 to 50 VMs: https://github.com/jenkins-infra/aws/pull/26
            • New sizing for ci.jenkins.io pods considering the current node size + autoscaling: https://github.com/jenkins-infra/jenkins-infra/pull/1828

            dduportal Damien Duportal added a comment -

            As caught by Tim Jacomb in https://github.com/jenkins-infra/jenkins-infra/pull/1829, the Jenkins Core PR builds hit timeouts due to the switch to Kubernetes agents. It sounds like 2 vCPUs were not enough for these.

            The PR https://github.com/jenkins-infra/jenkins-infra/pull/1829 increases the pod size to 4 vCPUs / 8 GB for ALL agents, to reduce these timeouts.

            Next step: improve autoscaling by adding beefier machines and using spot instances.

            dduportal Damien Duportal added a comment -

            Tim Jacomb I consider this issue fixed: if it is ok for you, any further improvement will be the subject of new issues (such as improving costs, performance, optimizing autoscaling, etc). If it does not sound good to you, please feel free to reopen the issue with a message.

            timja Tim Jacomb added a comment -

            It would be good to announce it to the dev mailing list as a final step, I think.

            Most people will think we're still running on ACI otherwise,

            and I think a couple of docs fixes are needed as well in jenkins-infra/pipeline-library and jenkins-infra/documentation?

            dduportal Damien Duportal added a comment -

            Good point, I was too fast closing the issue.

            I'll take these action points this week.

            halkeye Gavin Mogan added a comment -

            testing cause Damien Duportal told me to

            dduportal Damien Duportal added a comment -

            • Change announced (with a typo): https://groups.google.com/g/jenkinsci-dev/c/6NLoigg7Qbo
            • jenkins-infra/documentation update: https://github.com/jenkins-infra/documentation/pull/16
            • jenkins-infra/runbooks update: https://github.com/jenkins-infra/runbooks/pull/36
            • Pipeline library update: https://github.com/jenkins-infra/pipeline-library/pull/220

            timja Tim Jacomb added a comment -

            no emoji in comment

            ponch F Ponciroli added a comment -

            I have closed it by mistake. Sorry about that :/

              People

              Assignee: dduportal Damien Duportal
              Reporter: dduportal Damien Duportal