Infrastructure / INFRA-2918

Setup ci.jenkins.io Kubernetes agents along with ACI containers

      Description

      The goal is to give ci.jenkins.io the same workload capability that ACI agents provide, but with Kubernetes agents.

      • Ref. INFRA-2919: Kubernetes is configured with an AWS EKS cluster
      • Why? ACI costs a kidney and is a liability. Merging workloads onto a static Kube cluster would provide a static build capacity that decreases the infra cost, while keeping ACI for "peaks" in the build queue

      This change leads to the deprecation of the attribute useAci for the pipeline library function `buildPlugin()` (see PR on the library at https://github.com/jenkins-infra/pipeline-library/pull/220).

      If you land on this page because of a deprecation message: please edit the file Jenkinsfile at the root of your repository and replace the attribute useAci with useContainerAgent.
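The rename described above can be sketched as a small script. This is only an illustration, not part of the pipeline library: the helper name `migrate_jenkinsfile` and the sample Jenkinsfile content are assumptions, and it assumes the attribute is passed as a named argument (for example `useAci: true`).

```python
import re

# Matches the deprecated attribute only when it is used as a named-argument
# key, i.e. followed by a colon (for example `useAci: true`).
_USE_ACI_KEY = re.compile(r"\buseAci\b(?=\s*:)")


def migrate_jenkinsfile(text: str) -> str:
    """Return the Jenkinsfile content with `useAci` renamed to `useContainerAgent`.

    Hypothetical helper: only the attribute name changes, its value is kept.
    """
    return _USE_ACI_KEY.sub("useContainerAgent", text)


# Hypothetical Jenkinsfile content, for illustration only.
before = "buildPlugin(useAci: true)\n"
after = migrate_jenkinsfile(before)
print(after, end="")
```

For a single repository, editing the Jenkinsfile by hand is just as good; a script like this is only worth it when many repositories need the same rename.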

            Activity

            dduportal Damien Duportal created issue -
            dduportal Damien Duportal made changes -
            Field Original Value New Value
            Summary: "Migrate ci.jenkins.io back to Kubernetes" changed to "Setup ci.jenkins.io Kubernetes agents along with ACI containers"
            dduportal Damien Duportal made changes -
            Description (original value): Following up https://issues.jenkins.io/browse/INFRA-2917, the goal would be to migrate ci.jenkins.io to a fully Kubernetes-hosted controller

            * (x) Define Kubernetes cluster where to run
            * (x) Define the sizing and container settings (limits, JVM opts, disk storage, etc.)
            * (x) Add the initial helmfile into jenkins-infra/charts (preliminary try by [~timjacomb] in https://github.com/jenkins-infra/charts/pull/174)
            * (x) SHIP IT

            Description (new value): The goal is to give ci.jenkins.io the same workload capability that ACI agents provide, but with Kubernetes agents.

            * Ref. INFRA-2919: Kubernetes is configured with an AWS EKS cluster
            * Why? ACI costs a kidney and is a liability. Merging workloads onto a static Kube cluster would provide a static build capacity that decreases the infra cost, while keeping ACI for "peaks" in the build queue
            dduportal Damien Duportal made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            dduportal Damien Duportal made changes -
            Rank Ranked higher
            dduportal Damien Duportal added a comment - First PR to add ruby template: https://github.com/jenkins-infra/jenkins-infra/pull/1820
            dduportal Damien Duportal added a comment -

            Today a lot of PRs:

            • Adding maven and maven-11 templates: https://github.com/jenkins-infra/jenkins-infra/pull/1821
            • Fix Kubernetes memory definition: https://github.com/jenkins-infra/jenkins-infra/pull/1822
            • Add instanceCap for Kubernetes agents: https://github.com/jenkins-infra/jenkins-infra/pull/1823
            • Adding node and alpine templates + synchronize agent setups for both ACI and Kubernetes: https://github.com/jenkins-infra/jenkins-infra/pull/1824
            dduportal Damien Duportal added a comment - edited

            After the first tries, we made the following changes:

            • ACI agents were disabled because they were too unstable: https://github.com/jenkins-infra/jenkins-infra/pull/1827
            • We quickly hit issues due to DockerHub API rate limiting. As a temporary measure, a Docker image pull secret was added:
              • https://github.com/jenkins-infra/jenkins-infra/pull/1825
              • It uses a custom DockerHub account `cijenkinsiok8s` to ensure there is NO "write" authorization on the images of `jenkins`, `jenkinsci`, or `jenkinsciinfra` (as we do not have a Docker Enterprise Account, we cannot create read-only tokens). [EDIT: "NO write" - forgot the "no" in the original message]
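For readers unfamiliar with image pull secrets, here is a minimal sketch of the payload that a Kubernetes secret of type `kubernetes.io/dockerconfigjson` carries. It does not reproduce the content of the PR above: the function name and the token value are placeholders, and only the account name `cijenkinsiok8s` comes from this issue.

```python
import base64
import json


def docker_config_json(username: str, token: str,
                       registry: str = "https://index.docker.io/v1/") -> str:
    """Build the JSON payload stored in a `kubernetes.io/dockerconfigjson` secret.

    The `auth` field is the base64 encoding of "username:token".
    """
    auth = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {"username": username, "password": token, "auth": auth}
        }
    }
    return json.dumps(config)


# The token below is a placeholder, not a real credential.
payload = docker_config_json("cijenkinsiok8s", "not-a-real-token")
print(payload)
```

Pods then reference such a secret through `imagePullSecrets`, so that image pulls are authenticated instead of anonymous and count against that account's rate limit.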
            dduportal Damien Duportal added a comment - edited

            After the first tries on the BOM, the Kubernetes agents are working pretty well, but a single build of the BOM went from 1h with ACI (when it was working...) to ~2h30.

            We enabled the EKS autoscaling capability so that more resources are added on the go, allowing the BOM build to succeed in under ~1h20 and demonstrating that it still works (which validates that we can add more resources and more clusters in the future).

            • Created the AWS Cloud resources to allow auto-scaling through a service account: https://github.com/jenkins-infra/aws/pull/25
            • Installed the cluster-autoscaler helm chart + set it up for EKS: https://github.com/jenkins-infra/charts/pull/1389
            • Increased the pool size of the autoscaling VM group used within the EKS cluster from 15 to 50 VMs: https://github.com/jenkins-infra/aws/pull/26
            • New sizing for ci.jenkins.io pods considering the current node size + autoscaling: https://github.com/jenkins-infra/jenkins-infra/pull/1828
            dduportal Damien Duportal added a comment -

            As caught by Tim Jacomb in https://github.com/jenkins-infra/jenkins-infra/pull/1829, the Jenkins Core PRs hit timeouts due to the switch to Kubernetes agents. It sounds like 2 vCPUs were not enough for these.

            The PR https://github.com/jenkins-infra/jenkins-infra/pull/1829 increases the pod size to 4 vCPUs / 8 GB for ALL agents, to reduce these timeouts.

            Next step: improve autoscaling by adding beefier machines and using spot instances
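To get a feel for what the new pod size means for capacity, the arithmetic can be sketched as below. The pod size (4 vCPUs / 8 GB) and the 50-VM pool limit come from this issue; the node size is NOT stated here, so the 16 vCPUs / 64 GB node is purely hypothetical. The sketch also ignores resources reserved for the system and for other pods on each node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    vcpus: int
    memory_gb: int


def pods_per_node(node: Resources, pod: Resources) -> int:
    """Number of agent pods that fit on one node; the scarcer resource decides."""
    return min(node.vcpus // pod.vcpus, node.memory_gb // pod.memory_gb)


def max_concurrent_pods(node: Resources, pod: Resources, max_nodes: int) -> int:
    """Upper bound on concurrent agent pods once the pool is fully scaled out."""
    return pods_per_node(node, pod) * max_nodes


# From this issue: 4 vCPUs / 8 GB per agent pod, autoscaling pool capped at 50 VMs.
AGENT_POD = Resources(vcpus=4, memory_gb=8)
MAX_NODES = 50

# ASSUMPTION: hypothetical node size, used only to make the arithmetic concrete.
HYPOTHETICAL_NODE = Resources(vcpus=16, memory_gb=64)

print(pods_per_node(HYPOTHETICAL_NODE, AGENT_POD))
print(max_concurrent_pods(HYPOTHETICAL_NODE, AGENT_POD, MAX_NODES))
```

With these assumed numbers, CPU is the limiting resource, which is one way to see why beefier machines are listed as the next step.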
            dduportal Damien Duportal made changes -
            Link This issue relates to INFRA-3030 [ INFRA-3030 ]
            dduportal Damien Duportal made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            dduportal Damien Duportal added a comment -

            Tim Jacomb I consider this issue fixed: if it is OK for you, any further improvement will be the subject of new issues (such as improving costs and performance, optimizing autoscaling, etc.). If that does not sound good to you, please feel free to reopen the issue with a message.
            dduportal Damien Duportal made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            timja Tim Jacomb added a comment -

            It would be good to announce it to the dev mailing list as a final step, I think.

            Otherwise, most people will think we're still running on ACI,

            and I think a couple of docs fixes are needed as well in jenkins-infra/pipeline-library and jenkins-infra/documentation?
            dduportal Damien Duportal added a comment -

            Good point, I was too fast closing the issue.

            I'll take these action points for this week.
            dduportal Damien Duportal made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            dduportal Damien Duportal made changes -
            Status Reopened [ 4 ] In Progress [ 3 ]
            halkeye Gavin Mogan added a comment -

            testing cause Damien Duportal told me to

            dduportal Damien Duportal added a comment -

            • Change announced (with a typo): https://groups.google.com/g/jenkinsci-dev/c/6NLoigg7Qbo
            • jenkins-infra/documentation update: https://github.com/jenkins-infra/documentation/pull/16
            • jenkins-infra/runbooks update: https://github.com/jenkins-infra/runbooks/pull/36
            • Pipeline library update: https://github.com/jenkins-infra/pipeline-library/pull/220
            timja Tim Jacomb added a comment -

            no emoji in comment

            dduportal Damien Duportal made changes -
            Description (paragraph appended to the previous value): This change leads to the deprecation of the attribute `useAci` for the pipeline library function `buildPlugin()` (see PR on the library at https://github.com/jenkins-infra/pipeline-library/pull/220)
            dduportal Damien Duportal made changes -
            Description (paragraph appended to the previous value): If you land on this page because of a deprecation message: please edit the file `Jenkinsfile` at the root of your repository and replace the attribute `useAci` with `useContainerAgent`. It's only a name change: there are no functional changes.
            dduportal Damien Duportal made changes -
            Description (new value; the opening paragraph and its two bullets are unchanged):

            This change leads to the deprecation of the attribute {{useAci}} for the pipeline library function `buildPlugin()` (see PR on the library at https://github.com/jenkins-infra/pipeline-library/pull/220).

            If you land on this page because of a deprecation message: please edit the file {{Jenkinsfile}} at the root of your repository and replace the attribute {{useAci}} with {{useContainerAgent}}.
            * It's only a name change: there are no functional changes.
            * You can find an example change on the following pull request: https://github.com/jenkinsci/jenkins-infra-test-plugin/pull/12
            dduportal Damien Duportal made changes -
            Link This issue relates to INFRA-2059 [ INFRA-2059 ]
            dduportal Damien Duportal made changes -
            Rank Ranked higher
            ponch F Ponciroli made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            ponch F Ponciroli added a comment -

            I have closed it by mistake. Sorry about that :/ 

            ponch F Ponciroli made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            dduportal Damien Duportal made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Closed [ 6 ]

              People

              Assignee:
              dduportal Damien Duportal
              Reporter:
              dduportal Damien Duportal
              Votes:
              0
              Watchers:
              5