  Infrastructure / INFRA-3097

Decrease Spending on AWS


    Details

    • Type: Epic
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Component/s: aws
    • Labels:
      None
    • Epic Name:
      Decrease-AWS

      Description

      What

      Last month's AWS bill (September 2021) was $15K. That is too much; we have to target $8K per month at most.

      This issue tracks the cost division and the associated tasks.

      Cost analysis

      Time span: past 3 months (July -> Sept. 2021)

      Per Region

      • us-east-1 (with the "static" services such as trusted.ci, pkg, etc.) is around $5.1K per month and roughly constant
      • us-east-2 (with the ci.jenkins.io workloads: VMs and Kubernetes) went from ~$5K to $10K

      Per usage type

      Spending is split into the 3 usage-type categories below: data (in/out)bound transfer, EC2 instances, and others (see the subsections below; note that "Others" on the diagram is a mix of "others" plus EC2 run-instance items not visible at the top of the table).
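
      The split above can be reproduced from Cost Explorer. A minimal sketch, assuming Cost Explorer is enabled on the billing account and boto3 credentials with the ce:GetCostAndUsage permission; it groups the July to September 2021 window by region and usage type:

{code:python}
# Sketch: reproduce the per-region / per-usage-type cost breakdown with Cost Explorer.
# Pagination (NextPageToken) is omitted for brevity.
import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer endpoint lives in us-east-1

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2021-07-01", "End": "2021-10-01"},  # July -> Sept. 2021
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[
        {"Type": "DIMENSION", "Key": "REGION"},
        {"Type": "DIMENSION", "Key": "USAGE_TYPE"},
    ],
)

for period in response["ResultsByTime"]:
    print(period["TimePeriod"]["Start"])
    # Largest (region, usage type) cost drivers first
    groups = sorted(
        period["Groups"],
        key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]),
        reverse=True,
    )
    for group in groups[:10]:
        region, usage_type = group["Keys"]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(f"  {region:15} {usage_type:45} ${amount:,.0f}")
{code}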

      Data (in/out)bound transfers

      • Most of the data transfer cost comes from EC2 in the region us-east-1 (Virginia).

      • The outbound transfer is ~$3k per month for around 50 TB of outbound traffic, i.e. roughly $0.06 per GB (inbound costs $0 for around 2 TB in, and inter-region is ~$60 for 5 TB per month)

      EC2

      • Pure EC2 run spending (without snapshots/gateway/additional storage) went from around $5k to $8k!

      • ci.jenkins.io (the only user of the Ohio region, i.e. us-east-2) is responsible for $4k to $7.7k of that
      • The "static" services (trusted.ci, its static agents, pkg.io, census, bound) stay roughly constant at ~$0.8k per month

      Others

      • EBS usage (volumes + snapshots) went from $0.3k to almost $0.5k, because of increased Packer activity and the larger volume size for pkg.jenkins.io.
      • There is around $0.3k of S3 usage, inter-region NAT, etc.

      This diagram shows the "real other spending" by excluding only the EC2 run hours and the EC2 data in/out.

      ci.jenkins.io cost usage

      ci.jenkins.io has different cloud sources to spawn agents and handle builds (excluding the 2 static agents for s390x and ppc64le, which are free for us): Azure VMs, EC2 VMs, (ACI-Windows) containers and (Kubernetes-Linux) containers, as of today.
      Only EC2 VMs and (Kubernetes-Linux in an EKS cluster) containers are hosted in AWS as of today.
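
      Because ci.jenkins.io is the only user of us-east-2, an inventory of what is running there (EC2 agent VMs plus EKS worker nodes) shows which instance types drive the EC2 run cost in that region. A minimal sketch, assuming only read access to the region:

{code:python}
# Sketch: count the running EC2 instances in us-east-2 (ci.jenkins.io agents + EKS nodes)
# by instance type, to see which types drive the EC2 run cost there.
from collections import Counter

import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")
by_type = Counter()

for page in ec2.get_paginator("describe_instances").paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            by_type[instance["InstanceType"]] += 1

for instance_type, count in by_type.most_common():
    print(f"{instance_type:20} {count}")
{code}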


            Activity

            jglick Jesse Glick added a comment -

            Any plans to use Spot for agents?

            dduportal Damien Duportal added a comment -

            Jesse Glick that's one of the roads yes \o/ thanks for the reminder!

            dduportal Damien Duportal added a comment -

            First wave of proposals to decrease costs:

            • Migrate updates.jenkins.io to another cloud where the outbound data transfer is cheaper
            • Migrate any other service that emits outbound data transfer (which ones?) from us-east-1
            • Cleanup!
              • The MacMini was $0.7k per month (my own mistake): it has been removed in October
              • Cleanup the snapshots/AMIs generated by packer that are not used anymore (see the dry-run sketch below)
            • Fix labels on ci.jenkins.io: a lot of "docker" builds are using high mem instances (because highmem fulfills the "docker" label). Example: https://ci.jenkins.io/job/Packaging/job/docker/job/PR-1219/11/consoleFull
            • Use spot instances (thanks Jesse) for EC2 VMs
            • Decrease the Kubernetes costs
              • Use spot instances for the worker pool
              • Decrease the autoscaling limit (after hacktoberfest)
              • Add other Kubernetes clusters from DigitalOcean, the 2 OSUSL machines, Scaleway and maybe Azure
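
            As an illustration of the snapshot/AMI cleanup item, a dry-run sketch: it lists self-owned AMIs in a region that are not referenced by any instance and are older than an (assumed, arbitrary) 30-day cutoff, together with the EBS snapshots behind them. The actual deregister/delete calls are left commented out, and nothing here reflects the real Packer image naming:

{code:python}
# Sketch (dry run): find self-owned AMIs that no instance references, plus the
# snapshots behind them, as candidates for the Packer cleanup item above.
# The 30-day cutoff is an arbitrary assumption for illustration.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

# AMIs currently referenced by instances (any state) in this region
in_use = {
    instance["ImageId"]
    for page in ec2.get_paginator("describe_instances").paginate()
    for reservation in page["Reservations"]
    for instance in reservation["Instances"]
}

for image in ec2.describe_images(Owners=["self"])["Images"]:
    created = datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))
    if image["ImageId"] in in_use or created > cutoff:
        continue
    snapshots = [
        mapping["Ebs"]["SnapshotId"]
        for mapping in image.get("BlockDeviceMappings", [])
        if "Ebs" in mapping
    ]
    print(f"candidate: {image['ImageId']} ({image.get('Name', '?')}) snapshots={snapshots}")
    # After manual review, the actual cleanup would be:
    # ec2.deregister_image(ImageId=image["ImageId"])
    # for snapshot_id in snapshots:
    #     ec2.delete_snapshot(SnapshotId=snapshot_id)
{code}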
            jglick Jesse Glick added a comment -

            Do not forget https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-eks-support-ec2-spot-instances-managed-node-groups/
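
            Following the announcement linked above, a managed node group can be created with Spot capacity directly. A minimal sketch; the cluster name, node role, subnets and instance types below are placeholders rather than the actual ci.jenkins.io values:

{code:python}
# Sketch: create a Spot-capacity managed node group for the ci.jenkins.io EKS cluster.
# Cluster name, node role ARN, subnets and instance types are placeholders, not the
# real ci.jenkins.io values.
import boto3

eks = boto3.client("eks", region_name="us-east-2")

eks.create_nodegroup(
    clusterName="cijenkinsio",                                 # placeholder
    nodegroupName="linux-agents-spot",                         # placeholder
    capacityType="SPOT",                                       # managed node groups support Spot since Dec 2020
    instanceTypes=["m5.2xlarge", "m5a.2xlarge"],               # several types improve Spot availability
    scalingConfig={"minSize": 1, "maxSize": 20, "desiredSize": 1},
    subnets=["subnet-aaaa1111", "subnet-bbbb2222"],            # placeholder
    nodeRole="arn:aws:iam::123456789012:role/eks-node-role",   # placeholder
    labels={"jenkins": "agent"},
)
{code}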

              People

              Assignee:
              dduportal Damien Duportal
              Reporter:
              dduportal Damien Duportal
              Votes:
              0
              Watchers:
              3
