Last month's AWS bill (September 2021) was 15K$. That is too much: we have to target 8K$ per month at most.
This issue tracks the cost division and the associated tasks.
Time span: past 3 months (July -> Sept. 2021)
- us-east-1 (with the "static" services such as trusted.ci, pkg, etc.) is around 5.1K$ per month, constant
- us-east-2 (with the ci.jenkins.io workloads: VMs and Kubernetes) went from ~5K$ to 10K$
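To make the gap to the target concrete, here is a minimal sketch of the arithmetic, using only the per-region figures quoted above (rounded values from the bill, not exact billing data). The names `REGION_COSTS`, `TARGET` and `required_savings` are made up for this illustration.

```python
# Monthly figures quoted above, in US dollars (rounded, not exact billing data).
REGION_COSTS = {
    "us-east-1": 5_100,   # "static" services, roughly constant
    "us-east-2": 10_000,  # ci.jenkins.io workloads, latest month
}
TARGET = 8_000  # monthly maximum we want to reach


def required_savings(costs, target):
    """Return (total, gap, ratio): the current total, the amount to cut,
    and that amount as a fraction of the current total."""
    total = sum(costs.values())
    gap = max(total - target, 0)
    return total, gap, gap / total


total, gap, ratio = required_savings(REGION_COSTS, TARGET)
print(f"total={total}$ gap={gap}$ ({ratio:.0%} reduction needed)")
# prints: total=15100$ gap=7100$ (47% reduction needed)
```

If us-east-1 stays constant at ~5.1K$, this leaves roughly 2.9K$ per month for us-east-2 to fit within the target.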
The spending is split into 3 usage-type categories: data transfer (inbound/outbound), EC2 instances, and others (see the subsections below). Please note that "Others" on the diagram aggregates miscellaneous items as well as EC2 run-instance items that are not visible at the top of the table.
- Most of the data transfer cost comes from EC2 instances in the us-east-1 region (Virginia):
- Outbound transfer costs ~3k$ per month for around 50 TB (inbound costs 0$ for around 2 TB, and inter-region transfer costs ~60$ for 5 TB per month)
- Pure EC2 run spending (excluding snapshots, gateway, and additional storage) went from around 5k$ to 8k$!
- The "static" services (trusted.ci, its static agents, pkg.io, census, bound) are fairly constant at ~0.8k$ per month
- EBS usage (volumes + snapshots) went from 0.3k$ to almost 0.5k$ because of increased packer activity and a larger volume for pkg.jenkins.io.
- There is around 0.3k$ of S3 usage, inter-region NAT, etc.
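As a rough sanity check on the data transfer figures quoted earlier, the effective price per GB can be derived from the monthly cost and volume. This is only a sketch: it assumes decimal terabytes (1 TB = 1000 GB) and uses the rounded numbers from this issue, so the results are approximations and not AWS list prices. `GB_PER_TB` and `rate_per_gb` are names invented for this example.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes; the bill may count differently


def rate_per_gb(monthly_cost_usd, volume_tb):
    """Effective cost in $ per GB for a given monthly cost and volume in TB."""
    return monthly_cost_usd / (volume_tb * GB_PER_TB)


# Rounded figures from the data transfer bullets above.
outbound = rate_per_gb(3_000, 50)    # ~3k$ for ~50 TB outbound
inter_region = rate_per_gb(60, 5)    # ~60$ for ~5 TB inter-region

print(f"outbound:     {outbound:.3f} $/GB")
print(f"inter-region: {inter_region:.3f} $/GB")
# prints:
# outbound:     0.060 $/GB
# inter-region: 0.012 $/GB
```

With these numbers, outbound traffic is about five times more expensive per GB than inter-region traffic, which is why the outbound volume dominates the data transfer cost.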
This diagram shows the "real other spending" by excluding only the EC2 run hours and the EC2 inbound/outbound data transfer:
ci.jenkins.io spawns agents and handles builds from several cloud sources (excluding the 2 static agents for s390x and ppc64le, which are free for us): Azure VMs, EC2 VMs, (ACI-Windows) containers, and (Kubernetes-Linux) containers, as of today.
Only the EC2 VMs and the (Kubernetes-Linux, in an EKS cluster) containers are hosted in AWS as of today.
- The configuration source of truth is https://github.com/jenkins-infra/jenkins-infra/blob/production/hieradata/clients/azure.ci.jenkins.io.yaml#L36
- Documentation is available at https://github.com/jenkins-infra/documentation/blob/main/ci.adoc (but it is not always up to date)
- EC2 VMs are of 3 kinds:
- Ubuntu 20.04 "standard" VMs to handle builds that need "docker" or are not fit for running inside a container (for whatever reason)
- Ubuntu 20.04 "highmem" VMs to handle builds that require a large amount of memory (Jenkins Acceptance Test Harness, aka ATH, or some other performance tests). Most of these builds also require Docker: we use the SAME template as Ubuntu 20.04 "standard", with a bigger instance size and an additional "highmem" label
- Windows 2019 VMs to handle builds that need "Windows containers" or are not fit to run inside a container
- Linux containers are, as of today, hosted in a single EKS cluster
- Source of truth for EKS is the following (Terraform-based) link: https://github.com/jenkins-infra/aws/blob/main/eks-cluster.tf
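The AWS-hosted agent kinds listed above can be summarized as a small lookup table. This is an illustrative sketch only, not the actual Jenkins configuration (the source of truth stays in the hieradata and Terraform files linked above). The kind names and all labels except "highmem" are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentKind:
    name: str            # hypothetical identifier, not a real template name
    os: str
    in_container: bool   # True if the agent itself runs as a container
    labels: frozenset    # hypothetical labels, except "highmem"


# AWS-hosted kinds only; Azure VMs and ACI-Windows containers are left out.
AWS_AGENT_KINDS = [
    AgentKind("ec2-ubuntu-standard", "Ubuntu 20.04", False,
              frozenset({"docker"})),
    # Same template as "standard": bigger instance size + "highmem" label.
    AgentKind("ec2-ubuntu-highmem", "Ubuntu 20.04", False,
              frozenset({"docker", "highmem"})),
    AgentKind("ec2-windows", "Windows 2019", False,
              frozenset({"windows"})),
    AgentKind("eks-linux-container", "Linux", True,
              frozenset({"linux"})),
]


def kinds_for(label):
    """Return the names of the agent kinds that carry the given label."""
    return [kind.name for kind in AWS_AGENT_KINDS if label in kind.labels]


print(kinds_for("highmem"))
print(kinds_for("docker"))
```

A table like this makes it easier to see which kinds can serve the same builds, which matters when looking for cheaper alternatives to the "highmem" instances.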