Here's the complete setup. Account A will refer to the account where the Jenkins master operates, along with the EKS cluster running the slaves; account B will refer to the account we are provisioning resources into, where we use a deployment role. Apologies, as this is a bit lengthy.
- The master Jenkins server runs in account A using a limited IAM role which gives it access to the EKS cluster via the Jenkins k8s plugin.
- EKS nodes within the node groups, and the Fargate profiles, are configured with IAM permissions that allow their Fluent Bit configurations to write the nodes' own logs to CloudWatch; these represent the API logs and console logs for the containers. This happens through regular IAM and is independent of roles assumed on pods. It is a standard Amazon logging configuration, using Fluent Bit as a DaemonSet on the node group and the baked-in Fluent Bit implementation offered on the Fargate workers (a sketch of the Fargate logging ConfigMap follows this list).
- The AWS standard for giving k8s service accounts access to IAM roles is to create an OIDC connector between the target account (account B) and the k8s endpoint, not a local IAM role within account A at all. The role in account B is configured with an assume-role policy that allows AssumeRoleWithWebIdentity for the k8s user by name, through the OIDC connector only (a sketch of this trust policy also follows this list). This role is given the deployment permissions for account B.
- Service accounts are configured within a namespace on EKS, carrying an 'eks.amazonaws.com/role-arn' annotation which specifies the role that the service account should take on.
- The podTemplate for the deployment job within Jenkins specifies the namespace where the pod is to run and the service account name (see the combined sketch after this list).
- When the pod is launched, the AWS SDK calls (support for AssumeRoleWithWebIdentity following the injected hints is baked into the SDKs) automatically pick up the role from the service account annotation and connect to the target account. When challenged for a token, the k8s endpoint issues one that satisfies the OIDC connector, allowing the k8s user to assume the role. This happens somewhat 'automagically' once the configuration is right.
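For reference, the baked-in Fargate log router mentioned above is driven by a ConfigMap named `aws-logging` in the `aws-observability` namespace. A minimal sketch, assuming CloudWatch output (the region and log group name here are placeholders, not our actual values):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-logging            # fixed name expected by the Fargate log router
  namespace: aws-observability # fixed namespace expected by the Fargate log router
data:
  output.conf: |
    [OUTPUT]
        Name cloudwatch_logs
        Match *
        region eu-west-1
        log_group_name /eks/example-cluster/fargate-containers
        log_stream_prefix fargate-
        auto_create_group true
```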
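And a hedged sketch of the trust policy on the account B deployment role. IAM expects the equivalent JSON; it is shown as YAML here for consistency, and the account ID, region, OIDC ID, namespace, and service account name are all placeholders:

```yaml
# Trust (assume-role) policy on the deployment role in account B.
# The OIDC provider is registered in account B but points at the
# EKS cluster's OIDC issuer endpoint (the cluster lives in account A).
Version: "2012-10-17"
Statement:
  - Effect: Allow
    Principal:
      Federated: arn:aws:iam::<ACCOUNT_B_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/<CLUSTER_OIDC_ID>
    Action: sts:AssumeRoleWithWebIdentity
    Condition:
      StringEquals:
        # Restrict the role to one named service account in one namespace
        "oidc.eks.<REGION>.amazonaws.com/id/<CLUSTER_OIDC_ID>:sub": "system:serviceaccount:<NAMESPACE>:<SERVICE_ACCOUNT>"
```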
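Concretely, the annotated service account and pod template look roughly like this. The names and ARN are hypothetical, and the pod spec can be supplied to the Kubernetes plugin as raw YAML (e.g. via the podTemplate step's `yaml` parameter):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins-deploy        # hypothetical name
  namespace: jenkins-agents   # hypothetical namespace
  annotations:
    # Role in account B that pods using this service account should assume
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_B_ID>:role/<DEPLOY_ROLE>
---
# Pod template for the deployment job, tying the agent pod to the
# namespace and service account above.
apiVersion: v1
kind: Pod
metadata:
  namespace: jenkins-agents
spec:
  serviceAccountName: jenkins-deploy
  containers:
    - name: jnlp
      image: jenkins/inbound-agent:latest
```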
Note that this configuration is not something that is only needed for federated accounts; it is Amazon's model for assigning roles to EKS service accounts regardless. In a single-account scenario the OIDC connector and the role would be in the same account, but the mechanism is the same: AssumeRoleWithWebIdentity happens automatically, with the federated principal coming from k8s rather than IAM. The snippet below shows roughly what is injected into the pod to drive this.
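For anyone tracing the 'automagic': the EKS pod identity webhook mutates pods that use an annotated service account, roughly as in this fragment (values illustrative), and the SDKs' default credential chain sees the environment variables and performs AssumeRoleWithWebIdentity with the projected token:

```yaml
# Sketch of the fields injected into the agent pod spec by the webhook.
spec:
  containers:
    - name: jnlp
      env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::<ACCOUNT_B_ID>:role/<DEPLOY_ROLE>
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
        - name: aws-iam-token
          mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          readOnly: true
  volumes:
    - name: aws-iam-token
      projected:
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token
```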
In this configuration, whether cross-account or within the same account, we get the same experience: the run completes, the master log is produced (lacking agent activity, as noted in the script areas etc.), and log streams are created for the agent runs, but these are empty because the agent appears not to be able to write to them.
Note that the standard AWS Fluent Bit configurations produce logs perfectly, both for the node group and the EKS cluster, but of course this does not offer job logging for Jenkins.
In our model the slave role, in the target account, does not have rights to connect back. An IAM role assumed through the k8s annotations gives the slave access to the target account through the OIDC connection and web federation from the master. We do not use AWS credentials at all with this configuration; we rely purely on the IAM instance role for the master and the OIDC web assume-role for the slave, so the slave operates purely in the target account with no access back to the account containing the CloudWatch log groups the master has access to.
For additional context, the role assumed by the slave is for infrastructure deployment, so it really needs only the minimal permissions we assigned to it; we tested falling back to the Fargate instance role in the source account, and that didn't work. If this had been a role on a Jenkins slave that needed no other IAM-based rights in the target account, then passing credentials could possibly have been fine. However, (a) we can't, for these reasons, and (b) that would still allow the slave to operate on costed resources (logging) in the master account, so we feel this model is cleaner for us.
It also seems of less value to spend the time creating a third plugin to get the real feature (live streaming of events, rather than copying logs up in a stage) alongside another plugin where you have done such a great job.
I hope this clarifies.
-Andy