Bug
Resolution: Unresolved
Minor
None
Jenkins - 2.346.3
Kubernetes plugin - 3697.v771155683e38
Kubernetes - 1.16
AWS Elastic Container Registry
As far as I can tell, the Kubernetes plugin for Jenkins terminates the job as soon as it sees the "Back-off pulling image" message in the container status.
Example of error message:
Unable to pull Docker image "registry.net/project:version". Check if image tag name is spelled correctly.
An image pull can fail for various reasons:
- The image tag is invalid or the registry doesn't exist
- Authorization against "registry.net" failed
- Rate limits on "registry.net" (this may be the cause of my issue)
- Network issues
- etc.
With that in mind, I think the Kubernetes plugin for Jenkins should retry the image pull with exponential backoff rather than terminating the job on the first "Back-off pulling image" status.
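To make the proposal concrete, here is a minimal sketch of a capped exponential backoff schedule. The class name `PullBackoff`, the 10-second base, and the 5-minute cap are assumptions chosen for illustration; they are not taken from the plugin's code or configuration.

```java
import java.time.Duration;

class PullBackoff {
    // Hypothetical defaults, picked for illustration only.
    static final Duration BASE = Duration.ofSeconds(10);
    static final Duration CAP = Duration.ofMinutes(5);

    /** Delay before retry number `attempt` (0-based): BASE * 2^attempt, capped at CAP. */
    static Duration delayFor(int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Clamp the shift so the multiplication below cannot overflow a long.
        long factor = 1L << Math.min(attempt, 30);
        long millis = Math.min(BASE.toMillis() * factor, CAP.toMillis());
        return Duration.ofMillis(millis);
    }

    public static void main(String[] args) {
        // Print the wait before each of the first seven retries.
        for (int attempt = 0; attempt < 7; attempt++) {
            System.out.println("attempt " + attempt + " -> wait "
                    + delayFor(attempt).getSeconds() + "s");
        }
    }
}
```

With these assumed values the waits are 10s, 20s, 40s, 80s and 160s, after which every retry waits the 5-minute cap. The job would only be failed once a configured number of retries is exhausted.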
I've been battling this issue for months. It is particularly annoying when you have a lot of parallel branches. In my case it most likely relates to containerd occasionally offering incorrect auth to our private registry (which is an 'Open' issue with containerd as far as I can tell). While containerd retries and completes the image pull successfully, the Reaper sees the initial image pull failure and (in my case) fails the parallel branch. So yes, it would be good if the Reaper's TerminateAgentOnImagePullBackOff Listener ignored more 'transient' image pull errors. As a workaround, I've been using this code (in an init.groovy.d script) to simply delete the TerminateAgentOnImagePullBackOff Listener. It's a bit brutal, but it saves me hours of failed pipeline runs.
{{}}
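As a gentler alternative to removing the listener, the idea of ignoring 'transient' pull errors could be sketched as a grace period: only treat a pod as failed once it has stayed in a pull-failure state for a minimum time. This is a standalone illustration, not the plugin's implementation. The class `ImagePullGrace`, its method names, and the grace value are hypothetical; the waiting reasons `ImagePullBackOff` and `ErrImagePull` are the ones Kubernetes reports in the container status.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

class ImagePullGrace {
    // Hypothetical setting: how long a pod may keep failing to pull
    // before the failure is treated as permanent.
    private final Duration grace;
    // First time each pod was observed in a pull-failure state, keyed by pod name.
    private final Map<String, Instant> firstSeen = new HashMap<>();

    ImagePullGrace(Duration grace) {
        this.grace = grace;
    }

    /** True for the waiting reasons Kubernetes reports on image pull failures. */
    static boolean isPullFailure(String waitingReason) {
        return "ImagePullBackOff".equals(waitingReason)
                || "ErrImagePull".equals(waitingReason);
    }

    /**
     * Records one observation of a container's waiting reason
     * (null when the container is not waiting). Returns true only when
     * the pod has been failing to pull for at least the grace period.
     */
    boolean shouldTerminate(String pod, String waitingReason, Instant now) {
        if (!isPullFailure(waitingReason)) {
            firstSeen.remove(pod); // the pull recovered, so forget the earlier failure
            return false;
        }
        Instant start = firstSeen.computeIfAbsent(pod, p -> now);
        return Duration.between(start, now).compareTo(grace) >= 0;
    }
}
```

With a grace period of, say, two minutes, a pull that fails once and then succeeds on containerd's retry would never trigger termination, while a genuinely wrong image tag would still fail the agent after two minutes.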