Type: Bug
Resolution: Unresolved
Priority: Major
Environment:
Jenkins 2.73.2
Kubernetes-plugin 1.1
We saw a regression after upgrading to v1.1 of the Kubernetes-plugin (with Jenkins 2.73.2): a crashing container (not the slave) in a pod would retrigger a restart of the pod every ten seconds (my guess is that this is related to activeDeadlineSeconds).
The job had to be stopped manually to keep this pod creation from spiralling out of control, and the crashed pods had to be deleted manually.
Relates to:
JENKINS-68409 kubernetes plugin can create a new pod every 10s when something wrong
Reopened
That's seriously your position on this bug!?!?!
Something as simple as a single commit from a developer in a feature branch, or an external system being unreachable, can take down Jenkins for an entire organization. This defeats all the benefits of having ephemeral, self-contained slaves to run jobs. What's more, there is no visible indication in the Jenkins UI of what is happening, so the developers who are impacted have no way to know what is going on or how to stop it.
Elaborate monitoring/alerting can help the ops team with reaction time, but doesn't prevent the issue from continuing.
If you don't want to make RestartPolicy and BackoffLimit configurable, then maybe addressing how the container cap works can at least limit the impact on the entire development team.
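To make the request concrete, here is a rough sketch of the kind of guard I have in mind. It is only an illustration in Python, not the plugin's code or API: `ProvisioningLimiter`, `simulate`, and all field names and default values are hypothetical. The idea is that repeated pod failures for the same job stop further provisioning after a small number of attempts, with the delay between attempts growing instead of staying fixed at ten seconds.

```python
from dataclasses import dataclass


@dataclass
class ProvisioningLimiter:
    """Hypothetical guard; not part of the Kubernetes plugin."""

    max_failures: int = 3        # assumed cap, similar in spirit to a backoff limit
    base_delay_s: float = 10.0   # the ten-second interval reported in this issue
    max_delay_s: float = 300.0   # assumed upper bound for the backoff delay
    failures: int = 0            # consecutive failed pods seen so far

    def record_failure(self) -> None:
        self.failures += 1

    def record_success(self) -> None:
        self.failures = 0

    def may_provision(self) -> bool:
        # Stop creating pods once the failure cap has been reached.
        return self.failures < self.max_failures

    def next_delay_s(self) -> float:
        # Exponential backoff: 10s, 20s, 40s, ... capped at max_delay_s.
        return min(self.base_delay_s * (2 ** self.failures), self.max_delay_s)


def simulate(limiter, pod_always_crashes=True, attempts=100):
    """Count how many pods get created and the delay before each one."""
    created = 0
    delays = []
    for _ in range(attempts):
        if not limiter.may_provision():
            break
        delays.append(limiter.next_delay_s())
        created += 1
        if pod_always_crashes:
            limiter.record_failure()
        else:
            limiter.record_success()
    return created, delays


if __name__ == "__main__":
    # With a pod that always crashes, only 3 pods are created instead of 100.
    print(simulate(ProvisioningLimiter()))  # (3, [10.0, 20.0, 40.0])
```

With something like this in place, a broken pod template would cost a handful of pods and then fail the build visibly, rather than creating a new pod every ten seconds until someone notices.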