Problem: When the Jenkins Kubernetes Cloud Plugin runs in a namespace with limited resources, jobs frequently attempt to create multiple pods. If pod creation fails because of resource quota limits, the plugin retries with a new pod name. Each failed attempt creates a new node directory (${JENKINS_HOME}/nodes/<pod_name>), and these directories are never cleaned up when the pod was not successfully created.
Impact:
- Thousands of stale node directories accumulate over time.
- Jenkins startup becomes extremely slow, or fails outright, because of the number of entries in the nodes directory.
- Manual cleanup becomes a recurring necessity to ensure Jenkins remains operational.
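The manual cleanup mentioned above can be sketched roughly as follows. This is only an illustrative workaround, not part of the plugin: the function name remove_stale_node_dirs and its parameters are hypothetical, and the caller is assumed to supply the set of node names that must be kept (running pods, plus any permanent agents that also live under ${JENKINS_HOME}/nodes). It defaults to a dry run so the list of candidates can be reviewed before anything is deleted.

```python
import shutil
from pathlib import Path


def remove_stale_node_dirs(nodes_dir, live_names, dry_run=True):
    """Remove sub-directories of nodes_dir whose name is not in live_names.

    nodes_dir  -- path to ${JENKINS_HOME}/nodes (assumption: passed in by the caller)
    live_names -- set of node names to keep (assumption: collected separately,
                  e.g. from the pods currently running plus permanent agents)
    dry_run    -- when True, nothing is deleted; candidates are only reported

    Returns the sorted list of directory names that were (or would be) removed.
    """
    removed = []
    for entry in sorted(Path(nodes_dir).iterdir()):
        # Skip stray files and every node that is still in use.
        if not entry.is_dir() or entry.name in live_names:
            continue
        if not dry_run:
            shutil.rmtree(entry)
        removed.append(entry.name)
    return removed
```

A call such as remove_stale_node_dirs("/var/jenkins_home/nodes", keep) first reports what would be removed; passing dry_run=False performs the deletion. The path here is only an example, and it is probably safest to run this while Jenkins is stopped.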
Expected Behavior: The plugin should automatically remove node directories for pods that were never successfully created.
A single job waiting for resources in such a namespace can generate up to 144 stale directories, none of which are deleted. Over time this adds up to thousands of directories.
Examples:
- Error logged when resources are insufficient: see out-of-resources.txt
- Error logged when Jenkins fails to start because of the number of files in ${JENKINS_HOME}/nodes: see failed-to-start.txt