Type: Bug
Resolution: Unresolved
Priority: Major
Labels: None
Environment: Jenkins 2.504.2 LTS, Kubernetes Plugin 4350.va_0283de0d6d6
Before restarting the Jenkins controller, I had some builds running.
I watched one specific build before and after the restart and noticed that it never came back: after the restart, another build that had been in the queue picked up the pod agent of the original build.
The original build was left stuck like this:
# WHEN BUILD STARTED
16:18:16 Still waiting to schedule task
16:18:16 Waiting for next available executor on ‘medium’
16:19:35 Agent cluster003-medium-f1jjw is provisioned from template cluster003-medium
16:19:35 Running on cluster003-medium-f1jjw in /home/jenkins/agent/workspace/mybuild

# AFTER RESTART
16:44:27 Resuming build at Fri May 30 16:44:27 EDT 2025 after Jenkins restart
16:44:28 Waiting for reconnection of cluster003-medium-f1jjw before proceeding with build
Then I inspected the pod logs and noticed that the agent had correctly reconnected to Jenkins after the restart:
cluster003-medium-f1jjw jnlp May 30, 2025 8:42:55 PM hudson.remoting.Launcher$CuiListener status
cluster003-medium-f1jjw jnlp INFO: https://myjenkins.com/jenkins/login is not ready: 503
cluster003-medium-f1jjw jnlp May 30, 2025 8:42:55 PM hudson.remoting.Launcher$CuiListener status
cluster003-medium-f1jjw jnlp INFO: Waiting 10 seconds before retry
cluster003-medium-f1jjw jnlp May 30, 2025 8:43:07 PM hudson.remoting.Launcher$CuiListener status
cluster003-medium-f1jjw jnlp INFO: WebSocket connection open
cluster003-medium-f1jjw jnlp May 30, 2025 8:43:07 PM hudson.remoting.Launcher$CuiListener status
cluster003-medium-f1jjw jnlp INFO: Connected
And the agent was indeed connected in the Jenkins computers list. However, to my surprise, another build had picked it up:
16:43:43 [Pipeline] Start of Pipeline
16:43:45 [Pipeline] node
16:43:45 Running on cluster003-medium-f1jjw in /home/jenkins/agent/workspace/anotherbuild
And that is why the original build was not able to recover after the restart.
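For anyone wanting to confirm the same situation, here is a small Script Console sketch I would use to see what occupies the agent's executors (the agent name is from my environment; adapt as needed):

import jenkins.model.Jenkins

// Look up the reconnected pod agent and print what its executors are running.
def computer = Jenkins.get().getComputer('cluster003-medium-f1jjw')
println "online=${computer?.isOnline()} acceptingTasks=${computer?.isAcceptingTasks()}"
computer?.executors?.each { e ->
    println "executor #${e.number}: busy=${e.busy} running=${e.currentExecutable}"
}

The executor's current executable should match the "Running on cluster003-medium-f1jjw" line in the other build's console output above.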
I checked my controller logs and was able to find these related entries:
2025-05-30 20:42:27.655+0000 [id=594] INFO o.c.j.p.k.pod.retention.Reaper#watchCloud: set up watcher on cluster003
2025-05-30 20:42:27.655+0000 [id=594] INFO o.c.j.p.k.KubernetesLauncher#launch: Agent has already been launched, activating: cluster003-medium-f1jjw
2025-05-30 20:42:27.656+0000 [id=586] INFO o.c.j.p.k.p.r.Reaper$CloudPodWatcher#stop: Stopping watch for kubernetes cloud cluster003
There is nothing besides that.
Also, I think it is worth mentioning my Kubernetes Cloud configuration:
jenkins:
  clouds:
    - kubernetes:
        name: cluster003
        credentialsId: cluster003-kubeconfig
        namespace: jenkins-agents
        containerCap: 175
        retentionTimeout: 15
        webSocket: true
        templates:
          - name: cluster003-medium
            id: 52ad9e1d-418e-4d57-b13a-075404c50164
            label: cluster003-medium medium
            showRawYaml: false
            workspaceVolume:
              genericEphemeralVolume:
                accessModes: ReadWriteOnce
                requestsSize: 48Gi
                storageClassName: local-path
            yaml: |
              apiVersion: v1
              kind: Pod
              spec:
                hostNetwork: false
                automountServiceAccountToken: false
                enableServiceLinks: false
                terminationGracePeriodSeconds: 30
                dnsPolicy: Default
                restartPolicy: Never
                containers:
                  - name: jnlp
                    image: ghcr.io/felipecrs/jenkins-agent-dind:2
                    imagePullPolicy: Always
                    resources:
                      limits:
                        cpu: "3.5"
                        memory: 14G
                        ephemeral-storage: 8Gi
                      requests:
                        cpu: "3.5"
                        memory: 14G
                        ephemeral-storage: 8Gi
                    securityContext:
                      privileged: true
                    workingDir: /home/jenkins/agent
                    terminationMessagePolicy: FallbackToLogsOnError
Also, for reference, my Jenkinsfiles look like this:
pipeline {
    agent {
        label 'medium'
    }
    // stages elided; the relevant part is the 'medium' label
}
I have many concurrent builds requesting the same label "medium".
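If it helps with reproducing the issue, a minimal sketch (the sleep duration and stage name are illustrative, not from my real Jenkinsfiles) would be to keep the agent busy across the restart while other builds queue for the same label:

pipeline {
    agent { label 'medium' }
    stages {
        stage('Long-running work') {
            steps {
                // Restart the controller while this step runs, with other
                // builds requesting the 'medium' label waiting in the queue.
                sh 'sleep 600'
            }
        }
    }
}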
This issue affects me badly; in fact, it leaves me unable to restart Jenkins safely for maintenance.
Any help would be deeply appreciated. Also, please let me know if there's anything else I can do to help get this fixed.