Details
- Type: Improvement
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Component/s: workflow-durable-task-step-plugin
- Sprint: Pipeline - July/August
Description
ExecutorPickle.rehydrate ought to be able to detect that it has been spinning in circles because the agent node it was supposed to run on is not in the Jenkins node list, and automatically abort, causing the build to fail with a comprehensible message rather than just hanging indefinitely. This is distinct from the agent being registered but offline, which is normal enough for a JNLP agent and the like; in such cases we just want to wait for the agent to come back online.
This would provide a better experience for the case of a build which was running on an EphemeralNode (such as from a Cloud without durable-task integration) when Jenkins was restarted. An agent using an inappropriate RetentionStrategy is trickier since it might still be defined after a restart, but will soon be terminated. Similarly, there may be cases where the agent is actually going to be redefined (with the same name) when it is attached after the restart—not sure about the Swarm plugin, but CloudBees DEV@cloud OPEs work this way. To prevent the build from being killed too aggressively, the cleanup should be delayed until some time has elapsed since rehydration began (or, ideally, since Jenkins completed initialization)—say, five minutes.
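The decision described above can be sketched in isolation. This is a minimal illustration rather than the plugin's code: the class `RehydrationWatchdog`, its `Decision` enum, and the `check` method are hypothetical names, the set of defined node names stands in for the Jenkins node list, and the reference time could equally be the start of rehydration or the completion of Jenkins initialization.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

/**
 * Hypothetical sketch of the proposed check; names are illustrative and this is
 * not the plugin's code. It decides whether a build that is waiting to resume on
 * an agent should keep waiting or be aborted.
 */
class RehydrationWatchdog {

    enum Decision { KEEP_WAITING, ABORT }

    private final String agentName;
    private final Instant waitingSince;  // rehydration start, or Jenkins init completion
    private final Duration gracePeriod;  // e.g. five minutes, as suggested in the issue

    RehydrationWatchdog(String agentName, Instant waitingSince, Duration gracePeriod) {
        this.agentName = agentName;
        this.waitingSince = waitingSince;
        this.gracePeriod = gracePeriod;
    }

    /**
     * @param definedNodeNames names of all nodes currently defined, online or offline
     * @param now              current time, passed in so the check is easy to test
     */
    Decision check(Set<String> definedNodeNames, Instant now) {
        if (definedNodeNames.contains(agentName)) {
            // Registered (possibly offline): wait for the agent to come back.
            return Decision.KEEP_WAITING;
        }
        Duration waited = Duration.between(waitingSince, now);
        if (waited.compareTo(gracePeriod) < 0) {
            // Missing, but it may still be redefined shortly after the restart.
            return Decision.KEEP_WAITING;
        }
        // Missing and the grace period has elapsed: fail the build with a clear message.
        return Decision.ABORT;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2017-07-01T12:00:00Z");
        RehydrationWatchdog watchdog =
                new RehydrationWatchdog("ephemeral-1", start, Duration.ofMinutes(5));

        // Agent still defined, long after the restart.
        System.out.println(watchdog.check(Set.of("ephemeral-1"), start.plusSeconds(3600)));
        // Agent missing, one minute after the restart.
        System.out.println(watchdog.check(Set.of(), start.plusSeconds(60)));
        // Agent missing, five minutes after the restart.
        System.out.println(watchdog.check(Set.of(), start.plusSeconds(300)));
    }
}
```

Passing the clock value in as a parameter keeps the check deterministic. Note that an agent which is still defined never triggers an abort in this sketch, so the inappropriate RetentionStrategy case is only caught once the agent has actually been removed.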
Issue Links
- depends on
  - JENKINS-26130 Print progress of pending pickles (Resolved)
- is duplicated by
  - JENKINS-45917 [Jenkins v2.63] Build queue deadlocks (Closed)
- is related to
  - JENKINS-45917 [Jenkins v2.63] Build queue deadlocks (Closed)
  - JENKINS-41569 Pipeline hangs waiting for resume on an agent which never was (Closed)
- relates to
  - JENKINS-41569 Pipeline hangs waiting for resume on an agent which never was (Closed)
  - JENKINS-49707 Auto retry for elastic agents after channel closure (Open)
  - JENKINS-43607 Jenkins pipeline not aborted when the machine running docker container goes offline (Resolved)
  - JENKINS-33761 Ability to disable Pipeline durability and "resume" build. (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Epic Link | JENKINS-35399 [ 171192 ] |
Link | This issue depends on |
Workflow | JNJira [ 172654 ] | JNJira + In-Review [ 184708 ] |
Component/s | pipeline-general [ 21692 ] |
Component/s | workflow-plugin [ 18820 ] |
Component/s | workflow-durable-task-step-plugin [ 21715 ] | |
Component/s | pipeline [ 21692 ] |
Link | This issue relates to |
Link | This issue is related to |
Link | This issue relates to |
Link | This issue relates to |
Labels | robustness | cloudbees-internal-pipeline robustness |
Priority | Major [ 3 ] | Critical [ 2 ] |
Sprint | Pipeline - July/August [ 371 ] |
Assignee | Jesse Glick [ jglick ] | Sam Van Oort [ svanoort ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue is related to |
Status | In Progress [ 3 ] | In Review [ 10005 ] |
Resolution | Fixed [ 1 ] | |
Status | In Review [ 10005 ] | Closed [ 6 ] |
Link | This issue is duplicated by |
Remote Link | This issue links to "CloudBees Internal CD-179 (Web Link)" [ 18944 ] |
Remote Link | This issue links to "CloudBees Internal CLTS-2226 (Web Link)" [ 18978 ] |
Link | This issue relates to JENKINS-49707 [ JENKINS-49707 ] |
Remote Link | This issue links to "workflow-durable-task-step #47 (Web Link)" [ 22735 ] |
Remote Link | This issue links to "workflow-durable-task-step #48 (Web Link)" [ 22736 ] |
Originally suggested in JENKINS-26130, but I felt it was better to split this out. JENKINS-26130 does at least provide a much more comprehensible diagnosis for the problem.