Type: Bug
Resolution: Fixed
Priority: Major
None
Jenkins 2.46.1, latest versions of the Pipeline plugins (pipeline-build-step 2.5, pipeline-rest-api 2.6, pipeline-stage-step 2.2, etc.)
durable-task 1.18
During a recent Jenkins plugin upgrade and master restart, Jenkins appears to have failed to resume at least two Pipeline jobs. Each pipeline was in the middle of an sh() step when the master was restarted. Both jobs show output similar to the following in the console:
Resuming build at Thu Apr 13 15:01:50 EDT 2017 after Jenkins restart
Waiting to resume part of <job name...>: ???
Ready to run at Thu Apr 13 15:01:51 EDT 2017
However, this text has now been displayed for several minutes with no indication of what the job is waiting for. The pipeline is still occupying the same executor it was running on before the restart, but when we log into the server there is no durable task or process left for the script that the sh() step was running. The script's own logging shows that the command finished successfully, so we don't understand how Jenkins lost track of it. According to that logging, the command finished at around the same time the master was restarting (it is difficult to pinpoint exactly).
is related to:
- JENKINS-46961 Pipelines interrupted while starting incorrectly resume after Jenkins restarts and cannot be killed (Reopened)
- JENKINS-62248 Pipeline fails to resume after master restart (Fixed but Unreleased)
relates to:
- JENKINS-39552 After restart, interrupted pipeline deadlocks waiting for executor (Closed)
- JENKINS-67164 Pipelines missing from FlowExecutionList hang forever after resuming (Resolved)
piratejohnny
For that issue, the pipeline cannot resume on a different agent with the same label, because it needs access to the same workspace it was building in before the restart in order to resume properly. Without that workspace, it would try to reconnect to the original agent (which has been destroyed) and would eventually time out when it cannot find the workspace. If you want it to resume on an agent with the same label, you would need to persist that workspace somehow.
Either way, this is not related to this ticket in particular (also, I assume you meant durable-task 1.17 rather than 1.7).