Details
- Type: New Feature
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Component/s: workflow-durable-task-step-plugin
- Labels: None
Description
While my pipeline was running, the node that was executing logic terminated. I see this at the bottom of my console output:
Cannot contact ip-172-31-242-8.us-west-2.compute.internal: java.io.IOException: remote file operation failed: /ebs/jenkins/workspace/common-pipelines-nodeploy at hudson.remoting.Channel@48503f20:ip-172-31-242-8.us-west-2.compute.internal: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on ip-172-31-242-8.us-west-2.compute.internal failed. The channel is closing down or has closed down
There's a spinning arrow below it.
I have a cron script that uses the Jenkins master CLI to remove nodes that have stopped responding. When I examine this node's page in the Jenkins web UI, it looks like the node is still running that job, and I see an orange label that says "Feb 22, 2018 5:16:02 PM Node is being removed".
I'm wondering what would be a better way to say "If the channel closes down, retry the work on another node with the same label"?
Things seem stuck. Please advise.
Issue Links
- is duplicated by
  - JENKINS-49241 pipeline hangs if slave node momentarily disconnects (Open)
  - JENKINS-47868 Pipeline durability hang when slave node disconnected (Resolved)
  - JENKINS-57675 Pipeline steps running forever when executor fails (Resolved)
  - JENKINS-47561 Pipelines wait indefinitely for kubernetes slaves to come back online (Closed)
  - JENKINS-43607 Jenkins pipeline not aborted when the machine running docker container goes offline (Resolved)
  - JENKINS-56673 Better handling of ChannelClosedException in Declarative pipeline (Resolved)
- is related to
  - JENKINS-41854 Contextualize a fresh FilePath after an agent reconnection (Resolved)
- relates to
  - JENKINS-36013 Automatically abort ExecutorPickle rehydration from an ephemeral node (Closed)
  - JENKINS-61387 SlaveComputer not cleaned up after the channel is closed (Open)
  - JENKINS-59340 Pipeline hangs when Agent pod is Terminated (Resolved)
  - INFRA-2140 All Windows agents are offline (Closed)
Olivier Boudet: subcase #3 as above should be addressed in recent releases; if an agent pod is deleted, the corresponding build should abort within a few minutes. There is currently no logic that would do the same after a PodPhase: Failed. That would be a new RFE.