[JENKINS-43607] Jenkins pipeline not aborted when the machine running docker container goes offline

Type: New Feature
Resolution: Duplicate
Priority: Major
Component/s: workflow-durable-task-step-plugin
Labels: None
Environment:
Jenkins ver. 2.53
Pipeline job
Pipeline: Nodes and Processes plugin, ver. 2.10

Preconditions

A Jenkins Pipeline job is configured to run parallel actions on different docker swarm nodes.

Procedure

1. Run the job.
2. Force the disconnection of a node that is running part of the job.

Actual outcome

The job never terminates. The affected part of the pipeline remains stuck at:

Cannot contact swarm-xxxxxxxx: hudson.remoting.RequestAbortedException: java.nio.channels.ClosedChannelException

The exception is caught by workflow-durable-task-step-plugin and used to display the log message above.

Expected outcome

The affected part of the pipeline should throw an exception that can be caught.

This would allow a retry strategy to be implemented in the Pipeline job.
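The retry strategy requested here can be illustrated independently of Jenkins. The following is a minimal Python sketch, not Pipeline code, and it uses no Jenkins API. The names `AgentLostError`, `run_with_retry`, and `make_flaky_step` are hypothetical and exist only for this illustration. It assumes the step raises an exception when its agent is lost, which is exactly what this issue reports does not happen today.

```python
class AgentLostError(Exception):
    """Hypothetical error raised when the agent running a step goes away."""


def run_with_retry(step, max_attempts=3):
    """Run step(); on AgentLostError, try again up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except AgentLostError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure


def make_flaky_step(failures):
    """Build a step that loses its agent `failures` times, then succeeds."""
    state = {"calls": 0}

    def step():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise AgentLostError("Cannot contact agent")
        return "done after %d call(s)" % state["calls"]

    return step
```

The point of the sketch is that a retry wrapper only works if the failing step surfaces an error. A step that blocks forever, as described under "Actual outcome", never gives the wrapper a chance to run again.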

duplicates

JENKINS-49707 Auto retry for elastic agents after channel closure

Resolved

relates to

JENKINS-36013 Automatically abort ExecutorPickle rehydration from an ephemeral node

Closed

Aymen Bouaziz created issue - 2017-04-14 20:14

Aymen Bouaziz made changes - 2017-04-14 20:15

Description


Jesse Glick made changes - 2017-04-19 15:31

Issue Type

Original: Bug [ 1 ]

New: New Feature [ 2 ]

Jesse Glick made changes - 2017-04-19 15:33

Link

New: This issue relates to JENKINS-36013

Jesse Glick added a comment - 2017-04-19 15:33

As with JENKINS-36013, currently the model is that a node may go offline and later be reconnected, in which case the step will quietly resume printing output and exit normally. For Swarm or other cloud-like node schemes, a disconnection may be followed by an actual permanent removal of the node definition, in which case it would be desirable for the step to abort.
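The distinction drawn in this comment can be written down as a toy decision rule. This Python sketch is an illustration only and does not reflect how the plugin is implemented. The function `on_disconnect` and its `defined_nodes` argument are hypothetical. It assumes the caller can tell whether a node definition still exists.

```python
def on_disconnect(node_name, defined_nodes):
    """Decide what a step should do after losing contact with its node.

    Assumption: defined_nodes is the set of node names that still have
    a definition at the time of the check.
    """
    if node_name in defined_nodes:
        # The node may reconnect later, so the step keeps waiting.
        return "wait"
    # The definition was removed permanently, so waiting cannot succeed.
    return "abort"
```

For example, `on_disconnect("swarm-1", {"swarm-1"})` yields `"wait"`, while `on_disconnect("swarm-1", set())` yields `"abort"`.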

Michael McCallum added a comment - 2018-09-24 00:25

jglick, should this get more attention? A number of tickets and questions are turning up online as ephemeral nodes become far more common. GKE in particular makes them very cheap and easy.

Jesse Glick made changes - 2018-09-25 00:42

Link

New: This issue duplicates JENKINS-49707

Jesse Glick made changes - 2018-09-25 00:42

Resolution

New: Duplicate [ 3 ]

Status

Original: Open [ 1 ]

New: Resolved [ 5 ]

Assignee: Unassigned

Reporter: Aymen Bouaziz

Votes: 1

Watchers: 5

Created: 2017-04-14 20:14

Updated: 2018-09-25 00:42

Resolved: 2018-09-25 00:42
