Type: New Feature
Resolution: Unresolved
Priority: Minor
Labels: None
I'm not sure whether this is ECS-specific or a more general JNLP issue.
One of the main reasons to use Docker for builds is to isolate the VM/hardware performing the build from the build system. However, if an ECS host is removed from the cluster (scaling in, a spot instance being outbid, hosts being replaced or upgraded), any slave jobs running on that host are never restarted on a different host.
This manifests as a variety of errors, mostly related to being unable to reconnect to the specific ECS slave that was running the build:
Example 1:
Cannot contact ecs-mycluster-3ec9178c860: java.io.IOException: remote file operation failed: /home/jenkins/workspace/nd_my-build-3FBTDD6B4SFOB7TWNY6VBPSYYH6DRQTTK6UHDJWRBBMUEXUBV2BQ at hudson.remoting.Channel@66dfe171:Channel to /172.31.4.13: hudson.remoting.ChannelClosedException: channel is already closed
Example 2:
Cannot contact ecs-majstg-3dc6b7ce91d: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected EOF while receiving the data from the channel. FIFO buffer has been already closed
Is this an ECS plugin issue, or a more general issue/limitation of Jenkins slaves? How are people using this plugin with spot instances, or more generally coordinating ECS hosts leaving and entering the cluster?
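For what it's worth, one workaround I've seen for the spot-instance case (a sketch, not a fix for the plugin itself) is a small daemon on each container instance that polls the EC2 spot interruption notice and, when one appears, sets the instance to DRAINING via the real ECS API `update_container_instances_state`, so ECS stops placing new Jenkins slave tasks there. Note this does not re-queue a build already running on the host; it only limits the blast radius. The cluster name and container instance ARN here are caller-supplied placeholders:

```python
import json
import urllib.request

# AWS posts a spot interruption notice here roughly two minutes before
# reclaiming the instance; this is the standard EC2 instance metadata path.
METADATA_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_pending(url: str = METADATA_URL) -> bool:
    """Return True if EC2 has posted a spot interruption notice."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            json.load(resp)  # notice body, e.g. {"action": "terminate", ...}
            return True
    except Exception:  # 404 means no notice yet; also covers metadata being unreachable
        return False

def drain_instance(cluster: str, container_instance_arn: str) -> None:
    """Mark this container instance DRAINING so ECS stops placing new tasks
    on it while running ones wind down."""
    import boto3  # assumed to be installed on the host
    ecs = boto3.client("ecs")
    ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
```

A daemon would poll `interruption_pending()` every few seconds and call `drain_instance()` once it returns True. That still leaves the core question above open: whether the plugin (or Jenkins core) could detect the dead channel and reschedule the build on another host.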