Jenkins / JENKINS-42768

Builds hang if an ECS host leaves the cluster


Details

    Description

      Not sure if this is ECS-specific or a more general JNLP issue.

      One of the main use cases for running builds in Docker is to isolate the VM/hardware performing the builds from the build system. Hosts are therefore expected to come and go, but if an ECS host is removed from the cluster (scaling in, a spot instance getting outbid, hosts being replaced or upgraded), any slave jobs running on that host are never restarted on a different host.

      This manifests as several different errors, mostly related to not being able to contact the specific ECS slave that was running the build:

      Example 1:
      Cannot contact ecs-mycluster-3ec9178c860: java.io.IOException: remote file operation failed: /home/jenkins/workspace/nd_my-build-3FBTDD6B4SFOB7TWNY6VBPSYYH6DRQTTK6UHDJWRBBMUEXUBV2BQ at hudson.remoting.Channel@66dfe171:Channel to /172.31.4.13: hudson.remoting.ChannelClosedException: channel is already closed

      Example 2:
      Cannot contact ecs-majstg-3dc6b7ce91d: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected EOF while receiving the data from the channel. FIFO buffer has been already closed

      Is this an ECS plugin issue or a more general issue/limitation with Jenkins slaves? How are people using this plugin with spot instances, or, more generally, coordinating ECS hosts leaving and entering the cluster?
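      Both example errors point at the same underlying condition: the remoting channel between Jenkins and the ECS slave was closed when the host went away. As a minimal sketch, assuming build logs are available as plain text, the hypothetical helpers below flag log lines that mention either of the two exception types quoted above. The function names and the idea of using them to decide whether a build is worth resubmitting are illustrative assumptions, not part of Jenkins or the ECS plugin.

```python
import re
from typing import List

# Exception class names taken from the two example errors in this report.
# Treating them as "slave lost" signals is an assumption for illustration.
AGENT_LOST_PATTERNS = (
    r"hudson\.remoting\.ChannelClosedException",
    r"hudson\.remoting\.RequestAbortedException",
)
_AGENT_LOST_RE = re.compile("|".join(AGENT_LOST_PATTERNS))


def looks_like_lost_agent(log_line: str) -> bool:
    """Hypothetical helper: True if the line suggests the build's slave
    disappeared (e.g. its ECS host left the cluster)."""
    return _AGENT_LOST_RE.search(log_line) is not None


def lost_agent_lines(log_text: str) -> List[str]:
    """Return the lines of a build log that match a slave-loss pattern."""
    return [line for line in log_text.splitlines() if looks_like_lost_agent(line)]


if __name__ == "__main__":
    sample = (
        "Cannot contact ecs-mycluster-3ec9178c860: java.io.IOException: "
        "hudson.remoting.ChannelClosedException: channel is already closed\n"
        "[Pipeline] echo\n"
        "Cannot contact ecs-majstg-3dc6b7ce91d: "
        "hudson.remoting.RequestAbortedException: java.io.IOException"
    )
    print(len(lost_agent_lines(sample)))  # prints 2
```

      Matching on exception class names rather than full messages keeps the check independent of the slave name and workspace path, which differ on every build.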


          People

            roehrijn2 Jan Roehrich
            jhovell John Hovell
            Votes: 1
            Watchers: 3
