  1. Jenkins
  2. JENKINS-38115

spot instance slaves are not removed, blocks instance termination



    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: ec2-fleet-plugin
    • Environment: Jenkins ver. 2.7.1
      ec2-fleet-plugin ver. 1.0
      Amazon Linux AMI 2015.03
      java version "1.7.0_85"
      OpenJDK Runtime Environment (amzn- u85-b01)
      OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)


      I've created a fleet under Clouds in the Jenkins configuration and selected my spot fleet request. The test returns the error below, which may or may not be related. The IDs I've redacted all match the IDs I see in the EC2 management console.

      The SpotFleetRequestId(s) sfr-<stuff> do not exist. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidSpotFleetRequestId.NotFound; Request ID: <other stuff> )

      To test the plugin, I set it up with a max cluster size of 3 and a scale-down timeout of 15 minutes. The spot request capacity defaults to 1.

      When the test job runs, two additional slaves are created, backed by new spot instances. Everything runs fine until the job is done and the scale-down timeout hits, at which point the slaves are not removed. One or more may show as disconnected, and the log begins to show errors:

      SEVERE: I/O error in channel i-cdcef210
      java.io.IOException: Unexpected termination of the channel
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
      Caused by: java.io.EOFException
      at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
      at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
      at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
      at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
      at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
      at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
      at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

      SEVERE: Timer task hudson.slaves.ComputerRetentionWork@5654d212 failed
      java.lang.IllegalStateException: Unknown instance terminated: i-cdcef210
      at com.amazon.jenkins.ec2fleet.EC2FleetCloud.terminateInstance(EC2FleetCloud.java:266)
      at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:28)
      at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:12)
      at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
      at hudson.model.Queue._withLock(Queue.java:1315)
      at hudson.model.Queue.withLock(Queue.java:1192)
      at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
      at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:745)

      The second of those errors repeats until the slave is manually removed (which sometimes results in a Jenkins stack trace in the UI with the same error), at which point the rest of the process is unblocked: the spot instances are terminated and the slaves are removed.

      The plugin is essentially nonfunctional in this state.
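
      One plausible reading of the second trace is that the retention check throws as soon as it meets an instance the cloud no longer tracks, so the rest of the pass never runs. The sketch below only illustrates that failure mode and a tolerant alternative. It is not the plugin's code: FleetState, strictPass, tolerantPass and the instance IDs are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FleetTerminationSketch {

    /** Hypothetical stand-in for the cloud's bookkeeping; not the plugin's real class. */
    static class FleetState {
        private final Set<String> known = new HashSet<>();

        void register(String instanceId) {
            known.add(instanceId);
        }

        /** Strict: throws for an untracked instance, like the message in the reported trace. */
        void terminateStrict(String instanceId) {
            if (!known.remove(instanceId)) {
                throw new IllegalStateException("Unknown instance terminated: " + instanceId);
            }
        }

        /** Tolerant: reports whether the instance was tracked instead of throwing. */
        boolean terminateTolerant(String instanceId) {
            return known.remove(instanceId);
        }
    }

    /** One retention pass that stops at the first failure; returns the nodes removed before it. */
    static List<String> strictPass(FleetState state, List<String> idleNodes) {
        List<String> removed = new ArrayList<>();
        try {
            for (String id : idleNodes) {
                state.terminateStrict(id);
                removed.add(id);
            }
        } catch (IllegalStateException e) {
            // The pass ends here, so nodes later in the list are never reached.
        }
        return removed;
    }

    /** One retention pass that cleans up untracked nodes and keeps going. */
    static List<String> tolerantPass(FleetState state, List<String> idleNodes) {
        List<String> removed = new ArrayList<>();
        for (String id : idleNodes) {
            state.terminateTolerant(id); // an untracked node is simply dropped
            removed.add(id);
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> idle = Arrays.asList("i-unknown", "i-aaa", "i-bbb");

        FleetState strict = new FleetState();
        strict.register("i-aaa");
        strict.register("i-bbb");
        System.out.println(strictPass(strict, idle));     // prints []

        FleetState tolerant = new FleetState();
        tolerant.register("i-aaa");
        tolerant.register("i-bbb");
        System.out.println(tolerantPass(tolerant, idle)); // prints [i-unknown, i-aaa, i-bbb]
    }
}
```

      In the strict pass a single untracked node keeps every other idle node alive, which matches the observation that removing the offending slave by hand unblocks the rest. Whether the actual fix took this shape is not stated in this issue.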



          jude vais (jvais_soasta) created issue
          Mark Lagendijk (markl_lagendijk) made changes:
          Field        Original Value    New Value
          Resolution                     Fixed [ 1 ]
          Status       Open [ 1 ]        Resolved [ 5 ]
          Mark Lagendijk (markl_lagendijk) made changes:
          Field        Original Value    New Value
          Status       Resolved [ 5 ]    Closed [ 6 ]


            Assignee: Aleksei Besogonov (cyberax)
            Reporter: jude vais (jvais_soasta)
            Votes: 2
            Watchers: 6