Type: Bug
Resolution: Fixed
Priority: Blocker
Jenkins ver. 2.7.1
ec2-fleet-plugin ver. 1.0
Amazon Linux AMI 2015.03
java version "1.7.0_85"
OpenJDK Runtime Environment (amzn-2.6.1.3.61.amzn1-x86_64 u85-b01)
OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)
Scenario:
I've created a fleet under Clouds in the Jenkins config and selected my spot request. The connection test returns an error, which may or may not be related. The IDs I've redacted all match the IDs I see in the EC2 management console.
The SpotFleetRequestId(s) sfr-<stuff> do not exist. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidSpotFleetRequestId.NotFound; Request ID: <other stuff> )
To test the plugin, I set it up with a max cluster size of 3 and a scaledown timeout of 15 minutes. The spot request capacity defaults to 1.
Expected:
When the test job runs, two additional slaves are created, backed by new spot instances, and they are removed once the job is done and the scaledown timeout hits.
Actual:
The slaves are created and everything runs fine until the job is done and the scaledown timeout hits, at which point the slaves are not removed. One or more may show as disconnected, and the log begins to show errors:
SEVERE: I/O error in channel i-cdcef210
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
SEVERE: Timer task hudson.slaves.ComputerRetentionWork@5654d212 failed
java.lang.IllegalStateException: Unknown instance terminated: i-cdcef210
at com.amazon.jenkins.ec2fleet.EC2FleetCloud.terminateInstance(EC2FleetCloud.java:266)
at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:28)
at com.amazon.jenkins.ec2fleet.IdleRetentionStrategy.check(IdleRetentionStrategy.java:12)
at hudson.slaves.ComputerRetentionWork$1.run(ComputerRetentionWork.java:72)
at hudson.model.Queue._withLock(Queue.java:1315)
at hudson.model.Queue.withLock(Queue.java:1192)
at hudson.slaves.ComputerRetentionWork.doRun(ComputerRetentionWork.java:63)
at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:50)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The second of those errors repeats until the slave is manually removed (which sometimes produces a Jenkins stack trace in the UI with the same error). Once the slave is removed, the rest of the process is unblocked: the spot instances are terminated and the slaves are removed.
The plugin is essentially nonfunctional in this state.
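To illustrate the failure pattern in the second stack trace, here is a minimal, self-contained sketch. It is not the plugin's actual code: the class `FleetTerminationSketch` and the methods `track`, `terminateStrict`, and `terminateTolerant` are hypothetical names invented for this example. It assumes the cloud keeps a set of instance IDs it knows about, and contrasts a terminate call that throws on an untracked ID (which would make a periodic retention check fail on every run) with one that treats the ID as already gone.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a fleet cloud's instance bookkeeping; NOT the plugin's real code.
class FleetTerminationSketch {
    // Instance IDs the cloud currently believes it owns (an assumption of this sketch).
    private final Set<String> knownInstances = new HashSet<>();

    void track(String instanceId) {
        knownInstances.add(instanceId);
    }

    // Strict variant: throws on an untracked ID, mirroring the reported symptom.
    // A timer task calling this for a stale node would fail again on every run.
    void terminateStrict(String instanceId) {
        if (!knownInstances.remove(instanceId)) {
            throw new IllegalStateException("Unknown instance terminated: " + instanceId);
        }
    }

    // Tolerant variant: returns false for an untracked ID instead of throwing,
    // so the caller can still go on to clean up the stale node.
    boolean terminateTolerant(String instanceId) {
        return knownInstances.remove(instanceId);
    }

    public static void main(String[] args) {
        FleetTerminationSketch cloud = new FleetTerminationSketch();
        cloud.track("i-aaaa");
        try {
            cloud.terminateStrict("i-bbbb"); // ID was never tracked
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
        System.out.println("tolerant: " + cloud.terminateTolerant("i-bbbb"));
    }
}
```

Whether a tolerant check like this matches how the issue was actually resolved is not stated in this report; the sketch only shows why a throwing check inside a recurring task keeps blocking cleanup until the node is removed by hand.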