- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Critical
- Component/s: ec2-fleet-plugin, ssh-slaves-plugin
SSHLauncher{host='10.50.10.252', port=22, credentialsId='aaf2ee5e-32bd-4675-9793-0570922f9c66', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=5, maxNumRetries=120, retryWaitTime=2, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.ManuallyTrustedKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[11/16/18 20:19:40] [SSH] Opening SSH connection to 10.50.10.252:22.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 120 more retries left.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 119 more retries left.
Connection refused (Connection refused)
SSH Connection failed with IOException: "Connection refused (Connection refused)", retrying in 2 seconds. There are 118 more retries left.
ERROR: null
java.util.concurrent.CancellationException
    at java.util.concurrent.FutureTask.report(FutureTask.java:121)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:904)
    at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[11/16/18 20:19:45] Launch failed - cleaning up connection
[11/16/18 20:19:45] [SSH] Connection closed.

This happens whenever a new EC2 fleet instance is brought online. During this time cloud-init is still working its magic to install Docker and OpenJDK and to add the new Jenkins user (and its key). However, after the "Launch failed" error message there are no more retries and that slave is never contacted again, even though the slave comes online without issues if we manually press the button to reconnect.

Clearly there are more retries left, yet it is completely dead in the water.
This used to work without issues on older versions of Jenkins; the problem started only recently.

We are running Jenkins ver. 2.138.3 from the jenkinsci/blueocean Docker image.
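
One possible reading of the log, not confirmed anywhere in this report: the launch is cancelled about 5 seconds after it starts (20:19:40 to 20:19:45), which matches `launchTimeoutSeconds=5`. If that setting bounds the whole launch rather than a single connection attempt, only a few of the 120 retries could ever run. The sketch below is just that arithmetic, using the values from the `SSHLauncher{...}` line; the function name and the assumption that a refused connection fails instantly are illustrative, not taken from the plugin.

```python
def attempts_before_timeout(launch_timeout_s, retry_wait_s, max_retries):
    """Upper bound on connection attempts that fit inside the launch window.

    ASSUMPTION (not from the report): launch_timeout_s limits the whole
    launch, and each refused attempt fails immediately before the launcher
    sleeps for retry_wait_s seconds.
    """
    if retry_wait_s <= 0:
        return max_retries + 1
    # Attempts start at t = 0, retry_wait_s, 2 * retry_wait_s, ...
    fit = launch_timeout_s // retry_wait_s + 1
    return min(fit, max_retries + 1)


# Values copied from the SSHLauncher line in the log.
launch_timeout_s = 5
retry_wait_s = 2
max_retries = 120

attempts = attempts_before_timeout(launch_timeout_s, retry_wait_s, max_retries)
full_retry_budget_s = max_retries * retry_wait_s

print("attempts that fit in the launch window:", attempts)
print("seconds needed to use every retry:", full_retry_budget_s)
```

Under these assumptions the window fits three attempts, which is the number of "Connection refused" retries shown in the log, while using all 120 retries would take 240 seconds. If this reading is right, raising the launch timeout to cover the cloud-init run would be the thing to try; that is a guess to verify, not a confirmed fix.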