Bug
Resolution: Unresolved
Major
None
Jenkins ver. 2.204.2
ec2 plugin 1.49.1
When trying to provision a new agent, the plugin would start and then terminate an EC2 instance several times before finally succeeding.
I have no explanation for this behavior. It might be related to JENKINS-61343.
It might also be because the plugin somehow allocates "2 computer(s)" even though the instance cap is 1.
{code}
Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log bootstrap()
Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log Getting keypair...
Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log Using private key j4a-ec2-ssh-key (SHA-1 fingerprint a7:b4:70:08:35:11:e3:cf:4b:f5:92:57:b8:02:7f:c6:8e:54:52:02)
Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log Authenticating as admin
Mar 06, 2020 3:05:22 PM INFO hudson.slaves.NodeProvisioner lambda$update$6 EC2 (ec2) - ec2 (ami-028d96c69234f9d1a) provisioning successfully completed. We have now 2 computer(s)
Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log Connecting to 10.20.4.41 on port 22, with timeout 10000.
Mar 06, 2020 3:05:29 PM INFO hudson.plugins.ec2.EC2Cloud log Connected via SSH.
Mar 06, 2020 3:05:29 PM INFO hudson.plugins.ec2.EC2Cloud log connect fresh as root
Mar 06, 2020 3:05:29 PM INFO hudson.plugins.ec2.EC2Cloud log Connecting to 10.20.4.41 on port 22, with timeout 10000.
Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log Connected via SSH.
Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log Creating tmp directory (/tmp) if it does not exist
Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log Verifying: java -fullversion
Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log Verifying: which scp
Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log Copying remoting.jar to: /tmp
Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log Launching remoting agent (via Trilead SSH2 Connection): java -jar /tmp/remoting.jar -workDir /opt/jenkins
Mar 06, 2020 3:05:31 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate Terminated EC2 instance (terminated): i-021d76d0ffff3375f
Mar 06, 2020 3:05:31 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate Removed EC2 instance from jenkins master: i-021d76d0ffff3375f
Mar 06, 2020 3:05:32 PM INFO hudson.plugins.ec2.EC2Cloud provision SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units
Mar 06, 2020 3:05:32 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Considering launching
{code}
[JENKINS-61370] EC2 instances are terminated during launch
Description
Original:
I'm seeing this in the logs every 10 minutes. The monitor thread starts and then dies. 10 minutes later it repeats.
{code}
Mar 05, 2020 10:54:35 AM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0 Started EC2 alive slaves monitor
Mar 05, 2020 10:54:35 AM SEVERE hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException A thread (EC2 alive slaves monitor thread/10354) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.NullPointerException
	at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$countQueueItemsForAgentTemplate$8(MinimumInstanceChecker.java:67)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.LongPipeline.reduce(LongPipeline.java:461)
	at java.util.stream.LongPipeline.sum(LongPipeline.java:419)
	at java.util.stream.ReferencePipeline.count(ReferencePipeline.java:593)
	at hudson.plugins.ec2.util.MinimumInstanceChecker.countQueueItemsForAgentTemplate(MinimumInstanceChecker.java:68)
	at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$null$11(MinimumInstanceChecker.java:87)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1082)
	at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$12(MinimumInstanceChecker.java:76)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
	at hudson.plugins.ec2.util.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:75)
	at hudson.plugins.ec2.EC2SlaveMonitor.execute(EC2SlaveMonitor.java:41)
	at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:100)
	at java.lang.Thread.run(Thread.java:748)
{code}
After switching to the "SSH process" connection method, I was able to see additional errors, which the Trilead Java connection had been swallowing:
{code}
Exception in thread "main" java.nio.file.AccessDeniedException: /opt/jenkins
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
	at java.nio.file.Files.createDirectory(Files.java:674)
	at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
	at java.nio.file.Files.createDirectories(Files.java:767)
	at org.jenkinsci.remoting.engine.WorkDirManager.initializeWorkDir(WorkDirManager.java:211
{code}
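It turns out that if you mount a block device in the user-data script, it is sometimes not yet available when the master's SSH connection comes in. In this case the remoting work directory /opt/jenkins did not exist yet when the agent started, and the agent user was not allowed to create it, hence the AccessDeniedException.
The solution seems to be to do the mount in the init script instead.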
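A minimal sketch of what moving the mount into the template's init script could look like. It assumes the volume already carries a filesystem, that the login user (`admin` in the log above) has passwordless `sudo`, and that the device shows up as `/dev/xvdf`; the device name, mount point default, and the helper names `wait_for_path` and `mount_data_volume` are hypothetical. The script waits for the device node, mounts it on the remoting work directory unless something is already mounted there, and hands ownership to the agent user, so that `-workDir /opt/jenkins` exists and is writable before remoting starts.

```shell
#!/bin/sh
# Hypothetical init-script sketch: mount the data volume here instead of in
# user-data, so it is ready before the remoting agent is launched.
# DEVICE, MOUNT_POINT and AGENT_USER are assumed defaults; adjust per AMI.
DEVICE="${DEVICE:-/dev/xvdf}"
MOUNT_POINT="${MOUNT_POINT:-/opt/jenkins}"
AGENT_USER="${AGENT_USER:-admin}"

# Poll once per second until the given path exists.
# Returns 0 as soon as it exists, 1 after MAX_WAIT seconds without it.
wait_for_path() {
  target="$1"
  max_wait="$2"
  waited=0
  while [ ! -e "$target" ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Wait for the block device, mount it unless the mount point is already a
# mount, and make it writable for the agent user (the remoting -workDir).
# Assumes the device already has a filesystem and sudo needs no password.
mount_data_volume() {
  if ! wait_for_path "$DEVICE" 120; then
    echo "Block device $DEVICE did not appear" >&2
    return 1
  fi
  sudo mkdir -p "$MOUNT_POINT" || return 1
  # mountpoint(1) is part of util-linux
  if ! mountpoint -q "$MOUNT_POINT"; then
    sudo mount "$DEVICE" "$MOUNT_POINT" || return 1
  fi
  sudo chown "$AGENT_USER" "$MOUNT_POINT"
}
```

The init script would then end with `mount_data_volume || exit 1`, so a missing volume fails during initialization rather than later inside remoting. On Nitro-based instance types, EBS volumes appear as `/dev/nvme*n1` devices, so the `DEVICE` default would need to change there.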