  1. Jenkins
  2. JENKINS-61370

EC2 instances are terminated during launch

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.204.2
      ec2 plugin 1.49.1
      Description

      When provisioning a new agent, the plugin starts and then terminates an EC2 instance several times before finally succeeding.
      I have no explanation for this behavior. It might be related to JENKINS-61343.
      It might also be because the plugin somehow allocates "2 computer(s)" even though the instance cap is 1.

      Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log
      bootstrap()
      Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log
      Getting keypair...
      Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log
      Using private key j4a-ec2-ssh-key (SHA-1 fingerprint a7:b4:70:08:35:11:e3:cf:4b:f5:92:57:b8:02:7f:c6:8e:54:52:02)
      Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log
      Authenticating as admin
      Mar 06, 2020 3:05:22 PM INFO hudson.slaves.NodeProvisioner lambda$update$6
      EC2 (ec2) - ec2 (ami-028d96c69234f9d1a) provisioning successfully completed. We have now 2 computer(s)
      Mar 06, 2020 3:05:22 PM INFO hudson.plugins.ec2.EC2Cloud log
      Connecting to 10.20.4.41 on port 22, with timeout 10000.
      Mar 06, 2020 3:05:29 PM INFO hudson.plugins.ec2.EC2Cloud log
      Connected via SSH.
      Mar 06, 2020 3:05:29 PM INFO hudson.plugins.ec2.EC2Cloud log
      connect fresh as root
      Mar 06, 2020 3:05:29 PM INFO hudson.plugins.ec2.EC2Cloud log
      Connecting to 10.20.4.41 on port 22, with timeout 10000.
      Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log
      Connected via SSH.
      Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log
      Creating tmp directory (/tmp) if it does not exist
      Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log
      Verifying: java -fullversion
      Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log
      Verifying: which scp
      Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log
      Copying remoting.jar to: /tmp
      Mar 06, 2020 3:05:30 PM INFO hudson.plugins.ec2.EC2Cloud log
      Launching remoting agent (via Trilead SSH2 Connection):  java  -jar /tmp/remoting.jar -workDir /opt/jenkins
      Mar 06, 2020 3:05:31 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
      Terminated EC2 instance (terminated): i-021d76d0ffff3375f
      Mar 06, 2020 3:05:31 PM INFO hudson.plugins.ec2.EC2OndemandSlave terminate
      Removed EC2 instance from jenkins master: i-021d76d0ffff3375f
      Mar 06, 2020 3:05:32 PM INFO hudson.plugins.ec2.EC2Cloud provision
      SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Attempting to provision slave needed by excess workload of 1 units
      Mar 06, 2020 3:05:32 PM INFO hudson.plugins.ec2.SlaveTemplate logProvisionInfo
      SlaveTemplate{ami='ami-028d96c69234f9d1a', labels='docker docker-bakery'}. Considering launching

          Activity

          jbochenski Jakub Bochenski created issue
          jbochenski Jakub Bochenski made changes
          Field: Description
          Original Value:
          I'm seeing this in the logs every 10 minutes.
          The monitor thread starts and then dies.
          10 minutes later it repeats.

          {code}Mar 05, 2020 10:54:35 AM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0

          Started EC2 alive slaves monitor

          Mar 05, 2020 10:54:35 AM SEVERE hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException

          A thread (EC2 alive slaves monitor thread/10354) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
          java.lang.NullPointerException
          at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$countQueueItemsForAgentTemplate$8(MinimumInstanceChecker.java:67)
          at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
          at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
          at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
          at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
          at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
          at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
          at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
          at java.util.stream.LongPipeline.reduce(LongPipeline.java:461)
          at java.util.stream.LongPipeline.sum(LongPipeline.java:419)
          at java.util.stream.ReferencePipeline.count(ReferencePipeline.java:593)
          at hudson.plugins.ec2.util.MinimumInstanceChecker.countQueueItemsForAgentTemplate(MinimumInstanceChecker.java:68)
          at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$null$11(MinimumInstanceChecker.java:87)
          at java.util.ArrayList.forEach(ArrayList.java:1257)
          at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1082)
          at hudson.plugins.ec2.util.MinimumInstanceChecker.lambda$checkForMinimumInstances$12(MinimumInstanceChecker.java:76)
          at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
          at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
          at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
          at java.util.Iterator.forEachRemaining(Iterator.java:116)
          at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
          at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
          at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
          at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
          at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
          at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
          at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
          at hudson.plugins.ec2.util.MinimumInstanceChecker.checkForMinimumInstances(MinimumInstanceChecker.java:75)
          at hudson.plugins.ec2.EC2SlaveMonitor.execute(EC2SlaveMonitor.java:41)
          at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:100)
          at java.lang.Thread.run(Thread.java:748)

          {code}
          jbochenski Jakub Bochenski added a comment

          After changing the connection method to "SSH process", I was able to see additional errors that are swallowed by the Trilead Java connector.

          Exception in thread "main" java.nio.file.AccessDeniedException: /opt/jenkins
          	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
          	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
          	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
          	at sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
          	at java.nio.file.Files.createDirectory(Files.java:674)
          	at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
          	at java.nio.file.Files.createDirectories(Files.java:767)
          	at org.jenkinsci.remoting.engine.WorkDirManager.initializeWorkDir(WorkDirManager.java:211
          

          It turns out that if you mount a block device in the user-data script, the mount is sometimes not yet available when the master's SSH connection comes in.
          The solution seems to be to do the mount in the init script instead.
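
For anyone who cannot move the mount into the init script, one possible guard is to wait until the work directory is actually a writable mount point before the agent starts. The following is only a minimal sketch of that idea, not something the plugin provides: the function name `wait_for_mount`, its parameters and the default timeout are all made up for illustration, and `/opt/jenkins` is simply the `-workDir` from the log above.

```python
import os
import time


def wait_for_mount(path, timeout=120.0, interval=2.0,
                   is_ready=None, sleep=time.sleep, clock=time.monotonic):
    """Poll until `path` is ready or `timeout` seconds have passed.

    Hypothetical helper: returns True as soon as the readiness check
    passes, and False if the deadline is reached first.
    """
    if is_ready is None:
        # Default check (an assumption about what "available" means here):
        # the path is a mount point and the current user can write to it.
        def is_ready(p):
            return os.path.ismount(p) and os.access(p, os.W_OK)

    deadline = clock() + timeout
    while True:
        if is_ready(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A call such as `wait_for_mount("/opt/jenkins")` at the start of the init script would then block until the volume is mounted, and the script could exit non-zero if it returns False. Whether this is preferable to mounting directly in the init script depends on the setup; it only narrows the race described above.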


            People

            Assignee:
            thoulen FABRIZIO MANFREDI
            Reporter:
            jbochenski Jakub Bochenski
            Votes:
            0
            Watchers:
            1