Jenkins / JENKINS-53952

Linux agents are not starting anymore

Details

    Description

      Similar to https://issues.jenkins-ci.org/browse/JENKINS-53876

      Unfortunately, with the latest EC2 plugin release (1.40.1), EC2 nodes no longer launch:

      $ cat Jenkins\ Prebuilt\ Slave\ \(sir-p9ai6v8m\)/slave.log
      Oct 08, 2018 10:43:49 PM hudson.plugins.ec2.EC2Cloud
      INFO: Launching instance: null
      Oct 08, 2018 10:43:49 PM hudson.plugins.ec2.EC2Cloud
      INFO: bootstrap()
      Oct 08, 2018 10:43:49 PM hudson.plugins.ec2.EC2Cloud
      INFO: Getting keypair...
      Oct 08, 2018 10:43:49 PM hudson.plugins.ec2.EC2Cloud
      INFO: Using key: master
      xxx
      -----BEGIN RSA PRIVATE KEY-----
      xxx
      Oct 08, 2018 10:43:49 PM hudson.plugins.ec2.EC2Cloud
      INFO: Authenticating as ubuntu
      ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
      java.lang.NullPointerException
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.getEC2HostAddress(EC2UnixLauncher.java:368)
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.connectToSsh(EC2UnixLauncher.java:318)
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.bootstrap(EC2UnixLauncher.java:282)
      	at hudson.plugins.ec2.ssh.EC2UnixLauncher.launchScript(EC2UnixLauncher.java:130)
      	at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
      	at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:294)
      	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
      	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      

       

      Reverting to 1.39 resolves the issue.

      The changelog mentions m5d instances, which we are using. Perhaps that is related?

       

       


        Activity

          gils Gil Shinar added a comment -

          OK. It seems to have gotten worse. It used to happen over the weekend or overnight. Now slaves fail to start during the day as well.

          I'm out of ideas.

          gils Gil Shinar added a comment -

          Yesterday morning everything looked fine. This morning, again, all instances were up but all slaves were offline, while a few jobs had been waiting in the queue for a long time.

          It's not just that the slaves fail to start. Worse, the instances keep running and wasting money, which is the opposite of what this plugin is meant to do.

          gils Gil Shinar added a comment -

          I changed the configuration to use terminate/start instead of stop/start, and it has worked fine for a week. This might help in solving the issue.

          iamveritas ovi craciun added a comment - edited

          Jenkins ver. 2.204.2
          EC2 plugin: 1.48

          We see a similar problem: it crashes on the getPrivateIpAddress or getPublicIpAddress call, depending on which connection strategy I have chosen (public IP or private IP).
          The interesting part is that it fails for spot instances of type c5.2xlarge (or any larger type; I made sure we bid enough to get those instances), whereas t2.2xlarge works well.

          Here's the error we are seeing:

          ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
          java.lang.NullPointerException
          	at hudson.plugins.ec2.EC2HostAddressProvider.getPrivateIpAddress(EC2HostAddressProvider.java:49)
          	at hudson.plugins.ec2.EC2HostAddressProvider.windows(EC2HostAddressProvider.java:28)
          	at hudson.plugins.ec2.win.EC2WindowsLauncher.connectToWinRM(EC2WindowsLauncher.java:134)
          	at hudson.plugins.ec2.win.EC2WindowsLauncher.launchScript(EC2WindowsLauncher.java:39)
          	at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
          	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:291)
          	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
          	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
          	at java.util.concurrent.FutureTask.run(Unknown Source)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
          	at java.lang.Thread.run(Unknown Source)
          

          Later edit:
          We updated the EC2 plugin to version 1.49.1 and the problem no longer reproduces.
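
          For illustration only, the failure mode in the stack trace above (a NullPointerException while reading the public or private IP address, depending on the connection strategy) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the plugin's actual code or its 1.49.1 fix: the class HostAddressSketch, the Strategy enum, and the InstanceInfo type are hypothetical stand-ins, and the sketch assumes that either the instance record or the requested address may be absent at launch time.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; these are NOT the EC2 plugin's
// or the AWS SDK's real types.
class HostAddressSketch {

    /** Which address the launcher should connect to (assumed to mirror the "connection strategy" setting). */
    enum Strategy { PUBLIC_IP, PRIVATE_IP }

    /** Minimal instance description; either address may be null until one is assigned. */
    static final class InstanceInfo {
        final String publicIp;
        final String privateIp;

        InstanceInfo(String publicIp, String privateIp) {
            this.publicIp = publicIp;
            this.privateIp = privateIp;
        }
    }

    /**
     * Returns the address selected by the strategy, or an empty Optional when
     * the instance itself or the requested address is missing, instead of
     * dereferencing a null reference.
     */
    static Optional<String> resolve(InstanceInfo instance, Strategy strategy) {
        if (instance == null) {
            // Assumption: the instance lookup can come back empty, for example
            // while a spot request has not been fulfilled yet.
            return Optional.empty();
        }
        String address = (strategy == Strategy.PUBLIC_IP) ? instance.publicIp : instance.privateIp;
        if (address == null || address.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(address);
    }

    public static void main(String[] args) {
        InstanceInfo privateOnly = new InstanceInfo(null, "10.0.0.5");
        System.out.println(resolve(privateOnly, Strategy.PRIVATE_IP).orElse("address not available"));
        System.out.println(resolve(privateOnly, Strategy.PUBLIC_IP).orElse("address not available"));
        System.out.println(resolve(null, Strategy.PRIVATE_IP).orElse("address not available"));
    }
}
```

          A caller built this way could retry or report "address not available" rather than aborting the launch with an unexpected error. Whether that matches the real cause here is not confirmed by this thread.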

          mramonleon Ramon Leon added a comment -

          Closing as per latest comment


          People

            mramonleon Ramon Leon
            lifeofguenter Günter Grodotzki
            Votes: 14
            Watchers: 25
