
EC2-plugin not spooling up stopped nodes, starting new nodes instead

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker
    • Component/s: ec2-plugin
    • Labels: None
    • Environment: Jenkins 2.303.3, EC2 plugin 1.66

      The Jenkins EC2 plugin isn't starting existing (stopped) nodes; instead, it always starts a new one. I can see the past instances as offline in the Nodes tab, but rather than starting them, the plugin always launches a new instance.

          [JENKINS-67190] EC2-plugin not spooling up stopped nodes, starting new nodes instead

          Bruno Esteves created issue -

          Bruno Esteves added a comment -

          I have set the instance cap to 1, and that just makes the queue get stuck. Jenkins reports the stopped instance as offline and won't try to connect to it. The same thing happens if I turn the instance on in AWS: I need to go to Jenkins and select "Launch agent" for Jenkins to recognize the agent as online and for the build to start.

          The expected behavior is that Jenkins starts the instance automatically and tries to reconnect to it.
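
The expected behavior described above can be sketched as a small decision function. This is only an illustration of the ordering the reporter expects (reuse a running agent, then start a stopped one, then launch a new one if the cap allows); it is not the EC2 plugin's actual code, and the `Instance` type, the `choose_action` function, and the instance IDs are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    # Hypothetical, simplified view of an agent instance.
    instance_id: str
    state: str  # "running" or "stopped"
    busy: bool = False


def choose_action(instances: List[Instance], instance_cap: int) -> str:
    """Return the provisioning step expected for one queued build."""
    # 1. An idle running instance can take the build directly.
    for inst in instances:
        if inst.state == "running" and not inst.busy:
            return f"use {inst.instance_id}"
    # 2. Otherwise a stopped instance should be started and reconnected.
    for inst in instances:
        if inst.state == "stopped":
            return f"start {inst.instance_id}"
    # 3. A new instance is launched only while under the cap.
    if len(instances) < instance_cap:
        return "launch new instance"
    # 4. At the cap with nothing to start: the build waits in the queue.
    return "wait"
```

With an instance cap of 1 and a single stopped instance, this sketch returns `start i-1` rather than leaving the build queued, which is the difference between the expected and the reported behavior.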


          Bruno Esteves added a comment -

          I'm getting this when I try to manually launch the agents:

          Nov 23, 2021 4:48:10 PM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0
          Started EC2 alive agents monitor
          Nov 23, 2021 4:48:10 PM INFO hudson.model.AsyncPeriodicWork lambda$doRun$0
          Finished EC2 alive agents monitor. 110 ms
          Nov 23, 2021 4:48:30 PM WARNING hudson.plugins.ec2.win.WinConnection pingFailingIfSSHHandShakeError
          Failed to verify connectivity to Windows agent
          java.net.SocketTimeoutException: connect timed out
              at java.base/java.net.PlainSocketImpl.waitForConnect(Native Method)
              at java.base/java.net.PlainSocketImpl.socketConnect(PlainSocketImpl.java:107)
              at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
              at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
              at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
              at java.base/java.net.Socket.connect(Socket.java:609)
              at com.hierynomus.protocol.commons.socket.ProxySocketFactory.createSocket(ProxySocketFactory.java:87)
              at com.hierynomus.protocol.commons.socket.ProxySocketFactory.createSocket(ProxySocketFactory.java:63)
              at com.hierynomus.smbj.transport.tcp.direct.DirectTcpTransport.connect(DirectTcpTransport.java:88)
              at com.hierynomus.smbj.connection.Connection.connect(Connection.java:139)
              at com.hierynomus.smbj.SMBClient.getEstablishedOrConnect(SMBClient.java:96)
              at com.hierynomus.smbj.SMBClient.connect(SMBClient.java:71)
              at hudson.plugins.ec2.win.WinConnection.pingFailingIfSSHHandShakeError(WinConnection.java:135)
              at hudson.plugins.ec2.win.EC2WindowsLauncher.connectToWinRM(EC2WindowsLauncher.java:189)
              at hudson.plugins.ec2.win.EC2WindowsLauncher.launchScript(EC2WindowsLauncher.java:52)
              at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:48)
              at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:293)
              at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
              at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:834)
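
The trace above shows the SMB client (`com.hierynomus.smbj`) timing out while opening a TCP connection to the Windows agent. One way to narrow this down is to test plain TCP reachability from the Jenkins controller to the agent. The following is a minimal sketch, assuming the agent uses the default ports (445 for SMB, 5985/5986 for WinRM); the `port_reachable` helper and the address in the usage note are hypothetical and not part of the plugin.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_reachable("10.0.0.12", 445)` (a placeholder address) returning `False` while the instance is running would point toward a security group, firewall, or stale-address problem rather than the launcher itself. A stopped instance will always fail this check, which matches the timeout in the log.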

           


          Bruno Esteves added a comment -

          I should also mention that the agents are Windows instances.

          Bruno Esteves made changes -
          Priority Original: Major [ 3 ] New: Blocker [ 1 ]

          Dante Kiaunis added a comment - - edited

          Having the same issue on our Jenkins instance. I'm able to unblock it by manually turning the AWS EC2 instance on. Then everything continues to work as normal. This happens with both our Windows and Linux agents.

          Bruno Esteves made changes -
          Assignee Original: FABRIZIO MANFREDI [ thoulen ]

          florian Locqueneux added a comment -

          Same issue on my instance: no error, no exception, and no call is sent to AWS EC2 to initiate instance startup.

          The instance cap is not respected at the global or AMI level.

          We have to force instances to be deleted to control the number of subnodes.

          It's a major issue, as it leads to a loss of cost control.

          Cuong added a comment - - edited

          I also encountered this issue in my Jenkins setup.
          I initially thought it only affected EC2 plugin version 1.66, because things started working after downgrading to 1.65.
          My bad: that was actually a rerun build that just used Jenkins scheduling to reuse the same node.
          The issue still persists in 1.65 when scheduling a new build.


          Matthias Glastra added a comment -

          greyjackal, have you found a way to reproduce this reliably? We experience almost the same thing but have not been able to reproduce it consistently; it seems to happen randomly. Also, for completeness, what does your EC2 template configuration look like? Do you have additional arguments, a boot delay, etc.?

            Assignee: Unassigned
            Reporter: Bruno Esteves (greyjackal)
            Votes: 4
            Watchers: 8