JENKINS-15319

Failure starting slave nodes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Component: ec2-plugin
    • None
    • Jenkins 1.481

    Description

      Every day or so I get errors like this when starting a slave:

      ERROR: The instance ID 'i-6fdc2512' does not exist
      Status Code: 400, AWS Service: AmazonEC2, AWS Request ID: 195aae09-8d42-4f24-9a30-2d31b7447818, AWS Error Code: InvalidInstanceID.NotFound, AWS Error Message: The instance ID 'i-6fdc2512' does not exist
      at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:547)
      at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:284)
      at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:169)
      at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:5684)
      at com.amazonaws.services.ec2.AmazonEC2Client.describeInstances(AmazonEC2Client.java:2543)
      at hudson.plugins.ec2.EC2Computer._describeInstance(EC2Computer.java:97)
      at hudson.plugins.ec2.EC2Computer.getState(EC2Computer.java:76)
      at hudson.plugins.ec2.EC2ComputerLauncher.launch(EC2ComputerLauncher.java:33)
      at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:200)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

      and I have to manually delete them to unblock jobs. The odd part is that this instance actually does exist in the EC2 Management Console. Any idea what this is about and how I can resolve it?

      -John

        Activity

          johntdyer John Dyer created issue -
          johntdyer John Dyer made changes -
          Field | Original Value | New Value
          Summary | Error startign slave nodes | Failure starting slave nodes
          johntdyer John Dyer added a comment -

          I have found a few reports of this type of error on the Amazon forums, and it seems to be some sort of race condition when calling describeInstance() within a few seconds of creating the instance. I guess the instance hasn't propagated through the AWS system yet? I was thinking the simple solution here might be to catch this exception and add a configurable retry counter with a delay. Thoughts?
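
The catch-and-retry idea proposed above could look roughly like the following. This is a minimal sketch, not the plugin's actual code: the class `DescribeRetry`, the method `callWithRetry`, and its parameters are hypothetical names, and matching the error code against the exception message is a simplification that keeps the example free of the AWS SDK. The only detail taken from the report is the error code `InvalidInstanceID.NotFound`.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper illustrating retry-with-delay; not the ec2-plugin's code. */
final class DescribeRetry {
    private DescribeRetry() {}

    /**
     * Runs the call and returns its result. If it fails with an exception whose
     * message contains retryableCode, waits delayMillis and tries again, up to
     * maxAttempts attempts in total. Any other exception is rethrown at once.
     */
    static <T> T callWithRetry(Callable<T> call, String retryableCode,
                               int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                String msg = e.getMessage();
                boolean retryable = msg != null && msg.contains(retryableCode);
                if (!retryable) {
                    throw e; // unrelated failure: do not mask it with retries
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // give the new instance time to become visible
                }
            }
        }
        throw last; // every attempt failed with the retryable error
    }
}
```

In the plugin, the `Callable` would presumably wrap the describe-instances request made right after launching the instance, with the attempt count and delay exposed as configuration. Rethrowing non-matching exceptions immediately keeps genuine failures, such as bad credentials, from being hidden behind the retry delay.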

          johntdyer John Dyer made changes -
          Priority | Critical [ 2 ] | Blocker [ 1 ]
          johntdyer John Dyer added a comment -

          Is this project still active? I ask because I have seen many tickets posted without any update.

          -John

          francisu Francis Upton added a comment -

          Hi John,

          The project is active; if you look at the release history, you can see that many things have been fixed and functionality added in the six months since I started working as the main committer (both code I have written personally and many contributions from others). However, there remains a lot of work to do. I'm a very part-time (and busy) committer on this as a volunteer, so I cannot respond immediately to issues.

          I realize this is a very important issue and will try to get to it when I have time.

          The quickest way to get this fixed is to make and test a pull request that resolves the issue (bonus points if you can identify all of the JIRAs that your fix resolves). If you can do this, I can have a look and commit it pretty quickly.

          Others have also had success offering to pay for my time to resolve problems; that's another option, and you can contact me at francis@oaklandsoftware.com.

          Kind regards,
          Francis

          johntdyer John Dyer added a comment -

          Francis,

          Unfortunately, I am not really a Java developer, but I'll muck around and see what I can figure out, I suppose.

          -John

          vinzpr vinz r added a comment -

          I am also encountering the same problem. Any update on this?

          Thanks,

          Vinz

          glasser David Glasser added a comment -

          Here's a pull request to retry the request a few times: https://github.com/jenkinsci/ec2-plugin/pull/38

          (I've observed on other EC2 automation projects that DescribeInstances immediately after a RunInstances often fails and have had to resort to the retry loop.)


          Code changed in jenkins
          User: David Glasser
          Path:
          src/main/java/hudson/plugins/ec2/EC2Computer.java
          http://jenkins-ci.org/commit/ec2-plugin/c4a24a9bfb7993e0723953d0f155761cb10f733d
          Log:
          Retries DescribeInstance calls a few times if it fails.

          Addresses JENKINS-15319.


          Code changed in jenkins
          User: Francis Upton
          Path:
          src/main/java/hudson/plugins/ec2/EC2Computer.java
          http://jenkins-ci.org/commit/ec2-plugin/76c95f9c4a02917d09e28e4e4efa705c845f30df
          Log:
          Merge pull request #38 from meteor/retry-describe-instances

          Retries DescribeInstance calls a few times if it fails. JENKINS-15319

          Compare: https://github.com/jenkinsci/ec2-plugin/compare/64a311d5dc20...76c95f9c4a02


          johntdyer John Dyer added a comment -

          Any idea when this build will be available?

          johntdyer John Dyer added a comment -

          Anyone?

          francisu Francis Upton added a comment -

          Hi John, I will do a release in the next couple of days which will include this fix. Sorry for the delay.

          francisu Francis Upton made changes -
          Resolution | (none) | Fixed [ 1 ]
          Status | Open [ 1 ] | Resolved [ 5 ]
          rtyler R. Tyler Croy made changes -
          Workflow | JNJira [ 146042 ] | JNJira + In-Review [ 191756 ]

          People

            francisu Francis Upton
            johntdyer John Dyer