JENKINS-57795

Orphaned EC2 instances after Jenkins restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Component/s: ec2-plugin
    • Labels:
      None
    • Environment:
      Jenkins ver. 2.176.1, 2.204.2
      ec2 plugin 1.43, 1.44, 1.45, 1.49.1
    • Released As:
      ec2 1.51

      Description

      Sometimes after a Jenkins restart the plugin won't be able to spawn more agents.

      The plugin will just loop on this:

      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Attempting to provision slave needed by excess workload of 1 units
      May 31, 2019 2:23:53 PM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}. Cannot provision - no capacity for instances: 0
      May 31, 2019 2:23:53 PM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-0efbb291c6e8cc847', labels='docker'}
      

      If I go to the EC2 console and terminate the instance manually, the plugin spawns a new one and uses it.

      There seems to be a mismatch in the plugin logic. The part responsible for counting instances and checking the cap sees the EC2 instance. However, the part responsible for picking up running EC2 instances doesn't seem to be able to find it.
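
      The suspected mismatch can be illustrated with a toy model. This is a sketch under assumptions, not the ec2-plugin's code: ToyCloud, its methods and the instance ID i-0abc are hypothetical names, and the idea that a restart drops Jenkins' node record while the EC2 instance keeps running is an assumption drawn from the behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCloud:
    """Toy model only. All names here are hypothetical, not plugin APIs."""

    instance_cap: int
    # Instance IDs that EC2 reports as running for this template.
    ec2_running: set = field(default_factory=set)
    # Instance IDs for which Jenkins still holds a node record.
    jenkins_nodes: set = field(default_factory=set)

    def spare_capacity(self) -> int:
        # The cap check counts every running EC2 instance.
        return max(0, self.instance_cap - len(self.ec2_running))

    def reusable_instances(self) -> set:
        # The reuse lookup only sees instances Jenkins knows as nodes.
        return self.ec2_running & self.jenkins_nodes

    def orphans(self) -> set:
        # Running in EC2, but unknown to Jenkins.
        return self.ec2_running - self.jenkins_nodes

    def provision(self) -> str:
        reusable = self.reusable_instances()
        if reusable:
            return "reuse " + sorted(reusable)[0]
        if self.spare_capacity() > 0:
            return "launch new instance"
        return "cannot provision - no capacity"


cloud = ToyCloud(instance_cap=1, ec2_running={"i-0abc"}, jenkins_nodes={"i-0abc"})
before_restart = cloud.provision()

# Assumption: the restart loses the node record, the instance keeps running.
cloud.jenkins_nodes.clear()
orphaned = cloud.orphans()
after_restart = cloud.provision()

# Manual workaround: terminate the orphan in the EC2 console.
cloud.ec2_running.discard("i-0abc")
after_manual_termination = cloud.provision()

print(before_restart, after_restart, after_manual_termination, sep="\n")
```

      In this model the orphan counts against the cap but is never offered for reuse, so terminating it is the only way out, which matches the manual workaround described above.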

      We use a single subnet, security group and VPC (I've seen some reports about this causing problems).

      We use the instanceCap = 1 setting because we are testing the plugin; this might make the problem more visible than it would be with a higher cap.
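
      The effect of the cap can be made concrete with a little arithmetic. This is a minimal sketch, assuming each orphaned instance consumes exactly one slot of the cap; spare_slots is a hypothetical helper, not a plugin function.

```python
def spare_slots(instance_cap: int, orphaned: int, in_use: int = 0) -> int:
    """Slots left for new agents when orphans still count against the cap."""
    return max(0, instance_cap - orphaned - in_use)


# One orphaned instance under different caps.
by_cap = {cap: spare_slots(cap, orphaned=1) for cap in (1, 5, 10)}
print(by_cap)  # {1: 0, 5: 4, 10: 9}
```

      With a cap of 1, a single orphan blocks provisioning entirely; with a higher cap the same orphan only reduces the available capacity, so the problem is easier to miss.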

            Activity

            jbochenski Jakub Bochenski created issue -
            jbochenski Jakub Bochenski made changes -
            Field Original Value New Value
            Description
            Original value: the description text shown above, plus this sentence before the final instanceCap paragraph, with `/safeRestart` marked up in backticks:
            It seems the problems do not occur when I do a `/safeRestart`, but they do if I use e.g. "restart Jenkins when no jobs are running" from the Update Center.
            New value: the same text with the markup changed to {{/safeRestart}}.
            jbochenski Jakub Bochenski made changes -
            Description
            Original value: the description text shown above, plus this sentence before the final instanceCap paragraph:
            It seems the problems do not occur when I do a {{/safeRestart}}, but they do if I use e.g. "restart Jenkins when no jobs are running" from the Update Center.
            New value: the same text with that sentence removed.
            jbochenski Jakub Bochenski made changes -
            Environment Jenkins ver. 2.176.1
            ec2 plugin 1.43, 1.44
            jbochenski Jakub Bochenski made changes -
            Environment
            Original value: Jenkins ver. 2.176.1; ec2 plugin 1.43, 1.44
            New value: Jenkins ver. 2.176.1; ec2 plugin 1.43, 1.44, 1.45
            sirzic cedric lecoz made changes -
            Attachment jenkins_201909121030.log [ 48722 ]
            sirzic cedric lecoz made changes -
            Attachment jenkins.temp_dsl.log [ 48747 ]
            raihaan Raihaan Shouhell made changes -
            Attachment ec2.hpi [ 48761 ]
            raihaan Raihaan Shouhell made changes -
            Attachment ec2.hpi [ 48761 ]
            sirzic cedric lecoz made changes -
            jbochenski Jakub Bochenski made changes -
            Environment
            Original value: Jenkins ver. 2.176.1; ec2 plugin 1.43, 1.44, 1.45
            New value: Jenkins ver. 2.176.1; ec2 plugin 1.43, 1.44, 1.45, 1.49.1
            jbochenski Jakub Bochenski made changes -
            Environment
            Original value: Jenkins ver. 2.176.1; ec2 plugin 1.43, 1.44, 1.45, 1.49.1
            New value: Jenkins ver. 2.176.1, 2.204.2; ec2 plugin 1.43, 1.44, 1.45, 1.49.1
            raihaan Raihaan Shouhell made changes -
            Remote Link This issue links to "PR-448 (Web Link)" [ 26226 ]
            raihaan Raihaan Shouhell made changes -
            Released As ec2 1.51
            Resolution Fixed [ 1 ]
            Status
            Original value: Open [ 1 ]
            New value: Resolved [ 5 ]

              People

              Assignee:
              thoulen FABRIZIO MANFREDI
              Reporter:
              jbochenski Jakub Bochenski
              Votes:
              0
              Watchers:
              8
