  Jenkins / JENKINS-54041

EC2 Plugin no longer able to start build slaves

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Component/s: ec2-plugin
    • Labels: None
    • Environment: Jenkins 2.138.2, EC2 Plugin 1.40.1

    Description

      When using EC2 plugin 1.40.1, all of my builds stop working.

      They are configured to run only on cloud slaves, and the master node has no executors. They all queue up waiting for an executor; however, all of the executors are offline and never start.

      Downgrading to EC2 plugin 1.39 fixes the issue.

      Log files:

      Failed to load org.jenkinsci.plugins.github.pullrequest.extra.GitHubPRLabelUnblockQueueCondition$DescriptorImpl
      java.lang.ClassNotFoundException: org.jenkinsci.plugins.blockqueuedjob.condition.BlockQueueCondition$BlockQueueConditionDescriptor
      	at jenkins.util.AntClassLoader.findClassInComponents(AntClassLoader.java:1374)
      	at jenkins.util.AntClassLoader.findClass(AntClassLoader.java:1327)
      	at jenkins.util.AntClassLoader.loadClass(AntClassLoader.java:1080)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      Caused: java.lang.NoClassDefFoundError: org/jenkinsci/plugins/blockqueuedjob/condition/BlockQueueCondition$BlockQueueConditionDescriptor
      	at java.lang.ClassLoader.defineClass1(Native Method)
      	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
      	at jenkins.util.AntClassLoader.defineClassFromData(AntClassLoader.java:1140)
      	at hudson.ClassicPluginStrategy$AntClassLoader2.defineClassFromData(ClassicPluginStrategy.java:858)
      	at jenkins.util.AntClassLoader.getClassFromStream(AntClassLoader.java:1311)
      	at jenkins.util.AntClassLoader.findClassInComponents(AntClassLoader.java:1364)
      	at jenkins.util.AntClassLoader.findClass(AntClassLoader.java:1327)
      	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at jenkins.ClassLoaderReflectionToolkit.invoke(ClassLoaderReflectionToolkit.java:44)
      	at jenkins.ClassLoaderReflectionToolkit._findClass(ClassLoaderReflectionToolkit.java:81)
      	at hudson.PluginManager$UberClassLoader.findClass(PluginManager.java:1893)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
      	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
      	at org.jvnet.hudson.annotation_indexer.Index$2$1.fetch(Index.java:99)
      	at org.jvnet.hudson.annotation_indexer.Index$2$1.hasNext(Index.java:73)
      	at [...]
      [...].plugins.ec2.EC2Cloud provision
      SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}. Attempting to provision slave needed by excess workload of 1 units
      Oct 12, 2018 7:52:06 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}. Cannot provision - no capacity for instances: 0
      Oct 12, 2018 7:52:06 AM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}
      Oct 12, 2018 7:52:15 AM INFO hudson.plugins.ec2.EC2Cloud provision
      SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}. Attempting to provision slave needed by excess workload of 1 units
      Oct 12, 2018 7:52:16 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
      SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}. Cannot provision - no capacity for instances: 0
      Oct 12, 2018 7:52:16 AM WARNING hudson.plugins.ec2.EC2Cloud provision
      Can't raise nodes for SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}

      Full stacktrace: https://pastebin.com/3UrfQ1vu

          Activity

            gils Gil Shinar added a comment -

            I'm facing something similar. Sometimes the instances are up and running but all the slaves are offline. Only removing the slaves from Jenkins or relaunching them fixes the issue.

            I saw that there might be a fix in version 1.45, but there's a warning message saying that upgrading to this version might cause configuration issues. Is that right?

            I've tried to comment on JENKINS-53952, but for some reason I cannot (the browser freezes), no matter which browser I use.

            thoulen FABRIZIO MANFREDI added a comment -

            Fixed in 1.41 and 1.42.

            thoulen FABRIZIO MANFREDI added a comment -

            The problem is in the instance cap calculation, which counts stopped nodes as running. I am preparing the fix.

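The maintainer's diagnosis can be pictured with a small Java sketch. It is hypothetical and is not the ec2-plugin's code: `CapSketch`, `InstanceState`, `remainingCapacityBuggy`, and `remainingCapacityFixed` are invented for illustration. If stopped instances count against the instance cap, a template whose cap is filled entirely by idle-stopped agents reports no free capacity, which matches the "Cannot provision - no capacity for instances: 0" lines in the logs.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of an instance-cap check. All names here are
 * invented for this sketch and are NOT part of the ec2-plugin API.
 */
public class CapSketch {

    // Simplified stand-in for EC2 instance lifecycle states.
    enum InstanceState { PENDING, RUNNING, STOPPING, STOPPED }

    // Flawed counting: every known instance, including STOPPED ones,
    // is treated as running and consumes part of the cap.
    static int remainingCapacityBuggy(List<InstanceState> instances, int instanceCap) {
        return Math.max(0, instanceCap - instances.size());
    }

    // Counting that ignores STOPPED instances: only instances that are
    // starting, running, or still shutting down consume capacity.
    static int remainingCapacityFixed(List<InstanceState> instances, int instanceCap) {
        long active = instances.stream()
                .filter(s -> s != InstanceState.STOPPED)
                .count();
        return Math.max(0, instanceCap - (int) active);
    }

    public static void main(String[] args) {
        // Cap of 2, with both agents stopped after their idle timeout.
        List<InstanceState> idleStopped =
                Arrays.asList(InstanceState.STOPPED, InstanceState.STOPPED);
        System.out.println("buggy: " + remainingCapacityBuggy(idleStopped, 2)); // buggy: 0
        System.out.println("fixed: " + remainingCapacityFixed(idleStopped, 2)); // fixed: 2
    }
}
```

The sketch only shows why the count drops to zero. Whether a provisioner should then launch new instances or restart the stopped ones is a separate policy question that it does not address.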
            davidgoate David Goate added a comment -

            Is there any other enhanced log level/appender I could enable, or any settings I could export, that might help with this? I found a way to get builds running: manually log into the EC2 console and start the slaves that are in the stopped state, then, once they are running, go to the Jenkins nodes section and start the agent.

            When I manually start the agent, I do get a few warnings/errors, but ultimately it attaches, runs, and builds work.

            @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
            @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
            
            
            >ERROR: Unexpected error in launching an agent. This is probably a bug in Jenkins
            java.lang.IllegalStateException: Already connected
            
            
            Agent successfully connected and online
            Oct 30, 2018 8:48:13 AM org.jenkinsci.remoting.util.AnonymousClassWarnings warn
            WARNING: Attempt to (de-)serialize anonymous class org.jenkinsci.plugins.gitclient.Git$1; see: https://jenkins.io/redirect/serialization-of-anonymous-classes/
            
            
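On the logging question: Jenkins logs through `java.util.logging`, and the usual route is Manage Jenkins » System Log » Add new log recorder, with a logger such as `hudson.plugins.ec2` set to FINE or ALL. The standalone Java sketch below shows the equivalent `java.util.logging` calls. The class and method names are hypothetical, and this is not a documented ec2-plugin feature.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Ec2LogLevelSketch {

    // Keep a strong reference: the LogManager may hold loggers only weakly,
    // so a level set on an unreferenced logger could be lost after GC.
    private static final Logger EC2_LOGGER = Logger.getLogger("hudson.plugins.ec2");

    // Raise the plugin's package logger to FINE and attach a handler that
    // lets FINE records through (the default root console handler stops at INFO).
    static Logger enableFineLogging() {
        EC2_LOGGER.setLevel(Level.FINE);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        EC2_LOGGER.addHandler(handler);
        return EC2_LOGGER;
    }

    public static void main(String[] args) {
        enableFineLogging();
        // Child loggers such as hudson.plugins.ec2.EC2Cloud have no level of
        // their own, so they inherit FINE from the package logger.
        Logger cloudLogger = Logger.getLogger("hudson.plugins.ec2.EC2Cloud");
        cloudLogger.fine("FINE records from the EC2 cloud are now emitted");
    }
}
```

Inside Jenkins, the log recorder page is the more convenient option, since it collects the records for you and no console handler is needed.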
            davidgoate David Goate added a comment -

            Here is a further note on something I see in the logs that may be related or help debug the issue:

            Oct 30, 2018 8:34:38 AM INFO hudson.plugins.ec2.EC2Cloud provision
            SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}. Attempting to provision slave needed by excess workload of 2 units
            Oct 30, 2018 8:34:38 AM INFO hudson.plugins.ec2.EC2Cloud getNewOrExistingAvailableSlave
            SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}. Cannot provision - no capacity for instances: 0
            Oct 30, 2018 8:34:38 AM WARNING hudson.plugins.ec2.EC2Cloud provision
            Can't raise nodes for SlaveTemplate{ami='ami-xxx', labels='cloud-slave'}

            davidgoate David Goate added a comment - - edited

            On the latest version (1.41), I still get this warning:

            WARNING: Failed to load org.jenkinsci.plugins.github.pullrequest.extra.GitHubPRLabelUnblockQueueCondition$DescriptorImpl
            java.lang.NoClassDefFoundError: org/jenkinsci/plugins/blockqueuedjob/condition/BlockQueueCondition$BlockQueueConditionDescriptor

            Reverting to 1.39 now also gives:

            Connecting to 10.1.1.90 on port 22, with timeout 10000.
            Oct 29, 2018 11:19:39 AM WARNING org.jvnet.hudson.annotation_indexer.Index$2$1 fetch
            Failed to load org.jenkinsci.plugins.github.pullrequest.extra.GitHubPRLabelUnblockQueueCondition$DescriptorImpl
            java.lang.ClassNotFoundException: org.jenkinsci.plugins.blockqueuedjob.condition.BlockQueueCondition$BlockQueueConditionDescriptor
            	at jenkins.util.AntClassLoader.findClassInComponents(AntClassLoader.java:1374)
            	at jenkins.util.AntClassLoader.findClass(AntClassLoader.java:1327)
            	at jenkins.util.AntClassLoader.loadClass(AntClassLoader.java:1080)
            	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

            So my previous "fix" (downgrading) no longer works. I suspect that upgrading to 1.41 changed some configuration files, because the Jenkins plugin manager displayed a red warning that it might not be possible to downgrade.

            Currently, none of my Jenkins builds will run automatically.

            davidgoate David Goate added a comment -

            Excellent, thanks for that. Looking forward to the new release.

            thoulen FABRIZIO MANFREDI added a comment - - edited

            Unfortunately, the stop-instance option does not work properly in 1.40.x.

            I am preparing a patch; it should be ready in the next few days.

            This is a duplicate of:

            JENKINS-53920


            People

              Assignee: thoulen FABRIZIO MANFREDI
              Reporter: davidgoate David Goate
              Votes: 1
              Watchers: 9
