
failed to startup jenkins server with ec2 plugin

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • ec2-plugin
    • None

      After installing the latest ec2-plugin (17-Feb-2023),
      I get this error on startup:

      // java.lang.NullPointerException
        at hudson.plugins.ec2.EC2Computer.getState(EC2Computer.java:188)
        at hudson.plugins.ec2.EC2RetentionStrategy.start(EC2RetentionStrategy.java:279)
        at hudson.plugins.ec2.EC2RetentionStrategy.start(EC2RetentionStrategy.java:53)
        at hudson.model.AbstractCIBase.createNewComputerForNode(AbstractCIBase.java:192)
        at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:153)
        at hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:255)
        at hudson.model.Queue._withLock(Queue.java:1395)
        at hudson.model.Queue.withLock(Queue.java:1269)
        at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
        at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1683)
        at jenkins.model.Nodes$6.run(Nodes.java:351)
        at hudson.model.Queue._withLock(Queue.java:1395)
        at hudson.model.Queue.withLock(Queue.java:1269)
        at jenkins.model.Nodes.load(Nodes.java:346)
        at jenkins.model.Jenkins$12.run(Jenkins.java:3429)
        at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:177)
        at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
        at jenkins.model.Jenkins$5.runTask(Jenkins.java:1164)
        at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
        at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
      Caused: org.jvnet.hudson.reactor.ReactorException
        at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:290)
        at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
        at jenkins.model.Jenkins.executeReactor(Jenkins.java:1199)
        at jenkins.model.Jenkins.<init>(Jenkins.java:987)
        at hudson.model.Hudson.<init>(Hudson.java:86)
        at hudson.model.Hudson.<init>(Hudson.java:82)
        at hudson.WebAppMain$3.run(WebAppMain.java:247)
      Caused: hudson.util.HudsonFailedToLoad
        at hudson.WebAppMain$3.run(WebAppMain.java:264)

       

      The only way I found to recover from this is:
      sudo systemctl stop jenkins
      then, in /var/lib/jenkins/plugins:
      remove the ec2 folder
      remove the ec2.jpi file
      remove the cloud configuration from the config.xml file

      I don't know how to re-enable the plugin after this.
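
      The recovery steps above could be scripted roughly as follows. This is only a sketch: it assumes the plugin files are the `ec2` folder and `ec2.jpi` under `$JENKINS_HOME/plugins` as described, and it moves them into a backup directory instead of deleting them so they can be put back later. The function name and the `ec2-plugin-backup` directory are made up for illustration. Removing the cloud configuration from config.xml remains a manual step.

```shell
#!/bin/sh
# Sketch: move the ec2 plugin out of the plugins directory instead of deleting it.
# Usage: move_ec2_plugin_aside /var/lib/jenkins   (run while Jenkins is stopped)
move_ec2_plugin_aside() {
    jenkins_home=$1
    backup_dir="$jenkins_home/ec2-plugin-backup"   # hypothetical backup location
    mkdir -p "$backup_dir" || return 1
    for item in ec2 ec2.jpi; do
        # Skip anything that is not present.
        if [ -e "$jenkins_home/plugins/$item" ]; then
            mv "$jenkins_home/plugins/$item" "$backup_dir/" || return 1
        fi
    done
}
```

      Run it only after `sudo systemctl stop jenkins`, and with enough privileges to write under the Jenkins home directory.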

          [JENKINS-70645] failed to startup jenkins server with ec2 plugin

          M added a comment -

          The plugin is at version 2.0.5.

          In one environment version 2.0.5 runs correctly; in another environment I get this issue.
          It seems to be trying to determine the state of a previously configured ec2 node, but I don't know how to clear that or skip it.

          In the Java code I read:

           * This method returns a cached state, so it's not suitable to check {@link Instance#getState()} from the returned
           * instance (but all the other fields are valid as it won't change.)
           *
           * The cache can be flushed using {@link #updateInstanceDescription()}

          What can I do to flush this cache? Where is it stored?


          M added a comment -

          I found it. On the jenkins server there is a folder

          /var/lib/jenkins/nodes/<dynamic nodes folders>

          If you make certain changes to the ec2-node configuration, the files in that folder can get out of sync, and Jenkins then tries to load something that no longer exists.

          The worst part is that the Jenkins server does not handle this smoothly: the ec2 plugin throws an error during startup and Jenkins fails to launch.

          The solution in my case was to delete that particular folder under /var/lib/jenkins/nodes/

          It would be good to handle this situation more gracefully, so that the Jenkins server can still start up.

          (I lowered the priority)


          Josh Branham added a comment -

          I hit this same issue; deleting the `$JENKINS_HOME/nodes/$name` folder and restarting fixed it.
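
          A cautious variant of this workaround is to move the node folder aside rather than delete it, so its config.xml is still available for inspection afterwards. The sketch below assumes the layout mentioned in this thread (`$JENKINS_HOME/nodes/<name>`). The function name and the `nodes-removed` directory are hypothetical.

```shell
#!/bin/sh
# Sketch: move one node folder out of $JENKINS_HOME/nodes so Jenkins no longer loads it.
# Usage: set_node_aside /var/lib/jenkins my-node-name   (run while Jenkins is stopped)
set_node_aside() {
    jenkins_home=$1
    node_name=$2
    src="$jenkins_home/nodes/$node_name"
    dest_root="$jenkins_home/nodes-removed"   # hypothetical location outside nodes/
    # Refuse an empty name, which would otherwise move the whole nodes/ directory.
    [ -n "$node_name" ] || { echo "node name required" >&2; return 1; }
    [ -d "$src" ] || { echo "no such node folder: $src" >&2; return 1; }
    mkdir -p "$dest_root" && mv "$src" "$dest_root/"
}
```

          After moving the folder, start Jenkins again and check that it comes up.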


          Allan BURDAJEWICZ added a comment - - edited

          I'm not sure whether Instance#getState can be null in the aws-java-sdk, but CloudHelper.getInstanceWithRetry may return null in some circumstances:

          https://github.com/jenkinsci/ec2-plugin/blob/ec2-2.0.5/src/main/java/hudson/plugins/ec2/CloudHelper.java#L49
          https://github.com/jenkinsci/ec2-plugin/blob/ec2-2.0.5/src/main/java/hudson/plugins/ec2/EC2Computer.java#L187

          So this needs to be addressed.

          A simple reproducer: provision EC2 agents. Once they are provisioned, rename your EC2 cloud, then restart Jenkins.

          If anybody is able to reproduce this, please collect the `$JENKINS_HOME/nodes/*/config.xml` of a node that causes it, and check in particular the <instanceId></instanceId> and <cloudName></cloudName> elements.
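
          To gather the information requested here, something like the following could print the relevant lines from every node configuration. It is a plain-text sketch, not an XML parser: it assumes each element sits on a single line of config.xml, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the <cloudName> and <instanceId> lines of every node config.
# Usage: show_node_cloud_info /var/lib/jenkins
show_node_cloud_info() {
    jenkins_home=$1
    for cfg in "$jenkins_home"/nodes/*/config.xml; do
        [ -f "$cfg" ] || continue          # the glob matched nothing
        echo "== $cfg"
        # Matches both <cloudName>...</cloudName> and the empty form <cloudName/>.
        grep -E '<(cloudName|instanceId)[>/]' "$cfg" || echo "   (neither element found)"
    done
}
```

          A node whose <cloudName> does not match any currently configured cloud would be a candidate for the stale entry discussed in this issue.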

          Denis Bel added a comment - - edited

          I was able to fix it with the following steps:

          1. Update the EC2 plugin, but do not restart Jenkins yet
          2. Update the CasC config if it is in use (edit <JENKINS_HOME>/jenkins.yaml):
           change all occurrences of:

          clouds:
            - amazonEC2:
                cloudName: "some cloud name"
          ...

          to

          clouds:
            - amazonEC2:
                name: "some cloud name"
          ... 

          3. Update the existing node configurations

          cd <JENKINS_HOME>
          find nodes -type f -name 'config.xml' -exec sed -i 's/<cloudName>ec2-/<cloudName>/g' {} \; 

          4. Restart Jenkins
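
          Because step 3 edits files in place, a slightly more defensive version may be worth considering. The sketch below applies the same sed expression, but only to files that actually contain the `ec2-` prefix, and it keeps a `.bak` copy of each changed file. The function name is hypothetical. The `.bak` suffix relies on `sed -i.bak`, which both GNU and BSD sed accept.

```shell
#!/bin/sh
# Sketch: strip the "ec2-" prefix from <cloudName> in node configs, keeping backups.
# Usage: strip_ec2_prefix /var/lib/jenkins
strip_ec2_prefix() {
    jenkins_home=$1
    # List only the config.xml files that contain the prefixed cloud name.
    find "$jenkins_home/nodes" -type f -name 'config.xml' \
        -exec grep -l '<cloudName>ec2-' {} \; |
    while IFS= read -r cfg; do
        echo "updating $cfg"
        # Leaves the original content in config.xml.bak next to the edited file.
        sed -i.bak 's/<cloudName>ec2-/<cloudName>/g' "$cfg"
    done
}
```

          The printed list shows which nodes were changed, and the backups make it possible to revert if Jenkins still fails to start.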


            thoulen FABRIZIO MANFREDI
            sentient M