[JENKINS-72192] Jenkins does not start until "nodes" dir is cleared

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Component: ec2-plugin

      Hello! We just experienced a failure where Jenkins would not come up after a restart.

      ```
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: 2023-10-16 22:48:49.072+0000 [id=26] SEVERE hudson.util.BootFailure#publish: Failed to initialize Jenkins
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: java.lang.InterruptedException
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at java.base/java.lang.Object.wait(Native Method)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at java.base/java.lang.Object.wait(Object.java:328)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:288)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at jenkins.model.Jenkins.executeReactor(Jenkins.java:1199)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at jenkins.model.Jenkins.<init>(Jenkins.java:987)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at hudson.model.Hudson.<init>(Hudson.java:86)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at hudson.model.Hudson.<init>(Hudson.java:82)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at hudson.WebAppMain$3.run(WebAppMain.java:247)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: Caused: hudson.util.HudsonFailedToLoad
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: at hudson.WebAppMain$3.run(WebAppMain.java:264)
      Oct 16 22:48:49 ip-172-16-53-13 jenkins_logs[10332]: 2023-10-16 22:48:49.075+0000 [id=26] SEVERE h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (Jenkins initialization thread/26) died unexpectedly due to an uncaught exception. This may leave your server corrupted and usually indicates a software bug.
      ```

      We are on Jenkins 2.401.3.

      We were able to get Jenkins to start by renaming the "nodes" folder and creating a new empty "nodes" folder.
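      A minimal sketch of that workaround, assuming Jenkins is stopped and the `JENKINS_HOME` path is passed as the first argument. The class name `NodesDirReset` and the `nodes.bak-<timestamp>` backup name are hypothetical and are not part of Jenkins:

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.time.Instant;

      /**
       * Hypothetical helper reproducing the manual workaround: move JENKINS_HOME/nodes
       * aside and create a new, empty "nodes" directory. Run it only while Jenkins is stopped.
       */
      class NodesDirReset {

          /** Moves jenkinsHome/nodes to jenkinsHome/nodes.<suffix> and returns that backup path. */
          static Path resetNodesDir(Path jenkinsHome, String suffix) throws IOException {
              Path nodes = jenkinsHome.resolve("nodes");
              Path backup = jenkinsHome.resolve("nodes." + suffix);
              // A rename within JENKINS_HOME; throws FileAlreadyExistsException if backup exists.
              Files.move(nodes, backup);
              // Fresh, empty directory so Jenkins starts with no agents configured.
              Files.createDirectory(nodes);
              return backup;
          }

          public static void main(String[] args) throws IOException {
              // Assumption: the JENKINS_HOME path is passed as the first argument.
              Path home = Paths.get(args[0]);
              Path backup = resetNodesDir(home, "bak-" + Instant.now().toEpochMilli());
              System.out.println("Old node configs moved to " + backup);
          }
      }
      ```

      The sketch keeps the old directory instead of deleting it. That way, individual node folders can be copied back one at a time to narrow down which one breaks startup.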

      I then checked all the config.xml files in the old folder, and they are all well-formed and valid XML.
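      A check like this can be scripted. The following is a sketch under these assumptions: the old folder's path is the first argument, and each agent lives in `<folder>/<agent>/config.xml`. The class name is hypothetical. The sketch only tests well-formedness. A file that passes can still fail when a plugin deserializes it.

      ```java
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.ArrayList;
      import java.util.List;
      import java.util.stream.Collectors;
      import java.util.stream.Stream;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.parsers.ParserConfigurationException;
      import org.xml.sax.SAXException;
      import org.xml.sax.helpers.DefaultHandler;

      /** Sketch: list every <nodesDir>/<agent>/config.xml that is not well-formed XML. */
      class NodeConfigCheck {

          static List<Path> findMalformed(Path nodesDir) throws IOException, ParserConfigurationException {
              List<Path> configs;
              try (Stream<Path> entries = Files.list(nodesDir)) {
                  configs = entries.map(dir -> dir.resolve("config.xml"))
                                   .filter(Files::isRegularFile)
                                   .collect(Collectors.toList());
              }
              DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
              List<Path> malformed = new ArrayList<>();
              for (Path config : configs) {
                  DocumentBuilder parser = factory.newDocumentBuilder();
                  // DefaultHandler rethrows fatal errors without printing them to stderr.
                  parser.setErrorHandler(new DefaultHandler());
                  try {
                      parser.parse(config.toFile());
                  } catch (SAXException e) {
                      malformed.add(config); // not well-formed
                  }
              }
              return malformed;
          }

          public static void main(String[] args) throws Exception {
              // Assumption: the path to the (renamed) nodes folder is the first argument.
              List<Path> bad = findMalformed(Paths.get(args[0]));
              if (bad.isEmpty()) {
                  System.out.println("All config.xml files are well-formed");
              } else {
                  bad.forEach(p -> System.out.println("Malformed: " + p));
              }
          }
      }
      ```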

      We did change the slave instance type earlier today; we are not sure whether that had anything to do with it.

          Georg Blumenschein added a comment:

          We are experiencing the same problem; removing the EC2 nodes from the "nodes" folder helps.

          Startup error:

          ```
          java.lang.NullPointerException
              at hudson.plugins.ec2.EC2Computer.getState(EC2Computer.java:188)
              at hudson.plugins.ec2.EC2RetentionStrategy.start(EC2RetentionStrategy.java:279)
              at hudson.plugins.ec2.EC2RetentionStrategy.start(EC2RetentionStrategy.java:53)
              at hudson.model.AbstractCIBase.createNewComputerForNode(AbstractCIBase.java:192)
              at hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:153)
              at hudson.model.AbstractCIBase$1.run(AbstractCIBase.java:255)
              at hudson.model.Queue._withLock(Queue.java:1401)
              at hudson.model.Queue.withLock(Queue.java:1275)
              at hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:238)
              at jenkins.model.Jenkins.updateComputerList(Jenkins.java:1705)
              at jenkins.model.Nodes$6.run(Nodes.java:351)
              at hudson.model.Queue._withLock(Queue.java:1401)
              at hudson.model.Queue.withLock(Queue.java:1275)
              at jenkins.model.Nodes.load(Nodes.java:346)
              at jenkins.model.Jenkins$12.run(Jenkins.java:3497)
              at org.jvnet.hudson.reactor.TaskGraphBuilder$TaskImpl.run(TaskGraphBuilder.java:177)
              at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:305)
              at jenkins.model.Jenkins$5.runTask(Jenkins.java:1170)
              at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:221)
              at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:120)
              at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:829)
          Caused: org.jvnet.hudson.reactor.ReactorException
              at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:290)
              at jenkins.InitReactorRunner.run(InitReactorRunner.java:49)
              at jenkins.model.Jenkins.executeReactor(Jenkins.java:1205)
              at jenkins.model.Jenkins.<init>(Jenkins.java:992)
              at hudson.model.Hudson.<init>(Hudson.java:86)
              at hudson.model.Hudson.<init>(Hudson.java:82)
              at hudson.WebAppMain$3.run(WebAppMain.java:247)
          Caused: hudson.util.HudsonFailedToLoad
              at hudson.WebAppMain$3.run(WebAppMain.java:264)
          ```
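          This trace suggests why clearing the folder helps. The `NullPointerException` is thrown in `EC2Computer.getState()` while `EC2RetentionStrategy.start()` runs for a node loaded by `Nodes.load()`. From there it propagates up to the init reactor as `ReactorException` and then `HudsonFailedToLoad`. The simplified model below uses hypothetical names and is not Jenkins source. It shows that one throwing node is enough to abort startup, while an empty node list always boots:

          ```java
          import java.util.Arrays;
          import java.util.Collections;
          import java.util.List;

          /**
           * Simplified model of the failing call chain; NOT Jenkins source code.
           * Each Runnable stands in for a loaded node's RetentionStrategy.start().
           */
          class BootModel {

              /** Stand-in for Nodes.load(): starts each node in turn, with no per-node error handling. */
              static void loadNodes(List<Runnable> retentionStarts) {
                  for (Runnable start : retentionStarts) {
                      start.run(); // an unchecked exception here escapes the loop immediately
                  }
              }

              /** Stand-in for the init reactor: reports failure if any task throws. */
              static boolean boot(List<Runnable> retentionStarts) {
                  try {
                      loadNodes(retentionStarts);
                      return true;
                  } catch (RuntimeException e) {
                      // In the real trace this surfaces as ReactorException, then HudsonFailedToLoad.
                      System.err.println("Failed to initialize: " + e);
                      return false;
                  }
              }

              public static void main(String[] args) {
                  Runnable healthy = () -> { };
                  Runnable brokenEc2 = () -> { throw new NullPointerException("simulated getState() failure"); };
                  // prints "with broken EC2 node: false"
                  System.out.println("with broken EC2 node: " + boot(Arrays.asList(healthy, brokenEc2)));
                  // prints "with empty nodes dir: true"
                  System.out.println("with empty nodes dir: " + boot(Collections.<Runnable>emptyList()));
              }
          }
          ```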

           

          Georg Blumenschein added a comment:

          Trying to use a startup script to delete the nodes also did not work:

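          ```groovy
          import jenkins.model.Jenkins

          def jenkins = Jenkins.instance
          def computers = jenkins.computers
          // Select agents whose Computer class name contains "EC2" (EC2 plugin agents)
          computers.findAll {
              it.getClass().getName().contains('EC2')
          }.each {
              println it.displayName
              it.doDoDelete()  // removes the agent (node) from Jenkins
          }
          ```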

          So it seems the nodes really have to be removed from the disk/PVC.
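          If only the EC2 agents need to go, a narrower version of the on-disk workaround would move just their directories out of `nodes/`. This sketch assumes Jenkins is stopped. It also assumes that an EC2 agent's `config.xml` has a root element whose name contains "ec2", such as the plugin's class name. That heuristic mirrors the `contains('EC2')` filter in the script above and is an assumption, as are the class name and the `nodes-ec2-quarantine` directory:

          ```java
          import java.io.IOException;
          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.nio.file.Paths;
          import java.util.ArrayList;
          import java.util.List;
          import java.util.Locale;
          import java.util.stream.Collectors;
          import java.util.stream.Stream;
          import javax.xml.parsers.DocumentBuilder;
          import javax.xml.parsers.DocumentBuilderFactory;
          import org.xml.sax.helpers.DefaultHandler;

          /** Hypothetical helper: move EC2-looking agent dirs out of nodes/ while Jenkins is stopped. */
          class Ec2NodeQuarantine {

              /** Assumption: an EC2 agent's config.xml root element name contains "ec2". */
              static boolean looksLikeEc2Node(Path config) {
                  try {
                      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                      builder.setErrorHandler(new DefaultHandler()); // no stderr noise on bad XML
                      String root = builder.parse(config.toFile()).getDocumentElement().getTagName();
                      return root.toLowerCase(Locale.ROOT).contains("ec2");
                  } catch (Exception e) {
                      return false; // unreadable or malformed: leave it in place
                  }
              }

              /** Moves matching agent dirs into quarantineDir (same filesystem) and returns their names. */
              static List<String> quarantine(Path nodesDir, Path quarantineDir) throws IOException {
                  Files.createDirectories(quarantineDir);
                  List<Path> dirs;
                  try (Stream<Path> entries = Files.list(nodesDir)) {
                      dirs = entries.filter(Files::isDirectory).collect(Collectors.toList());
                  }
                  List<String> moved = new ArrayList<>();
                  for (Path dir : dirs) {
                      Path config = dir.resolve("config.xml");
                      if (Files.isRegularFile(config) && looksLikeEc2Node(config)) {
                          Files.move(dir, quarantineDir.resolve(dir.getFileName()));
                          moved.add(dir.getFileName().toString());
                      }
                  }
                  return moved;
              }

              public static void main(String[] args) throws IOException {
                  // Assumption: JENKINS_HOME is the first argument.
                  Path home = Paths.get(args[0]);
                  List<String> moved = quarantine(home.resolve("nodes"), home.resolve("nodes-ec2-quarantine"));
                  System.out.println("Moved EC2 agents: " + moved);
              }
          }
          ```

          Collecting the directory list before moving anything avoids changing `nodes/` while it is still being listed. Keeping the quarantine folder inside `JENKINS_HOME` keeps `Files.move` a simple rename, because a non-empty directory cannot be moved across filesystems that way.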


            thoulen FABRIZIO MANFREDI
            atsaloli Aleksey Tsalolikhin
            Votes: 0
            Watchers: 3