
Multiple dead executors on slaves post 1.379 upgrade

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Component: remoting
    • Labels: None
    • Environment: CentOS Linux 5.x kernel 2.6.18-194.3.1.el5
      hudson.war 1.379 under Tomcat 5.5.28
      Slave OSs: CentOS Linux 5.x, Windows XP 32bit, Windows Server 2008 64bit

      Post upgrade to 1.379 we are experiencing increased occurrences of dead executors on our slave systems. Prior to this release we had never encountered a dead executor on any system, master or slave. Immediately after deploying the 1.379 WAR, 6 executors spread across a variety of slave platforms (Linux, WinXP 32bit, Win2k8 64bit) died. Today one more died on a Linux slave. Restarting Hudson clears out the dead executors, but disconnecting and reconnecting the slaves does not. I have not tried rebooting the slaves themselves yet. The stack trace below has consistently been the output associated with the dead executors.

      java.lang.AbstractMethodError
      at hudson.model.Executor.getEstimatedRemainingTimeMillis(Executor.java:340)
      at hudson.model.queue.LoadPredictor$CurrentlyRunningTasks.predict(LoadPredictor.java:77)
      at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:303)
      at hudson.model.Queue.pop(Queue.java:753)
      at hudson.model.Executor.grabJob(Executor.java:175)
      at hudson.model.Executor.run(Executor.java:113)
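
      The java.lang.AbstractMethodError here typically means that some queue task implementation, often contributed by a plugin, was compiled against an older core API and is missing a method that the new scheduling code calls. As a hedged diagnostic sketch (illustrative only, assuming standard Hudson core APIs such as Executor.getCurrentExecutable and Executor.getCauseOfDeath), a Groovy script run from Manage Hudson → Script Console can print what each executor is, or was, working on, which may help point to the offending plugin class:

      import hudson.model.*

      // For each executor, print its state and the concrete class of what it is running.
      // A plugin-provided task class appearing next to dead executors is a likely suspect.
      Hudson.instance.computers.each { c ->
          c.executors.each { e ->
              def exe = e.currentExecutable   // null when the executor is idle or dead
              def state = e.causeOfDeath != null ? "DEAD" : (e.idle ? "idle" : "busy")
              def what = exe != null ? "${exe.class.name} (task: ${exe.parent?.class?.name})" : "-"
              println "${c.displayName} / ${e.displayName} [${state}]: ${what}"
          }
      }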

          [JENKINS-7707] Multiple dead executors on slaves post 1.379 upgrade

          dru_n created issue -

          davidkarlsen added a comment -

          Any news on this? I can give a snapshot a spin if needed.


          kutzi added a comment -

          Can you confirm that all your slaves use Hudson 1.379, too?


          Andrew Bayer added a comment -

          I believe this is fixed in 1.380.


          bertrandgressier added a comment -

          Hello,

          I have exactly the same problem on my forge.
          I have 1 master with 3 slaves.
          I upgraded my server from 1.374 to 1.384, and since then I get a lot of exceptions. It is the same stack trace as the one at the top.

          I tried to restart the dead executors with a Groovy script, but without success. I can only list them:

          import hudson.model.*

          hudson = Hudson.instance
          def computers = hudson.computers

          // Print every executor whose thread has died, together with the cause of death.
          computers*.executors*.each {
              if (it.causeOfDeath != null) {
                  println "${it.owner.caption} : ${it.displayName}=============="
                  println it.causeOfDeath
              }
          }

          And my result:

          Maître : Executor #0==============
          java.lang.NullPointerException
          Esclave Agent-1 : Executor #1==============
          java.lang.AbstractMethodError
          Esclave Agent-2 : Executor #1==============
          java.lang.AbstractMethodError
          Esclave Agent-2 : Executor #3==============
          java.lang.AbstractMethodError
          Esclave AgentC-1 : Executor #0==============
          java.lang.AbstractMethodError
          Esclave AgentC-1 : Executor #1==============
          java.lang.AbstractMethodError
          Esclave AgentC-1 : Executor #2==============
          java.lang.AbstractMethodError

          Do you have any idea how to restart the executors without restarting Hudson?

          Can the fix be in the next release?

          Should I downgrade Hudson to 1.378 in the meantime?
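
          A hedged Script Console sketch, assuming only standard Hudson APIs such as Computer.connect(boolean) and Executor.getCauseOfDeath: force-relaunch any slave that currently has a dead executor. Whether this actually clears them is unconfirmed; the original report notes that disconnecting and reconnecting slaves did not, so a full restart, or the plugin upgrade mentioned later in this thread, may still be needed.

          import hudson.model.*

          // Force-relaunch every slave that currently has at least one dead executor.
          // The master (empty node name) is skipped; it cannot be relaunched this way.
          Hudson.instance.computers.each { c ->
              def dead = c.executors.findAll { it.causeOfDeath != null }
              if (!dead.empty && c.nodeName) {
                  println "Relaunching ${c.displayName} (${dead.size()} dead executor(s))"
                  c.connect(true)   // true = force reconnect even if the channel looks alive
              }
          }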


          dru_n added a comment -

          I have not observed this issue with any version after 1.380.


          bertrandgressier added a comment -

          It's weird, because I still have this issue on my forge.

          I jumped from version 1.374 to 1.384 directly.
          I tried to roll back to 1.378, but with all the plugins and configurations it's worse ...

          I'm back to the latest version and I have exactly the same bug.

          Almost all of my executors are shown as "Dead" ... on the master and the slaves.

          The stack trace is:

          java.lang.AbstractMethodError
          at hudson.model.Executor.getEstimatedRemainingTimeMillis(Executor.java:342)
          at hudson.model.queue.LoadPredictor$CurrentlyRunningTasks.predict(LoadPredictor.java:99)
          at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:323)
          at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:294)
          at hudson.model.Queue.pop(Queue.java:760)
          at hudson.model.Executor.grabJob(Executor.java:177)
          at hudson.model.Executor.run(Executor.java:115)

          My only solution is to reboot the service.

          Are you sure this was fixed in an earlier release?


          bertrandgressier added a comment -

          I still have this bug!
          Same stack trace. Right now, 1 executor out of 2 has died.

          Do you have any idea?


          bertrandgressier added a comment -

          Finally it's OK!
          It was due to an old Batch mode plugin in my configuration ...
          I upgraded it and now everything seems good.
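
          Since the fix here turned out to be upgrading an outdated plugin, reviewing installed plugin versions is a reasonable first step when chasing an AbstractMethodError like this one. A minimal Script Console sketch (illustrative, assuming only PluginManager.getPlugins and the usual PluginWrapper getters) that lists installed plugins and versions:

          import hudson.model.*

          // List every installed plugin with its version,
          // to help spot plugins built against an older core API.
          Hudson.instance.pluginManager.plugins.each { p ->
              println "${p.shortName} ${p.version}" + (p.active ? "" : " (inactive)")
          }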


          carlo_bonamico added a comment -

          I am noticing this on 1.385. I suspect that it might be related to the SCM trigger after polling.
          @bertrandgressier: which plugin (Batch) are you talking about?


            Assignee: Unassigned
            Reporter: dru_n
            Votes: 6
            Watchers: 10
