Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Environment:
CentOS Linux 5.x, kernel 2.6.18-194.3.1.el5
hudson.war 1.379 under Tomcat 5.5.28
Slave OSs: CentOS Linux 5.x, Windows XP 32bit, Windows Server 2008 64bit
Description
Since upgrading to 1.379 we have seen an increase in dead executors on our slave systems. Before this release we had never encountered a dead executor on any system, master or slave. Immediately after deploying the 1.379 WAR, 6 executors spread across several slave platforms (Linux, WinXP 32bit, Win2k8 64bit) died, and today one more died on a Linux slave. Restarting Hudson clears the dead executors; disconnecting and reconnecting the slaves does not. I have not yet tried rebooting the slaves themselves. The stack trace below is consistently the output associated with the dead executors.
java.lang.AbstractMethodError
	at hudson.model.Executor.getEstimatedRemainingTimeMillis(Executor.java:340)
	at hudson.model.queue.LoadPredictor$CurrentlyRunningTasks.predict(LoadPredictor.java:77)
	at hudson.model.queue.MappingWorksheet.<init>(MappingWorksheet.java:303)
	at hudson.model.Queue.pop(Queue.java:753)
	at hudson.model.Executor.grabJob(Executor.java:175)
	at hudson.model.Executor.run(Executor.java:113)
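The trace shows the error surfacing inside `Executor.grabJob`, called from `Executor.run`, so it is thrown on the executor's own thread. If nothing on that path catches it, the error escapes `run()` and the thread terminates, which would be consistent with an executor that stays dead until Hudson is restarted. The sketch below is a simplified, hypothetical model of that failure mode, not Hudson's code: the `Worker`, `Estimator`, `Outcome` and `runWorker` names are illustrative, and the `AbstractMethodError` is thrown by hand to simulate what the JVM would raise.

```java
import java.util.concurrent.atomic.AtomicReference;

public class DeadExecutorDemo {
    /** Hypothetical stand-in for the estimate call that fails in the trace. */
    interface Estimator {
        long estimatedDurationMillis();
    }

    /** Simplified worker loop: run() -> grabJob() -> estimate, with no catch on the path. */
    static final class Worker extends Thread {
        private final Estimator estimator;
        private final int jobs;
        int completed = 0;

        Worker(Estimator estimator, int jobs) {
            this.estimator = estimator;
            this.jobs = jobs;
        }

        @Override
        public void run() {
            for (int i = 0; i < jobs; i++) {
                grabJob();
                completed++;
            }
        }

        private void grabJob() {
            // An Error thrown here propagates out of run() and ends the thread.
            estimator.estimatedDurationMillis();
        }
    }

    /** Snapshot of a worker after it has terminated. */
    static final class Outcome {
        final boolean alive;
        final int completed;
        final Throwable cause;

        Outcome(boolean alive, int completed, Throwable cause) {
            this.alive = alive;
            this.completed = completed;
            this.cause = cause;
        }
    }

    /** Runs one worker to completion and records why it stopped, if it died. */
    static Outcome runWorker(Estimator estimator, int jobs) {
        AtomicReference<Throwable> cause = new AtomicReference<>();
        Worker w = new Worker(estimator, jobs);
        // The handler runs on the dying thread, so join() below observes the stored cause.
        w.setUncaughtExceptionHandler((t, e) -> cause.set(e));
        w.start();
        try {
            w.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return new Outcome(w.isAlive(), w.completed, cause.get());
    }

    public static void main(String[] args) {
        Outcome ok = runWorker(() -> 1000L, 3);
        // Simulates what the JVM throws when an implementing class lacks the method.
        Outcome broken = runWorker(() -> { throw new AbstractMethodError("simulated"); }, 3);
        System.out.println("healthy: completed=" + ok.completed + ", cause=" + ok.cause);
        System.out.println("broken: alive=" + broken.alive + ", completed=" + broken.completed
                + ", cause=" + broken.cause);
    }
}
```

Running `main` prints `healthy: completed=3, cause=null` for the working estimator and `broken: alive=false, completed=0, cause=java.lang.AbstractMethodError: simulated` for the failing one. The failing worker never finishes its first job, and only a fresh thread (in Hudson's case, a restart) would bring it back.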
Issue Links
- is related to: JENKINS-7546 Getting AbstractMethodError on all top-level (non job) pages after 1.377 upgrade (Resolved)
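The JVM throws `AbstractMethodError` when code calls an interface or abstract method on a class that was compiled against an older definition lacking that method. A core upgrade that adds a method to an interface, while some plugin class still implements the old version, is a typical way to trigger it; this report does not say which class was involved. One defensive pattern is to catch the error at the single call site and treat the value as "unknown" (-1) so the calling thread survives. The sketch below assumes hypothetical names (`Task`, `estimateOrUnknown`, `remainingMillis`) and is not a description of how this issue was actually fixed.

```java
public class EstimateFallback {
    /** Hypothetical interface; imagine newer versions added estimatedDurationMillis(). */
    interface Task {
        long estimatedDurationMillis();
    }

    /**
     * Returns the task's estimate, or -1 ("unknown") if the implementing class
     * was compiled against an older interface that lacks the method.
     */
    static long estimateOrUnknown(Task task) {
        try {
            return task.estimatedDurationMillis();
        } catch (AbstractMethodError e) {
            // Deliberately narrow: only this binary-compatibility failure is swallowed.
            return -1L;
        }
    }

    /** Remaining time in ms, or -1 when the estimate is unknown or already exceeded. */
    static long remainingMillis(Task task, long elapsedMillis) {
        long estimate = estimateOrUnknown(task);
        if (estimate < 0) {
            return -1L;
        }
        long eta = estimate - elapsedMillis;
        return eta > 0 ? eta : -1L;
    }

    public static void main(String[] args) {
        Task modern = () -> 60_000L;
        // Simulates the JVM's behaviour for a legacy class missing the method.
        Task legacy = () -> { throw new AbstractMethodError("simulated legacy class"); };
        System.out.println(remainingMillis(modern, 15_000L)); // 45000
        System.out.println(remainingMillis(legacy, 15_000L)); // -1
    }
}
```

Catching an `Error` is normally discouraged; the trade-off here is that a missing estimate only degrades load prediction, whereas letting it propagate would take the whole executor thread down.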
I just noticed that when the issue appeared, I had both upgraded to 1.384 AND set the maximum number of SCM polling threads to 20. Removing that polling thread limit apparently made the issue disappear. The issue also appeared just after SCM polling had run for a large project. I have about 40 projects on the server and 4 slaves.