Jenkins / JENKINS-56137

EC2-slave jar cache vs class initialization race.




      From time to time, the following exception occurs:

      FATAL: Remote call on Slave failed
      Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to Slave
      java.lang.NoClassDefFoundError: Could not initialize class hudson.slaves.SlaveComputer
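      The message "Could not initialize class" does not mean the class file is missing. The JVM reports it when the class's static initialization has already failed once: per the Java Language Specification (12.4.2), a class whose initializer threw is left in an erroneous state, and every later attempt to use it throws NoClassDefFoundError instead of retrying. That matches the observation that all later jobs on the affected slave fail until it is reconnected. The minimal Java sketch below reproduces only this JVM behavior; FlakyInit is a hypothetical stand-in for hudson.slaves.SlaveComputer, and its deliberately failing initializer stands in for whatever goes wrong while the jar is being redelivered.

```java
// Minimal sketch of JVM class-initialization semantics.
// FlakyInit is a hypothetical stand-in for hudson.slaves.SlaveComputer.
public class InitFailureDemo {

    static class FlakyInit {
        // Evaluated during static initialization; it always throws here to
        // simulate an initializer that fails (assumption: e.g. jar unavailable).
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("simulated failure in static initializer");
        }

        static void touch() {
            // Calling any static method triggers class initialization first.
        }
    }

    // Tries to use FlakyInit and returns the Throwable raised, or null on success.
    static Throwable tryUse() {
        try {
            FlakyInit.touch();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    public static void main(String[] args) {
        // First use: the initializer runs and fails.
        Throwable first = tryUse();
        System.out.println("first:  " + first.getClass().getName());

        // Second use: the class is now in an erroneous state, so the JVM
        // does not retry initialization and throws NoClassDefFoundError.
        Throwable second = tryUse();
        System.out.println("second: " + second.getClass().getName()
                + ": " + second.getMessage());
    }
}
```

      Running main shows an ExceptionInInitializerError for the first use and a NoClassDefFoundError whose message starts with "Could not initialize class" for the second, which is the same state the slave JVM is stuck in until the process is restarted.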

      The SlaveComputer class is delivered from the master to the slave in a jar file, which the slave stores in its jar cache (usually ~/.jenkins/cache/jars). I noticed that the master sends this jar several times during a single job, which is suspicious and probably a bug. This behavior affects only EC2 slaves; non-EC2 slaves receive the jar only once.

      The exception above occurs when the jar is being delivered at the same moment that something in the SlaveComputer class is called. At that point the slave JVM tries to load the jar to initialize the SlaveComputer class, and the initialization fails. The exception is unrecoverable: the running job fails, and every job executed on that slave afterwards also fails because SlaveComputer can no longer be initialized. The only remedy is to reconnect the slave, which restarts the main slave Java process.

      I found that it is the ProcessTreeKiller that calls methods of SlaveComputer when the exception happens. As a workaround, the ProcessTreeKiller functionality can be disabled by adding '-Dhudson.util.ProcessTree.disable=true' to the slave JVM options.

      Either sending jars to the slave more than once is a bug, or there is missing synchronization between sending a jar and calling methods from the classes it contains.
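      The report suspects missing synchronization between delivering a jar and using classes from it. A general way to remove that kind of race, independent of why the jar is resent, is to treat cached jars as write-once: write each jar to a temporary file and publish it with an atomic rename, so a reader sees either no file or a complete one. The Java sketch below illustrates that technique under stated assumptions: AtomicJarCache and store are hypothetical names, cached jars are assumed to be keyed by checksum, and the lock only serializes writers inside one JVM. It does not describe how the Jenkins remoting jar cache is actually implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical jar cache sketch; AtomicJarCache and store() are illustrative
// names, not the Jenkins remoting API.
public class AtomicJarCache {
    private final Path cacheDir;

    public AtomicJarCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Stores the jar under its checksum (assumed key) and returns its path.
    // A jar that is already cached is never rewritten, so a class loader
    // reading it cannot observe a half-written file. synchronized only
    // covers writers within this one JVM.
    public synchronized Path store(String checksum, byte[] jarBytes) throws IOException {
        Path target = cacheDir.resolve(checksum + ".jar");
        if (Files.exists(target)) {
            return target;
        }
        Files.createDirectories(cacheDir);
        // Write the full contents to a temporary file in the same directory...
        Path tmp = Files.createTempFile(cacheDir, checksum, ".tmp");
        try {
            Files.write(tmp, jarBytes);
            // ...then publish it with a single atomic rename. This throws
            // AtomicMoveNotSupportedException if the file system cannot do it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if writing or moving failed.
            Files.deleteIfExists(tmp);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jar-cache-demo");
        AtomicJarCache cache = new AtomicJarCache(dir);
        Path first = cache.store("abc123", new byte[] {1, 2, 3});
        Path again = cache.store("abc123", new byte[] {9, 9});
        System.out.println(first.equals(again) + " " + Files.size(first));
    }
}
```

      Storing the same checksum twice returns the same path and leaves the first contents untouched, so a second delivery of an already cached jar cannot disturb a class that is being loaded from it.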

      The issue is not easy to reproduce. With 20 slaves running about 15 hours a day, it happens roughly 10 times a week.




              People: FABRIZIO MANFREDI (thoulen), Pawel Pingot (ppingot)