• Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Blocker
    • Master: Win Server 2003 R2 x64 SP2, Hudson 1.377, Slave 1: Mac OS X 10.6.5, Slave 8: Win Server 2008 R2 x64

      One of our CI builds is bound by labels to run either on Slave 10 (Mac) or on Slave 8 (Win). It is not supposed to run on our Master.

      Nevertheless, from time to time one of the jobs gets stuck in the assignment phase: it appears to be bound to the Master while it is actually sitting on one of the Slaves. (See attachment.)

      If you leave the job untouched, it eventually finishes after 28min even though the task itself needs only 4sec (see attachment). In other words, a job that needs 4sec does nothing for about 27min until its assignment issue resolves itself.

      However, if you "stop" the job during this odd assignment phase, the job stays bound to nothing forever. To run the job again, you need to reboot the entire farm to get rid of the zombie job.

          [JENKINS-8223] flaky slave assignment

          robsimon added a comment -

          Environment changed - issue stayed.

          • Master: Mac 10.7.1, java 1.6.0_26, Jenkins v1.424
          • Slave1: Mac 10.6.8, java 1.6.0_26, JNLP connected
          • Slave2: Windows Server 2008 R2 x64, jre 1.6.0_23_x64, JNLP connected

          The problem is that assigning a job which itself runs for only a couple of minutes takes almost 1.5 hours. (See slave_assignment_issue.png.)

          Furthermore, if you cancel this job during its assignment phase, it becomes a zombie job tied to the master, and no further builds of this job can be executed until Jenkins has been restarted.

          robsimon added a comment -

          Adding Locks-and-Latches to it.

          Environment: Jenkins v1.432, Locks-and-Latches v0.6

          The reason is that jobs waiting for a lock show the same master state while already occupying a slave executor lane, just as in the example above. That makes me believe that, for some reason, the lock was not released in time. But I'm not sure how to investigate it...
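
          One possible starting point for investigating, offered only as a rough sketch: scan the executor status that Jenkins reports and list busy executors that are flagged as likely stuck. The sample payload below is hypothetical and merely shaped like the JSON assumed to come from /computer/api/json?depth=1; the field names (displayName, executors, idle, likelyStuck, currentExecutable), the build URL, and the helper find_suspect_executors are assumptions for illustration and may differ between Jenkins versions.

```python
import json

# Hypothetical sample, shaped like the JSON assumed to be returned by
# /computer/api/json?depth=1 (field names are an assumption, not verified
# against the Jenkins versions mentioned in this issue).
SAMPLE = json.loads("""
{
  "computer": [
    {"displayName": "master",
     "executors": [{"idle": true, "likelyStuck": false,
                    "currentExecutable": null}]},
    {"displayName": "Slave 8",
     "executors": [{"idle": false, "likelyStuck": true,
                    "currentExecutable": {"url": "http://ci.example/job/demo/12/"}}]}
  ]
}
""")


def find_suspect_executors(payload):
    """Return (node name, build URL) pairs for busy executors flagged as likely stuck."""
    suspects = []
    for node in payload.get("computer", []):
        for executor in node.get("executors", []):
            build = executor.get("currentExecutable")
            busy = not executor.get("idle", True)
            if busy and executor.get("likelyStuck") and build:
                suspects.append((node.get("displayName"), build.get("url")))
    return suspects


if __name__ == "__main__":
    for name, url in find_suspect_executors(SAMPLE):
        print(name, url)
```

          Comparing the node name reported here with the node shown in the build's own page would indicate whether the build is really occupying a slave executor while being displayed as bound to the master.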

          Daniel Beck added a comment -

          Needs to be reproduced on recent Jenkins versions.

            Assignee: Unassigned
            Reporter: robsimon
            Votes: 2
            Watchers: 2