Type: Bug
Resolution: Fixed
Priority: Major
Environment: Jenkins 1.463 under Tomcat 6 on Linux (SLES 11), Windows XP slave VMs controlled via the vSphere Cloud plugin
I'm using the following setup:
- WinXP slaves A,B,C
- jobs jA, jB, jC, tied to slaves A,B,C respectively using "Restrict where this job can run"
Assume all slaves are disconnected and powered off, no builds are queued.
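For reference, the restriction semantics this setup relies on can be modelled roughly as follows. This is a simplified sketch, not Jenkins' actual Node/Label machinery; the class and method names here are illustrative only:

```java
import java.util.*;

// Simplified model of "Restrict where this job can run": a node's own name
// acts as a label, and a job with an assigned label may only build on a node
// carrying that label. (Jenkins' real Node.canTake()/Label logic is richer.)
class SlaveNode {
    final String name;
    final Set<String> labels = new HashSet<>();

    SlaveNode(String name) {
        this.name = name;
        labels.add(name); // a node always matches its own name
    }

    // Rough analogue of canTake(): a null label means "run anywhere".
    boolean canTake(String assignedLabel) {
        return assignedLabel == null || labels.contains(assignedLabel);
    }
}

public class LabelDemo {
    public static void main(String[] args) {
        SlaveNode a = new SlaveNode("A");
        SlaveNode c = new SlaveNode("C");
        System.out.println(a.canTake("C")); // false: jC must not run on A
        System.out.println(c.canTake("C")); // true: jC may run on C
    }
}
```

With this model, jC (assigned label "C") matches only slave C, which is exactly the restriction Jenkins honors in the observations below.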
When a build is started manually, say jC, the following happens:
- job jC is scheduled and displayed accordingly in the build queue
- the tooltip says it is waiting because slave C is offline
- next, slave A is powered on by Jenkins and a connection is established
- jC is not started; Jenkins seems to honor the restriction correctly
- after some idle time, Jenkins notices the slave is idle and shuts it down
- then the same procedure happens with slave B
- occasionally, the next one is slave A again
- finally (with luck?) slave C happens to be started
- jC is executed
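The power-on lottery described above can be reproduced with a toy provisioner that, like the behaviour observed, picks an arbitrary offline slave instead of consulting the queued job's restriction. The names and structure below are my own, not the plugin's:

```java
import java.util.*;

public class PowerOnLottery {
    // The bug in miniature: each cycle powers on an arbitrary offline slave,
    // ignoring which slave the queued job actually needs. The wrong slave
    // idles out, is shut down, and the lottery repeats.
    static int cyclesUntilJobRuns(String requiredSlave, List<String> offline, Random rnd) {
        int cycles = 0;
        String powered;
        do {
            cycles++;
            powered = offline.get(rnd.nextInt(offline.size()));
            // If powered != requiredSlave, the scheduler correctly refuses to
            // run the job there, the slave idles, Jenkins shuts it down, and
            // the next cycle begins.
        } while (!powered.equals(requiredSlave));
        return cycles;
    }

    public static void main(String[] args) {
        List<String> slaves = Arrays.asList("A", "B", "C");
        int n = cyclesUntilJobRuns("C", slaves, new Random());
        System.out.println("jC finally ran after " + n + " power-on cycle(s)");
    }
}
```

With three slaves the expected number of cycles is small, but nothing bounds the wait: the job runs only when chance selects the required slave, which matches the "waiting for hours" observation.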
As a result, jC can be waiting for hours (indefinitely?), because the required
slave is not powered on. I also observed this behaviour with a timer trigger
instead of a manual one, so I assume it is independent of the type of trigger.
Occasionally it also happens that the correct slave is powered up right away,
but that seems to happen by chance. The concrete pattern is not obvious to me.
Note that the component selection above is just my best guess.
Cheers, Marco
I've been encountering the same problem. I thought it was in the code of the vSphere Plugin, but it turns out it's not: Jenkins is issuing connect() calls on slaves that, given the queued jobs I can see, have no reason to be starting up.
Part of the problem IS the vSphere Plugin itself. Originally, when a job was fired up, every offline slave that could run the job would be started by the vSphere Plugin, because the connect() method was called on all of those slaves; a single job could thus power on a large number of VMs. I added code to the plugin to throttle that behavior. Unfortunately, the throttling is making this problem worse: whereas originally the slaves for jA, jB, and jC might all have been started up, the slave for jC now only MIGHT get started up, due to the vSphere plugin throttling the VM startups.
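A label-aware alternative to blanket connect() calls would power on only slaves that can actually take something in the queue. The sketch below is a hypothetical illustration, not plugin code; the containment check stands in for a real Slave.canTake() test against the queued item:

```java
import java.util.*;

// Sketch of demand-driven power-on: instead of calling connect() on every
// offline slave (or throttling down to an arbitrary one), restrict the
// candidates to slaves whose labels match at least one queued job.
public class DemandFilter {
    static List<String> slavesWorthStarting(Map<String, Set<String>> slaveLabels,
                                            List<String> queuedJobLabels) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : slaveLabels.entrySet()) {
            for (String wanted : queuedJobLabels) {
                // stands in for Slave.canTake() on the queued build item
                if (e.getValue().contains(wanted)) {
                    candidates.add(e.getKey());
                    break;
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> slaves = new LinkedHashMap<>();
        slaves.put("A", Set.of("A"));
        slaves.put("B", Set.of("B"));
        slaves.put("C", Set.of("C"));
        // Only jC (label "C") is queued, so only slave C is worth powering on.
        System.out.println(slavesWorthStarting(slaves, List.of("C"))); // [C]
    }
}
```

Filtering like this would keep the throttle's benefit (few VMs started per job) without its downside (the one VM started being the wrong one).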
Initial investigation suggests that the Slave.canTake() function might not be working as expected. If I find anything further during my investigation, I'll post here.