Type: Bug
Resolution: Fixed
Priority: Critical
Environment: Jenkins 1.599, jClouds-plugin 2.9-SNAPSHOT (private-3bd37c37-jenkins)
When a matrix build begins, Jenkins creates a "flyweight" build that checks out the code and decides which (if any) of the project axes need to be rebuilt. The "flyweight" build starts the subordinate jobs and waits until they are complete before it finishes. However, Jenkins doesn't allocate an executor to the "flyweight" builds or consider them when reporting whether a node is "idle". Because JCloudsRetentionStrategy.check() only uses Computer.isIdle() to determine if the node is busy, it kills off nodes running "flyweight" builds.
Our cloud is set to create instances with one executor each. In the scenario where a matrix build starts two subordinate builds, the "flyweight" will run alongside one build while the other goes to a second node. If the job on the second node takes longer than the retention time to complete, the first node will be considered "idle" and jClouds-plugin will terminate it. That stops the "flyweight" build with an error, which in turn causes the entire build to fail.
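To make the failure mode concrete, here is a minimal, self-contained Java sketch of the scenario. It is a model only, not the plugin's code: `Node`, `naiveShouldTerminate`, and `flyweightAwareShouldTerminate` are hypothetical names, the node's `isIdle()` simply mirrors the behavior described in this report (only executors count), and the 30-minute retention time is an assumed value.

```java
public class RetentionSketch {
    /** Hypothetical stand-in for a cloud node; this is not a Jenkins class. */
    static final class Node {
        final String name;
        final int busyExecutors;   // regular builds occupying an executor
        final int flyweightBuilds; // matrix parent builds, which occupy no executor
        final long idleMinutes;    // minutes since the last executor became free

        Node(String name, int busyExecutors, int flyweightBuilds, long idleMinutes) {
            this.name = name;
            this.busyExecutors = busyExecutors;
            this.flyweightBuilds = flyweightBuilds;
            this.idleMinutes = idleMinutes;
        }

        /** Models the idle check as described in the report: flyweights are ignored. */
        boolean isIdle() {
            return busyExecutors == 0;
        }
    }

    /** Retention check that relies on isIdle() alone. */
    static boolean naiveShouldTerminate(Node n, long retentionMinutes) {
        return n.isIdle() && n.idleMinutes >= retentionMinutes;
    }

    /** Retention check that also refuses to kill a node hosting a flyweight build. */
    static boolean flyweightAwareShouldTerminate(Node n, long retentionMinutes) {
        return n.isIdle() && n.flyweightBuilds == 0 && n.idleMinutes >= retentionMinutes;
    }

    public static void main(String[] args) {
        long retention = 30; // assumed retention time, in minutes

        // First node: its single executor finished 45 minutes ago,
        // but it still hosts the matrix "flyweight" build.
        Node first = new Node("node-a", 0, 1, 45);
        // Second node: still running the slow subordinate build.
        Node second = new Node("node-b", 1, 0, 0);

        for (Node n : new Node[] { first, second }) {
            System.out.println(n.name
                    + " naive=" + naiveShouldTerminate(n, retention)
                    + " flyweightAware=" + flyweightAwareShouldTerminate(n, retention));
        }
    }
}
```

In this model the naive check marks the first node for termination while the flyweight is still waiting on the second node; the flyweight-aware variant keeps it alive until the parent build is done.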
"Flyweight" builds don't occupy executors, so the only way I've found to detect one is by enumerating every job in the system, testing whether it's building, then checking the name of the node it's building on. (If there's a better way, I would very much like to know about it!) Some Groovy code:
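import jenkins.model.Jenkins
import hudson.model.AbstractProject

for (node in Jenkins.getInstance().getComputers()) {
    // Count builds in progress on this node, including "flyweight" builds
    def num_jobs = 0
    // Restrict to AbstractProject so that folders and other non-job items are skipped
    for (job in Jenkins.getInstance().getAllItems(AbstractProject.class)) {
        if (job.isBuilding()) {
            if (job.getLastBuild().getBuiltOnStr() == node.getName()) {
                num_jobs++
            }
        }
    }
    if (num_jobs > 0) {
        println("Node " + node.getName() + " is running " + num_jobs + " job(s) - don't kill it!")
    }
}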
(I know this is really a bug in upstream Jenkins, but the developers there seem determined to treat "flyweight" builds as undetectable non-entities, so I think changing jClouds-plugin's behavior will be much easier.)
Thanks for all your great work!