I created a job which contains a groovy script. The groovy script's job is to put a large number of builds into the Jenkins queue which are set to wait until machines with a specific label are found to run them. Only one of these jobs is set to run at a time on each machine.
By default, machines are set with the label "uninstalled". Once this job runs on them, that label is changed from "uninstalled" to "installing" and when finished "installed". The job is only allowed to run on nodes with the label "uninstalled" on them.
Right now, if a single machine has the "uninstalled" label attached to it, all of the jobs in the queue will set themselves to run on that node. Once a single job in the queue starts on that machine, that "uninstalled" label is set to "installing", which SHOULD take it out of the running for the other jobs to run on it. When that job completes, however, the jobs in the queue will still run on that node, even though it no longer has an "uninstalled" label.
Jenkins' queue should be changed so that before a job is transferred from the "Buildable" queue into the executor, it should check first to make sure that it still complies with the labels on the node it plans to build on. At random times Jenkins will catch on before the build actually goes off, but more often than not, all of the jobs in the queue will run on this one node instead of waiting for other ones to come in.
I have been looking at the Jenkins code for the queue and the maintain() method seems to be what needs changing here. It appears as though the JobOffer occurs before a last minute check for blockers that occurs on line 62 of the following code. Can anyone confirm if what I am looking at is right?
public synchronized void maintain() { LOGGER.log(Level.FINE, "Queue maintenance started {0}", this); // The executors that are currently waiting for a job to run. Map<Executor,JobOffer> parked = new HashMap<Executor,JobOffer>(); {// update parked for (Computer c : Jenkins.getInstance().getComputers()) { for (Executor e : c.getExecutors()) { if (e.isParking()) { parked.put(e,new JobOffer(e)); } } } } {// blocked -> buildable for (BlockedItem p : new ArrayList<BlockedItem>(blockedProjects.values())) {// copy as we'll mutate the list if (!isBuildBlocked(p) && allowNewBuildableTask(p.task)) { // ready to be executed Runnable r = makeBuildable(new BuildableItem(p)); if (r != null) { p.leave(this); r.run(); } } } } // waitingList -> buildable/blocked while (!waitingList.isEmpty()) { WaitingItem top = peek(); if (top.timestamp.compareTo(new GregorianCalendar())>0) break; // finished moving all ready items from queue top.leave(this); Task p = top.task; if (!isBuildBlocked(top) && allowNewBuildableTask(p)) { // ready to be executed immediately Runnable r = makeBuildable(new BuildableItem(top)); if (r != null) { r.run(); } else { new BlockedItem(top).enter(this); } } else { // this can't be built now because another build is in progress // set this project aside. new BlockedItem(top).enter(this); } } final QueueSorter s = sorter; if (s != null) s.sortBuildableItems(buildables); // allocate buildable jobs to executors for (BuildableItem p : new ArrayList<BuildableItem>(buildables)) {// copy as we'll mutate the list in the loop // one last check to make sure this build is not blocked. if (isBuildBlocked(p)) { p.leave(this); new BlockedItem(p).enter(this); LOGGER.log(Level.FINE, "Catching that {0} is blocked in the last minute", p); continue; } List<JobOffer> candidates = new ArrayList<JobOffer>(parked.size()); for (JobOffer j : parked.values()) if (j.canTake(p)) candidates.add(j); MappingWorksheet ws = new MappingWorksheet(p, candidates); Mapping m = loadBalancer.map(p.task, ws); if (m == null) { // if we couldn't find the executor that fits, // just leave it in the buildables list and // check if we can execute other projects LOGGER.log(Level.FINER, "Failed to map {0} to executors. candidates={1} parked={2}", new Object[]{p, candidates, parked.values()}); continue; } // found a matching executor. use it. WorkUnitContext wuc = new WorkUnitContext(p); m.execute(wuc); p.leave(this); if (!wuc.getWorkUnits().isEmpty()) makePending(p); else LOGGER.log(Level.FINE, "BuildableItem {0} with empty work units!?", p); } }
If anyone knows any ways around this in the meantime through the use of Groovy or whatever, your advice would be greatly appreciated.
Thank You!
-Kevin