Type: Bug
Resolution: Fixed
Priority: Critical
Environment: Linux localhost.localdomain 2.6.32-431.1.2.el6.x86_64 #1 SMP Thu Dec 12 13:59:19 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
As of version 1.560, polling my private Mercurial repository hosted on Bitbucket no longer works. Before version 1.560 this worked as expected. Apart from the Jenkins upgrade itself, nothing else on the system changed. The poll configuration looks like:
H/5 * * * *
The "poll log" says the following:
Started on Apr 24, 2014 8:08:05 AM
We need to schedule a new build to get a workspace, but deferring 1,149ms in the hope that one will become available soon (all_suitable_nodes_are_offline)
Done. Took 1 ms
No changes
The build does have a valid workspace. Doing a manual build to "get a workspace" does work but does not resolve the polling issue.
This is a single Jenkins instance without any slaves.
I'd be happy to provide any other logs, information, or tests required to resolve this issue.
- is blocking: JENKINS-21394 Avoid queuing request for workspace while node is offline (Resolved)
- is related to: JENKINS-34965 Polling no longer triggering builds after Jenkins upgrade to 1.651.2 (Open)
[JENKINS-22750] Polling no longer triggers builds (regression 1.560)
Happening for us as well. Only a master, no slaves. Polling with Git no longer works. We get the same error message.
{{{
We need to schedule a new build to get a workspace, but deferring 11,220ms in the hope that one will become available soon (all_suitable_nodes_are_offline)
}}}
Definitely critical issue for us.
We have submitted a pull request: https://github.com/jenkinsci/jenkins/pull/1215. This fixes the issue reported here, where jobs are not getting enqueued on master.
Slightly modified pull with less intrusive changes and a unit test that shows the failure: https://github.com/jenkinsci/jenkins/pull/1218
Somewhat ugly workaround: Create a dummy node, then edit your project and force it to run on master through the "Restrict where the project can run" option
Rolling back to 1.559 has emerged as the least ugly of the available workarounds.
Can you safely roll back from 1.562 to 1.559?
Triggering GitHub push builds has been broken for 2+ weeks now (after updating to 1.560). I'm checking for an update every day, hoping it'll get fixed soon.
Thanks.
Created a slave node here to work around the issue. Works fine. Looking forward to a proper fix.
I ran a cron with jenkins-cli and it works. The slave node workaround seems more complicated. Hope it gets fixed soon.
crontab -e
*/15 * * * * /var/lib/jenkins/build_batchers.sh
cd ~
wget http://jenkins:8080/jnlpJars/jenkins-cli.jar
java -jar ~/jenkins-cli.jar -s http://jenkins:8080/ build take_over_the_world_project
We just ran into this problem today as well. In the end we downgraded to 1.559 to resolve our issue.
Is there any chance of getting this fixed soon? It's a blocker for many people, considering the number of comments here, on issue 21394, on the two PRs, and on the mailing list.
Pull request #1215 (https://github.com/jenkinsci/jenkins/pull/1215) fixes the issue. This pull request should be chosen and #1218 closed without being merged, because the latter has issues when all your slaves are offline.
Yes, the PR is ready and the maintainers say they have no more comments, so it should be merged very soon.
Can you give any further information when we can expect this fix to be rolled out to the rpm distributions repository?
We'd very much like to have automated builds again, but are hesitant to work around this problem by downgrading and pinning versions on our servers.
Unfortunately not, that is in the hands of the maintainers. Meanwhile you can use the "dummy node" workaround. Create a slave, which actually is on the master itself, then polling will work again.
Never done this before but figure if this helps then great. If not then whatever:
I care about getting this fixed, so I'm offering USD 30.00000000 via FreedomSponsors to the first person who fixes it.
Offer link: http://www.freedomsponsors.org/core/issue/500/polling-no-longer-triggers-builds-regression-1560
You can also join me and throw in a few bucks there and we'll get it fixed faster
If you fix this issue (see my acceptance criteria there) please use that site to request your payment.
The fix is already there, we just need the maintainers of Jenkins to merge the pull request
Another possible workaround - change the job's config.xml like this:
<assignedNode>master</assignedNode>
<canRoam>false</canRoam>
<!--<canRoam>true</canRoam>-->
Limitation - a single build machine CI environment.
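A minimal sketch of applying that config.xml change programmatically, for anyone with many jobs to edit. It only demonstrates setting the two elements named above with the JDK's built-in DOM and transformer classes. The class name `PinJobToMaster`, its helper methods, and the trimmed-down sample XML are hypothetical and not part of Jenkins. It assumes the job's config.xml parses as XML 1.0, and that Jenkins is told to reload its configuration from disk after the file is rewritten.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Jenkins class.
class PinJobToMaster {

    /** Sets the text of a direct child of the root element, creating the child if missing. */
    static void setChildText(Document doc, String tag, String value) {
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element
                    && tag.equals(children.item(i).getNodeName())) {
                children.item(i).setTextContent(value);
                return;
            }
        }
        Element created = doc.createElement(tag);
        created.setTextContent(value);
        root.appendChild(created);
    }

    /** Returns the job XML with assignedNode set to "master" and canRoam set to "false". */
    static String pinToMaster(String configXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(configXml)));
        setChildText(doc, "assignedNode", "master");
        setChildText(doc, "canRoam", "false");

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumed, trimmed-down job config; a real config.xml has many more elements.
        String before = "<project><canRoam>true</canRoam></project>";
        System.out.println(pinToMaster(before));
    }
}
```

Editing through a DOM rather than with string replacement keeps the rest of the job configuration untouched, but back up config.xml first, since this is only a sketch.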
My projects have <canRoam>false</canRoam> and polling still doesn't work for me.
Why are people fighting against getting this fixed? By refusing to do builds, Jenkins has defeated its entire purpose for existing. Thanks for killing Jenkins.
People are not fighting against it. The fix is on its way and will be in the next release of Jenkins. Note the merge of pull request #1215: https://github.com/jenkinsci/jenkins/pull/1215
The regression only broke polling for users with no slaves and using an SCM that requires a workspace for polling.
Code changed in jenkins
User: Mads Nielsen
Path:
core/src/main/java/hudson/model/AbstractProject.java
test/src/test/java/hudson/model/ProjectTest.java
http://jenkins-ci.org/commit/jenkins/492a60a69c2bac35578b7d93d6cb7a9f593da2f8
Log:
Fixing JENKINS-22750, regression introduced in JENKINS-21394
Code changed in jenkins
User: Oliver Gondža
Path:
core/src/main/java/hudson/model/AbstractProject.java
test/src/test/java/hudson/model/ProjectTest.java
http://jenkins-ci.org/commit/jenkins/f371bc99685bab5a8bd70a3d461b91024727a5ed
Log:
Merge pull request #1215 from MadsNielsen/JENKINS-21394-Regression
Fixing JENKINS-22750, regression introduced in JENKINS-21394
Can we please get this into the next release? Cannot upgrade Jenkins until this issue is resolved.
The regression only broke polling for users with no slaves and using an SCM that requires a workspace for polling.
So, most setups? The average installation of Jenkins is probably not using slaves and most are probably using git. This statement is ridiculous.
FWIW, for performance and security reasons it is recommended to not have any executors on the master (use slaves exclusively); and current versions of the Git plugin default to remote polling (no workspace required). Some commercial SCMs unfortunately require workspace-based polling.
At any rate, this has already been put in for 1.565. Could be backported to the rc branch for 1.564 as well; not sure who “owns” this.
If that's the recommended approach then the default installation of Jenkins should be configured with a master and a single slave by default.
I mean that without any steps other than 'yum install jenkins' I should get the behaviour that is best practice.
The average user is not going to know about slaves. I myself only found out about slaves because of this bug. If we're saying I should have been using slaves all along (and we've been using Jenkins for over a year) then there's a problem with the default setup.
Unfortunately by its nature the default setup cannot include slaves. (Well, it could add a local slave, but that does not have any benefit.)
How about an administrative monitor similar to the one shown when security is disabled? Link to /computer/new as well as a wiki page describing node best practices.
What can be done is to not install any executors by default and let users add master executors by hand, with a visible note that this is not recommended and a link to the "add a node" screen.
I disagree that the master should not be able to execute. It's a build server - why does it need to be more complicated by default? Slaves? I don't need to manage yet another machine (in this case, a slave that is required not to be local). Hopefully that's not too brash, but I'm just stating what I think most people would also agree with.
On another note, thanks to all who got this into 1.565 and for the work that went into fixing this issue.
Why is this not a blocker? I don't understand why the priority of this was lowered?
mockturtl: Please be more specific about what your Jenkins setup looks like. Does master have executors? Do you have slaves? How many? What are their retention strategies (schedule, online as much as possible, on demand, ...)? Are they cloud slaves? Some of them? Are they set to Tied jobs only or Use as much as possible? Do you have plugins changing how the queue or node assignment works?
Jen Wilson: From the reports, it seems there is a workaround (tie jobs to 'master' or add slaves).
Daniel Beck: Forgive me for not providing more info; I'm a Jenkins newbie. 2 executors. Single instance, no slaves. My job page links to a "GitHub Hook Log" as in the OP description.
Enabled plugins: ant, credentials, cvs, embeddable-build-status, environment injector, external monitor job type, git client, git, github api, github authentication, github, github pull request builder, hipchat, javadoc, ldap, mailer, mapdb api, matrix authorization strategy, matrix project, maven integration, msbuild, nunit, owasp markup formatter, pam authentication, scm api, ssh credentials, ssh slaves, subversion, translation assistance, windows slaves
I've confirmed setting `<canRoam>` to false in `.jenkins/jobs/myjob/config.xml` in 1.564 does resolve the issue.
I have a similar issue with jobs that are configured for git push notification.
After upgrading from 1.557 to 1.563/1.564, these jobs no longer get triggered on repository checkins.
There is an entry in the system log indicating that Jenkins received the notification:
Mai 21, 2014 8:57:15 AM INFO hudson.plugins.git.GitStatus doNotifyCommit
Triggering the polling of services
As git push notification involves configuring the job for scheduled polling, I believe I'm facing the same issue as reported here.
My setup:
The master (Debian 6) has 3 executors. There are 3 slaves configured (1 ssh unix slave and two jnlp Windows slaves) that are offline by default.
The jobs in question are not bound to any node or label.
Installed plugins: Parameterized Trigger plugin, Credentials Plugin, Hudson Fitnesse plugin, Matrix Project Plugin, SSH Slaves plugin, Mailer Plugin, Javadoc Plugin, OWASP Markup Formatter Plugin, Matrix Authorization Strategy Plugin, CVS Plug-in, Translation Assistance plugin, SCM API Plugin, cucumber-reports, LDAP Plugin, Maven Integration plugin, Green Balls, PAM Authentication plugin, Artifactory Plugin, GIT plugin, Ant Plugin, Windows Slaves Plugin, External Monitor Job Type Plugin, Copy Artifact Plugin, Extra Columns Plugin, MapDB API Plugin, SSH Credentials Plugin, promoted builds plugin, Multiple SCMs plugin, Subversion Plug-in.
GIT plugin is 1.1.26, all other plugins have the most recent version.
Code changed in jenkins
User: Oliver Gondža
Path:
changelog.html
http://jenkins-ci.org/commit/jenkins/218676d01960455b6247ab2d4a4fd35ff2a7efad
Log:
JENKINS-22750 was actually fixed in 1.565
@Oliver Gondža: Thanks for clarifying this. So I'll wait for the 1.565 debian package (the rc debian package is currently not available) and check again.
I can confirm that the problem with push notification is resolved in 1.565.
Actually, we are using several slaves, and all the nodes are online... but still seeing:
Started on Jul 25, 2014 12:46:00 PM
We need to schedule a new build to get a workspace, but deferring 2,143ms in the hope that one will become available soon (all_suitable_nodes_are_offline)
Done. Took 4 ms
No changes
Not this bug. I was having issues because the master node was set to tied jobs only. Once I opened it up, polling began to work again. Not sure why it behaves in this way, but regardless, it was not this bug. Re-Closing.
The bug is still there, easily reproducible in the following configuration:
Server: most recent Bitnami Appliance (Ubuntu-based), jenkins.war manually upgraded to 1.577
Slaves: 1 'cloud' on-demand slave, managed by the virtualbox plugin, Ubuntu 14.04 LTS x86_64 with OS 'native' LAMP stack.
SCM: git in local network on a separate server, accessible by ssh & git
Job: default 'free-style job', with Build Trigger set to 'Poll SCM' once per hour (doesn't matter whether fixed or randomized), tied by node name to the particular slave (on-demand cloud VB slave)
Triggering the job manually brings the node online, and after the configured delay (3 min in my case) Jenkins proceeds with the rest of the job. Changes in SCM cannot trigger the build because of the error message mentioned above: "We need to schedule a new build to get a workspace, but deferring in the hope that one will become available soon (all_suitable_nodes_are_offline)".
This is not just a bug, it's a SEVERE failure that breaks the typical workflow in any cloud environment. Jenkins has to decide whether to power up slave(s) based on the results of SCM polling, but it is unable to do that, because SCM polling itself can be done only on the node(s) that are currently offline.
Hi Kiril,
Okay, that is not good. Have you tried another cloud provider plugin, such as amazon-aws? Since your use case is fairly common, it is pretty strange that we haven't had multiple outcries about this one, given that the change is more than 3 months old. Also, I do not know why that message is displayed. We have the following code, and in your case you should get the latter message:
if (isAllSuitableNodesOffline(build)) {
    Collection<Cloud> applicableClouds = label == null ? Jenkins.getInstance().clouds : label.getClouds();
    return applicableClouds.isEmpty() ? WorkspaceOfflineReason.all_suitable_nodes_are_offline : WorkspaceOfflineReason.use_ondemand_slave;
}
Best regards,
Mads
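To make the branch in the snippet above easier to reason about, here is a self-contained sketch of the same decision with stand-in types. It is not the Jenkins implementation: the `Cloud` class, the `reasonWhenAllNodesOffline` method, and its two parameters are hypothetical simplifications. Only the two enum constant names and the "no label means all clouds" rule are taken from the quoted code.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the quoted branch, not Jenkins code.
class OfflineReasonSketch {

    /** Mirrors the two constants named in the quoted snippet. */
    enum WorkspaceOfflineReason { all_suitable_nodes_are_offline, use_ondemand_slave }

    /** Stand-in for a configured cloud; not the Jenkins Cloud class. */
    static final class Cloud {
        final String name;
        Cloud(String name) { this.name = name; }
    }

    /**
     * Picks the reason reported once every suitable node is known to be offline.
     *
     * @param labelClouds clouds able to provision the job's label, or null if the job has no label
     * @param allClouds   every cloud configured on the instance
     */
    static WorkspaceOfflineReason reasonWhenAllNodesOffline(
            Collection<Cloud> labelClouds, Collection<Cloud> allClouds) {
        // With no label, any configured cloud counts as applicable.
        Collection<Cloud> applicable = (labelClouds == null) ? allClouds : labelClouds;
        return applicable.isEmpty()
                ? WorkspaceOfflineReason.all_suitable_nodes_are_offline
                : WorkspaceOfflineReason.use_ondemand_slave;
    }

    public static void main(String[] args) {
        List<Cloud> none = Collections.emptyList();
        List<Cloud> oneCloud = Collections.singletonList(new Cloud("example-cloud"));

        // No label, no clouds: nothing can be provisioned.
        System.out.println(reasonWhenAllNodesOffline(null, none));
        // No label, one cloud: an on-demand slave could be started.
        System.out.println(reasonWhenAllNodesOffline(null, oneCloud));
        // Label set, but no cloud serves that label, even though a cloud exists.
        System.out.println(reasonWhenAllNodesOffline(none, oneCloud));
    }
}
```

The third case may be relevant to jobs tied to a specific node name: in this model, if no cloud is reported for the job's label, the result is `all_suitable_nodes_are_offline` even though a cloud is configured. Whether that is what happens in the setup reported above is an assumption, not something this thread confirms.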
kirill_aga: Please file a new issue. Yours is only superficially similar to this one, and we don't need to keep the ~80 people watching this one informed about the investigation related to your (different) issue.
I am seeing this issue on an upgrade to Jenkins ver. 1.651.2 – For us, this is still a critical bug.
danielbeck Sadly not, exact same error message and SCM polling is not working on just a few builds, randomly. This seemed to occur after a node died due to hw issues. Still having the same issue.
danielbeck Should I make a new issue for this – I can provide any information needed, though the issues are very similar.
Hi.
Yes, the key issue is the combination of "polling requires workspace" and "no slave exists". I have been tied to other tasks the last couple of days, but will write a fix now.