[JENKINS-5413] SCM polling getting hung

Hans-Juergen Hafner added a comment - 2010-08-13 04:49

@vjuranek
Thanks a lot!
The script worked very well . (With on small change: A ";" was missed before item.interrupt() )

Hans-Juergen Hafner added a comment - 2010-08-13 04:49 @vjuranek Thanks a lot! The script worked very well . (With on small change: A ";" was missed before item.interrupt() )

erwan_q added a comment - 2010-09-13 15:23

I have the same issue every week when I restart the master. The slave is unusable (scm polling stuck, fail to join the slave....). Workaround is to disconnect, kill slave manually, restart slaves. It could be great to have a batch script to implement this restart action at reboot time

erwan_q added a comment - 2010-09-13 15:23 I have the same issue every week when I restart the master. The slave is unusable (scm polling stuck, fail to join the slave....). Workaround is to disconnect, kill slave manually, restart slaves. It could be great to have a batch script to implement this restart action at reboot time

lkishalmi added a comment - 2010-11-04 00:37

~~JENKINS-5977~~ seems to be the root cause of this issue.
Try to upgrade to 1.380+

lkishalmi added a comment - 2010-11-04 00:37 JENKINS-5977 seems to be the root cause of this issue. Try to upgrade to 1.380+

Dean Yu added a comment - 2010-11-18 10:53

I've just applied the patch to perform polling on the master for the Subversion plugin. Apologies to everyone who was waiting for this patch. Updating the plugin wiki with instructions.

Dean Yu added a comment - 2010-11-18 10:53 I've just applied the patch to perform polling on the master for the Subversion plugin. Apologies to everyone who was waiting for this patch. Updating the plugin wiki with instructions.

eguess74 added a comment - 2011-03-09 19:19 - edited

I just started to experience this problem.
we have two instances of Jenkins running. One of them started to have this polling error. It is not only that the polling gets stuck but also the CPU is overloaded for no reason. I have restarted the server, but it came back to this error state in no time.
I have changed the concurrent poll amount from 5 to 10 and restarted again to kill hangup polls. Watching...

We are using jenkins 1.399 + git plugin 1.1.5 + no slaves involved
We have about 450 jobs with polling interval set to 10 min

eguess74 added a comment - 2011-03-09 19:19 - edited I just started to experience this problem. we have two instances of Jenkins running. One of them started to have this polling error. It is not only that the polling gets stuck but also the CPU is overloaded for no reason. I have restarted the server, but it came back to this error state in no time. I have changed the concurrent poll amount from 5 to 10 and restarted again to kill hangup polls. Watching... We are using jenkins 1.399 + git plugin 1.1.5 + no slaves involved We have about 450 jobs with polling interval set to 10 min

eguess74 added a comment - 2011-03-09 20:33 - edited

BTW the script provided by vjuranek didn't work for me throwing MissingMethodException
But i have found and ability to kill threads from GUI using monitoring plugin! The thread details section has this nice little kill button

eguess74 added a comment - 2011-03-09 20:33 - edited BTW the script provided by vjuranek didn't work for me throwing MissingMethodException But i have found and ability to kill threads from GUI using monitoring plugin! The thread details section has this nice little kill button

eguess74 added a comment - 2011-03-10 15:09

for the record:
I was able to narrow it down to three jobs that were consistently getting stuck on the polling/fetching step.
I was trying different approaches but the only thing that actually resolved the problem was to recreate thos jobs from scratch. I.e. i blew away all related folders and workspace and recreated the job. This brought the CPU usage down and there is no stuck threads anymore already for a full day...

eguess74 added a comment - 2011-03-10 15:09 for the record: I was able to narrow it down to three jobs that were consistently getting stuck on the polling/fetching step. I was trying different approaches but the only thing that actually resolved the problem was to recreate thos jobs from scratch. I.e. i blew away all related folders and workspace and recreated the job. This brought the CPU usage down and there is no stuck threads anymore already for a full day...

Stephen Connolly added a comment - 2011-06-07 08:38

CloudBees have raised http://issues.tmatesoft.com/issue/SVNKIT-15

Stephen Connolly added a comment - 2011-06-07 08:38 CloudBees have raised http://issues.tmatesoft.com/issue/SVNKIT-15

Thomas Mielke added a comment - 2011-07-19 16:23 - edited

@vjuranek, @Hans-Juergen Hafner

My team is using Git, and we started experiencing the problem after adding several extra builds to our jenkins server. We resolved the issue by running the script vjuranek provided as a cronjob:

cron
0 * * * *       /var/lib/hudson/killscm.sh

# cat /var/lib/hudson/killscm.sh 
java -jar /var/lib/hudson/hudson-cli.jar -s http://myserver:8090/ groovy /var/lib/hudson/threadkill.groovy

# cat /var/lib/hudson/threadkill.groovy 
Thread.getAllStackTraces().keySet().each() { 
	item ->
	if (item.getName().contains("SCM polling") && item.getName().contains("waiting for hudson.remoting")) { 
		println "Interrupting thread " + item.getId(); 
		item.interrupt() 
	}
}

Since running these scripts, our nightly builds haven't hung for the last 5 consecutive days.

Thomas Mielke added a comment - 2011-07-19 16:23 - edited @vjuranek, @Hans-Juergen Hafner My team is using Git, and we started experiencing the problem after adding several extra builds to our jenkins server. We resolved the issue by running the script vjuranek provided as a cronjob: cron 0 * * * * / var /lib/hudson/killscm.sh # cat / var /lib/hudson/killscm.sh java -jar / var /lib/hudson/hudson-cli.jar -s http: //myserver:8090/ groovy / var /lib/hudson/threadkill.groovy # cat / var /lib/hudson/threadkill.groovy Thread .getAllStackTraces().keySet().each() { item -> if (item.getName().contains( "SCM polling" ) && item.getName().contains( "waiting for hudson.remoting" )) { println "Interrupting thread " + item.getId(); item.interrupt() } } Since running these scripts, our nightly builds haven't hung for the last 5 consecutive days.

Frederic Pesquet added a comment - 2011-09-08 07:54

We are still experiencing this (with 1.420 and 1.414).
The workarround to kill scm threads periodically does not work for me, not sure why (blocked threads are not killed by the scripts, and can not be killed either by the monitoring plugin).
We have a large number of jobs (>1000). The Hudson is blocking every day, and the only way to unlock it is to restart it.
The issue does not seem to be specific to one SCM: we are using SVN and GIT. When I tried to implement the workarround to poll from master with SVN (hudson.scm.SubversionSCM.pollFromMaster), the blocking occured on the GIT polling.
There are 45 voters on this issue, I guess I'm not alone here... Can we raise the priority of this? Seems a real core issue.

Frederic Pesquet added a comment - 2011-09-08 07:54 We are still experiencing this (with 1.420 and 1.414). The workarround to kill scm threads periodically does not work for me, not sure why (blocked threads are not killed by the scripts, and can not be killed either by the monitoring plugin). We have a large number of jobs (>1000). The Hudson is blocking every day, and the only way to unlock it is to restart it. The issue does not seem to be specific to one SCM: we are using SVN and GIT. When I tried to implement the workarround to poll from master with SVN (hudson.scm.SubversionSCM.pollFromMaster), the blocking occured on the GIT polling. There are 45 voters on this issue, I guess I'm not alone here... Can we raise the priority of this? Seems a real core issue.

carlspring added a comment - 2011-09-08 08:13

We used to hit this while we were still using Hudson (ca. 1.3xx). We also have a large number of jobs 300-400. We haven't run into this since we moved to Jenkins.

(Just a suggestion).

carlspring added a comment - 2011-09-08 08:13 We used to hit this while we were still using Hudson (ca. 1.3xx). We also have a large number of jobs 300-400. We haven't run into this since we moved to Jenkins. (Just a suggestion).

Frederic Pesquet added a comment - 2011-09-08 14:05

We are already on Jenkins...
More infos:
We have also a medium number of slaves (>25). It is not uncommon that a slave cease to respond temporarily, or reboot, etc.
As described in this bug initial description, "Furthermore, if the slave dies, the locked channel object still exists in the master JVM.".
I guess we are probably experiencing something like this. The SCM polling getting hung is just the most obvious symptom here.

Frederic Pesquet added a comment - 2011-09-08 14:05 We are already on Jenkins... More infos: We have also a medium number of slaves (>25). It is not uncommon that a slave cease to respond temporarily, or reboot, etc. As described in this bug initial description, "Furthermore, if the slave dies, the locked channel object still exists in the master JVM.". I guess we are probably experiencing something like this. The SCM polling getting hung is just the most obvious symptom here.

Lars Kruse added a comment - 2011-10-19 09:59

Have a look here for at step-by-step description of the work-around http://howto.praqma.net/hudson/jenkins-5413-workaround

Lars Kruse added a comment - 2011-10-19 09:59 Have a look here for at step-by-step description of the work-around http://howto.praqma.net/hudson/jenkins-5413-workaround

Jes Struck added a comment - 2011-10-19 10:06

I just started to do a studie of why this happens. If there are any, there have bullet prof scenario to force this behaviore please do tell. Because i cannot reproduce this consistent

Jes Struck added a comment - 2011-10-19 10:06 I just started to do a studie of why this happens. If there are any, there have bullet prof scenario to force this behaviore please do tell. Because i cannot reproduce this consistent

Jes Struck added a comment - 2011-12-05 15:17

we had some issues in our scm plugin, that resulted in the polling thread hanging.
we had three things
1)Our poll-log that was send between our slaves and the master, gave us multiple threads wrting to the same appender.
2)We also saw that some uncaught exception resulted in hanging thread.
3)We have also seen threads hang becaus of a field that where declared transient, which could not be declared transient.

after we solved thees issue in our plugin we have not seen threads hang anymore
and therfore we cant reproduce this scenarioe

Jes Struck added a comment - 2011-12-05 15:17 we had some issues in our scm plugin, that resulted in the polling thread hanging. we had three things 1)Our poll-log that was send between our slaves and the master, gave us multiple threads wrting to the same appender. 2)We also saw that some uncaught exception resulted in hanging thread. 3)We have also seen threads hang becaus of a field that where declared transient, which could not be declared transient. after we solved thees issue in our plugin we have not seen threads hang anymore and therfore we cant reproduce this scenarioe

Vitalii Grygoruk added a comment - 2011-12-05 15:57

Can you share fixed version of plugin for test?

Vitalii Grygoruk added a comment - 2011-12-05 15:57 Can you share fixed version of plugin for test?

Jes Struck added a comment - 2011-12-05 19:12

vi are doing stress tests on the fixed version, and exspect to release the new version of our plugin wednesday or thursday

Jes Struck added a comment - 2011-12-05 19:12 vi are doing stress tests on the fixed version, and exspect to release the new version of our plugin wednesday or thursday

Christian Wolfgang added a comment - 2011-12-16 21:24

Hello.

We have released our plugin, the source can be found at https://github.com/jenkinsci/clearcase-ucm-plugin.

Some of the things, that made the polling hang on the slaves were mainly uncaught exception. If they weren't caught, they sometimes resulted in slaves hanging.
And, as Jes wrote, we experienced a transient SimpleDateFormat causing the slaves to hang and this only happened in the polling phase.

Our main problem, which does not only concern the polling, is cleartool, which, from time to time, stops working due to many reasons.
It exits with the error message "albd_contact call failed: RPC: Unable to receive; errno = [WINSOCK] Connection reset by peer",
but it never returns the control to the master, which results in slaves hanging.

Our plugin is currently only available for the windows platform and we've had a lot of issues with the desktop heap size.
ClearCase needs a larger heap size than the default setting(512kb), so setting it to a larger value decreases the number of simultaneous desktops,
which sometimes caused the slave OS to freeze. If the value is too low, cleartool sometimes fails with the winsock error as mentioned before.

The conclusion is, make sure the thrown exceptions are caught and the serializable classes have proper transient fields.

I am not saying this is bullet proof, but as far as our tests goes, we haven't experienced the issue yet.

Our test setup is 15 jobs polling every minute using one slave with two executors. ClearCase crashes before anything else happens.

Christian Wolfgang added a comment - 2011-12-16 21:24 Hello. We have released our plugin, the source can be found at https://github.com/jenkinsci/clearcase-ucm-plugin . Some of the things, that made the polling hang on the slaves were mainly uncaught exception. If they weren't caught, they sometimes resulted in slaves hanging. And, as Jes wrote, we experienced a transient SimpleDateFormat causing the slaves to hang and this only happened in the polling phase. Our main problem, which does not only concern the polling, is cleartool, which, from time to time, stops working due to many reasons. It exits with the error message "albd_contact call failed: RPC: Unable to receive; errno = [WINSOCK] Connection reset by peer", but it never returns the control to the master, which results in slaves hanging. Our plugin is currently only available for the windows platform and we've had a lot of issues with the desktop heap size. ClearCase needs a larger heap size than the default setting(512kb), so setting it to a larger value decreases the number of simultaneous desktops, which sometimes caused the slave OS to freeze. If the value is too low, cleartool sometimes fails with the winsock error as mentioned before. The conclusion is, make sure the thrown exceptions are caught and the serializable classes have proper transient fields. I am not saying this is bullet proof, but as far as our tests goes, we haven't experienced the issue yet. Our test setup is 15 jobs polling every minute using one slave with two executors. ClearCase crashes before anything else happens.

Markus added a comment - 2012-01-12 09:18

We've been using the above suggested Groovy script for some time to avoid this problem, but as of 1.446 the script fails with "Remote call on CLI channel from /[ip] failed" (~~JENKINS-12302~~). Anyone else have the that problem?

Markus added a comment - 2012-01-12 09:18 We've been using the above suggested Groovy script for some time to avoid this problem, but as of 1.446 the script fails with "Remote call on CLI channel from / [ip] failed" ( JENKINS-12302 ). Anyone else have the that problem?

Franck Gilliers added a comment - 2012-02-08 10:14

Hello,

I experience the same issue.
If it can help, i describe my configuration:

master : linux - jenkins 1.448
slaves : windows XP and seven via a service - msysgit 1.7.4
plugin git : 1.1.15
projects ~ 100

I trigger a build via the push notification as describe in git plugin 1.1.14. But, as it does not always trigger the build (i do not know why), i keep a SCM polling every two hours.
To avoid the hanging, every night I restart the server and reboot slaves. During daytime, i have to kill the children git processes to free the slave

Franck Gilliers added a comment - 2012-02-08 10:14 Hello, I experience the same issue. If it can help, i describe my configuration: master : linux - jenkins 1.448 slaves : windows XP and seven via a service - msysgit 1.7.4 plugin git : 1.1.15 projects ~ 100 I trigger a build via the push notification as describe in git plugin 1.1.14. But, as it does not always trigger the build (i do not know why), i keep a SCM polling every two hours. To avoid the hanging, every night I restart the server and reboot slaves. During daytime, i have to kill the children git processes to free the slave

Christian Wolfgang added a comment - 2012-02-08 11:21

I see two branches of this issue:

1) The fact that slaves gets hung(or threads winds up waiting for lengthy polling) and how the master Jenkins instance should handle this, and
2) How to prevent slaves from getting hung.

The initial issue suggests 1), but some of the replies suggests 2).
I guess both are valid issues. Should they be treated as one or should this issue be split up in two?

Christian Wolfgang added a comment - 2012-02-08 11:21 I see two branches of this issue: 1) The fact that slaves gets hung(or threads winds up waiting for lengthy polling) and how the master Jenkins instance should handle this, and 2) How to prevent slaves from getting hung. The initial issue suggests 1), but some of the replies suggests 2). I guess both are valid issues. Should they be treated as one or should this issue be split up in two?

Guido Serra added a comment - 2012-03-23 11:46

Hi, I got the same issue:

Jenkins GIT 1.1.16
Slave: Windows 7, msysgit (Git-1.7.9-preview20120201.exe)

After I moved the SCM from SVN to the Git solution, the poll/build stopped working

This is how I got the windows machine being able to checkout git with ssh/publickey:

http://guidoserra.it/archivi/2012/03/22/jenkins-msysgit-publickey/

p.s. I was even thinking of using Fisheye to trigger the build on code change detection

Guido Serra added a comment - 2012-03-23 11:46 Hi, I got the same issue: Jenkins GIT 1.1.16 Slave: Windows 7, msysgit (Git-1.7.9-preview20120201.exe) After I moved the SCM from SVN to the Git solution, the poll/build stopped working This is how I got the windows machine being able to checkout git with ssh/publickey: http://guidoserra.it/archivi/2012/03/22/jenkins-msysgit-publickey/ p.s. I was even thinking of using Fisheye to trigger the build on code change detection

Christian Wolfgang added a comment - 2012-04-16 12:36

We have solved our problems now.

It turned out, that the underlying framework for the plugin threw RuntimeExceptions which were not catched all the time. After we handled those exceptions the slaves stopped hanging.

Christian Wolfgang added a comment - 2012-04-16 12:36 We have solved our problems now. It turned out, that the underlying framework for the plugin threw RuntimeExceptions which were not catched all the time. After we handled those exceptions the slaves stopped hanging.

Joe Hansche added a comment - 2012-06-08 19:05

wolfgang: by "plugin" you're referring to the clearcase plugin, right? So that was the issue with your plugin, but not necessarily the issue with the slaves hanging in general? Although potentially related, I guess?

So if the SCM polling plugin raises a RuntimeException, the slave thread will die off without notifying the master, and therefore the master continues waiting for it to finish, even though it never will?

Joe Hansche added a comment - 2012-06-08 19:05 wolfgang : by "plugin" you're referring to the clearcase plugin, right? So that was the issue with your plugin, but not necessarily the issue with the slaves hanging in general? Although potentially related, I guess? So if the SCM polling plugin raises a RuntimeException, the slave thread will die off without notifying the master, and therefore the master continues waiting for it to finish, even though it never will?

Alex Lorenz added a comment - 2012-06-27 11:34

This does not only happen on slaves, but also on single machine Jenkins systems.
With us here at TomTom, it happens regularly and makes us lose valuable builds.

Escalate -> Critical

Alex Lorenz added a comment - 2012-06-27 11:34 This does not only happen on slaves, but also on single machine Jenkins systems. With us here at TomTom, it happens regularly and makes us lose valuable builds. Escalate -> Critical

Christian Wolfgang added a comment - 2012-06-27 14:07

Joe: Yes, the ClearCase UCM plugin. We experienced the slaves to hang when having uncaught runtime exceptions and thus the master's polling thread will never be joined.

Christian Wolfgang added a comment - 2012-06-27 14:07 Joe: Yes, the ClearCase UCM plugin. We experienced the slaves to hang when having uncaught runtime exceptions and thus the master's polling thread will never be joined.

Mandeep Rai added a comment - 2012-09-04 17:04 - edited

I modified the script a little bit:

Jenkins.instance.getTrigger("SCMTrigger").getRunners().each()
{
  item ->
  println(item.getTarget().name)
  println(item.getDuration())
  println(item.getStartTime())
  long millis = Calendar.instance.time.time - item.getStartTime()

  if(millis > (1000 * 60 * 3)) // 1000 millis in a second * 60 seconds in a minute * 3 minutes
  {
    Thread.getAllStackTraces().keySet().each()
    { 
      tItem ->
      if (tItem.getName().contains("SCM polling") && tItem.getName().contains(item.getTarget().name))
      { 
        println "Interrupting thread " + tItem.getName(); 
        tItem.interrupt()
      }
    }
  }
}

Mandeep Rai added a comment - 2012-09-04 17:04 - edited I modified the script a little bit: Jenkins.instance.getTrigger( "SCMTrigger" ).getRunners().each() { item -> println(item.getTarget().name) println(item.getDuration()) println(item.getStartTime()) long millis = Calendar.instance.time.time - item.getStartTime() if (millis > (1000 * 60 * 3)) // 1000 millis in a second * 60 seconds in a minute * 3 minutes { Thread .getAllStackTraces().keySet().each() { tItem -> if (tItem.getName().contains( "SCM polling" ) && tItem.getName().contains(item.getTarget().name)) { println "Interrupting thread " + tItem.getName(); tItem.interrupt() } } } }

lacostej added a comment - 2013-01-14 20:15

I encountered a very similar issue, yet I have a slightly different setup:

1 master 1 slave
yet the polling was stuck on the master only
SCM polling hanging (warning displayed in Jenkins configure screen). Oldest hanging thread is more than 2 days old.
it seems it all started with a Unix process that in some way never returned:

 ps -aef| grep jenkins 
  300 12707     1   0 26Nov12 ??       1336:33.09 /usr/bin/java -Xmx1024M -XX:MaxPermSize=128M -jar /Applications/Jenkins/jenkins.war
  300 98690 12707   0 Sat03PM ??         0:00.00 git fetch -t https://github.com/jenkinsci/testflight-plugin.git +refs/heads/*:refs/remotes/origin/*
  300 98692 98690   0 Sat03PM ??         4:39.72 git-remote-https https://github.com/jenkinsci/testflight-plugin.git https://github.com/jenkinsci/testflight-plugin.git
    0  3371  3360   0  8:20PM ttys000    0:00.02 su jenkins
  300  4017  3372   0  8:52PM ttys000    0:00.00 grep jenkins
    0 10920 10896   0 19Nov12 ttys001    0:00.03 login -pfl jenkins /bin/bash -c exec -la bash /bin/bash

Running Jenkins 1.479

I killed the processes and associated threads, and it started being better.

Polling doesn't enfore timeouts ?

lacostej added a comment - 2013-01-14 20:15 I encountered a very similar issue, yet I have a slightly different setup: 1 master 1 slave yet the polling was stuck on the master only SCM polling hanging (warning displayed in Jenkins configure screen). Oldest hanging thread is more than 2 days old. it seems it all started with a Unix process that in some way never returned: ps -aef| grep jenkins 300 12707 1 0 26Nov12 ?? 1336:33.09 /usr/bin/java -Xmx1024M -XX:MaxPermSize=128M -jar /Applications/Jenkins/jenkins.war 300 98690 12707 0 Sat03PM ?? 0:00.00 git fetch -t https: //github.com/jenkinsci/testflight-plugin.git +refs/heads/*:refs/remotes/origin/* 300 98692 98690 0 Sat03PM ?? 4:39.72 git-remote-https https: //github.com/jenkinsci/testflight-plugin.git https://github.com/jenkinsci/testflight-plugin.git 0 3371 3360 0 8:20PM ttys000 0:00.02 su jenkins 300 4017 3372 0 8:52PM ttys000 0:00.00 grep jenkins 0 10920 10896 0 19Nov12 ttys001 0:00.03 login -pfl jenkins /bin/bash -c exec -la bash /bin/bash Running Jenkins 1.479 I killed the processes and associated threads, and it started being better. Polling doesn't enfore timeouts ?

Raja Aluri added a comment - 2013-02-08 18:31

For people who are on windows and want to setup a scheduled task. Here is a oneliner in powershell.

PS C:\Users\jenkins>
PS C:\Users\jenkins>
PS C:\Users\jenkins>
PS C:\Users\jenkins> tasklist /FI "IMAGENAME eq ssh.exe" /FI "Status eq Unknown" /NH | %{ $_.Split(' *',[StringSplitOptions]"RemoveEmptyEntries")[1]}  |ForEach-Object {taskkill /F /PID $_}
PS C:\Users\jenkins>
PS C:\Users\jenkins>
PS C:\Users\jenkins>
PS C:\Users\jenkins>

Raja Aluri added a comment - 2013-02-08 18:31 For people who are on windows and want to setup a scheduled task. Here is a oneliner in powershell. PS C:\Users\jenkins> PS C:\Users\jenkins> PS C:\Users\jenkins> PS C:\Users\jenkins> tasklist /FI "IMAGENAME eq ssh.exe" /FI "Status eq Unknown" /NH | %{ $_.Split( ' *' ,[StringSplitOptions] "RemoveEmptyEntries" )[1]} |ForEach- Object {taskkill /F /PID $_} PS C:\Users\jenkins> PS C:\Users\jenkins> PS C:\Users\jenkins> PS C:\Users\jenkins>

Linards L added a comment - 2013-03-08 09:17 - edited

Have not noticed that for long time now. But .. in the v1.48x version series I had similar problems, but then they dissapeared. Remember we also did some scheduled rebooting temperings ... now using v1.494.

Linards L added a comment - 2013-03-08 09:17 - edited Have not noticed that for long time now. But .. in the v1.48x version series I had similar problems, but then they dissapeared. Remember we also did some scheduled rebooting temperings ... now using v1.494.

Derek Seibert added a comment - 2013-03-25 13:37 - edited

Our team encounters this issue almost daily using the Dimensions SCM plugin. We run a single instance Jenkins server which polls a Stream ever 30 minutes. I wanted to comment just to explain how we first noticed the issue was occurring in case anybody searching this issue starts in the same place.

We select the Dimensions Polling Log for our job and see the following at maybe 9AM or 10AM. The polling has hung at this point and we need to restart our application server.

Started on Mar 25, 2013 8:30:35 AM

We expect to see something like.

Started on Mar 25, 2013 8:30:35 AM
Done. Took 19 sec
No changes

This is why this issue is so troubling. There is no notification trigger when "Started on..." has just been sitting there hung for a while, and no further polling can be done by that job without a restart of the application server.

Derek Seibert added a comment - 2013-03-25 13:37 - edited Our team encounters this issue almost daily using the Dimensions SCM plugin. We run a single instance Jenkins server which polls a Stream ever 30 minutes. I wanted to comment just to explain how we first noticed the issue was occurring in case anybody searching this issue starts in the same place. We select the Dimensions Polling Log for our job and see the following at maybe 9AM or 10AM. The polling has hung at this point and we need to restart our application server. Started on Mar 25, 2013 8:30:35 AM We expect to see something like. Started on Mar 25, 2013 8:30:35 AM Done. Took 19 sec No changes This is why this issue is so troubling. There is no notification trigger when "Started on..." has just been sitting there hung for a while, and no further polling can be done by that job without a restart of the application server.

lacostej added a comment - 2013-03-25 14:41

Derek,

Not sure if Dimensions plugin is using a native call or not hunder the hood.

Could you make a threaddump and/or a list of processes ?

J

lacostej added a comment - 2013-03-25 14:41 Derek, Not sure if Dimensions plugin is using a native call or not hunder the hood. Could you make a threaddump and/or a list of processes ? J

Manuel de la Peña added a comment - 2013-04-16 06:24

We are using "Github Pull Request Builder" plugin and we encounter this issue daily :S

Manuel de la Peña added a comment - 2013-04-16 06:24 We are using "Github Pull Request Builder" plugin and we encounter this issue daily :S

Brian Smith added a comment - 2013-10-28 22:07

We haven't seen this issue in quite a while. Just recently I have seen it again.

The only difference of note is that for the past several months we have been specifically not renaming jobs (we have instead been creating new jobs with the new name using the old job to copy from, then deleting the old job) as renaming jobs seemed to cause things to not be "stable".

Could this be related? Maybe there is a race condition when the name is being changed and the polling activity is going on? Just a thought.

Brian Smith added a comment - 2013-10-28 22:07 We haven't seen this issue in quite a while. Just recently I have seen it again. The only difference of note is that for the past several months we have been specifically not renaming jobs (we have instead been creating new jobs with the new name using the old job to copy from, then deleting the old job) as renaming jobs seemed to cause things to not be "stable". Could this be related? Maybe there is a race condition when the name is being changed and the polling activity is going on? Just a thought.

Dmitry Maslakov added a comment - 2014-04-09 02:40 - edited

Just got this error after upgrade from 1.556 to 1.558.

Using suggested scripts to kill hung threads did not help, they start again and hung.

Using VisualVM I get the dump of threads and here is the one which hunged for more than 8 hours:

"SCM polling for hudson.maven.MavenModuleSet@4f6ded0d[project-name]" - Thread t@357
   java.lang.Thread.State: WAITING
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <28ff58dd> (a java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
	at java.util.concurrent.FutureTask.get(FutureTask.java:187)
	at hudson.remoting.Request.call(Request.java:157)
	- locked <2de9b3db> (a hudson.remoting.UserRequest)
	at hudson.remoting.Channel.call(Channel.java:722)
	at hudson.scm.SubversionSCM.compareRemoteRevisionWith(SubversionSCM.java:1451)
	at hudson.scm.SCM._compareRemoteRevisionWith(SCM.java:356)
	at hudson.scm.SCM.poll(SCM.java:373)
	at hudson.model.AbstractProject._poll(AbstractProject.java:1490)
	at hudson.model.AbstractProject.poll(AbstractProject.java:1399)
	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:462)
	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:491)
	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

   Locked ownable synchronizers:
	- locked <70cbc89f> (a java.util.concurrent.ThreadPoolExecutor$Worker)

The project the SCM is polling is a maven project. Maven plug-in was also upgraded (probably this is a culprit) from 2.1 to 2.2.

Dmitry Maslakov added a comment - 2014-04-09 02:40 - edited Just got this error after upgrade from 1.556 to 1.558. Using suggested scripts to kill hung threads did not help, they start again and hung. Using VisualVM I get the dump of threads and here is the one which hunged for more than 8 hours: "SCM polling for hudson.maven.MavenModuleSet@4f6ded0d[project-name]" - Thread t@357 java.lang.Thread.State: WAITING at sun.misc.Unsafe.park(Native Method) - parking to wait for <28ff58dd> (a java.util.concurrent.FutureTask) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425) at java.util.concurrent.FutureTask.get(FutureTask.java:187) at hudson.remoting.Request.call(Request.java:157) - locked <2de9b3db> (a hudson.remoting.UserRequest) at hudson.remoting.Channel.call(Channel.java:722) at hudson.scm.SubversionSCM.compareRemoteRevisionWith(SubversionSCM.java:1451) at hudson.scm.SCM._compareRemoteRevisionWith(SCM.java:356) at hudson.scm.SCM.poll(SCM.java:373) at hudson.model.AbstractProject._poll(AbstractProject.java:1490) at hudson.model.AbstractProject.poll(AbstractProject.java:1399) at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:462) at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:491) at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Locked ownable synchronizers: - locked <70cbc89f> (a java.util.concurrent.ThreadPoolExecutor$Worker) The project the SCM is polling is a maven project. Maven plug-in was also upgraded (probably this is a culprit) from 2.1 to 2.2.

Dmitry Maslakov added a comment - 2014-04-09 06:53

Finally I found 3 projects with hung SCM pooling, two of them are maven projects.

As w/a I have a) applied job renaming and creating new as a copy from old one (something like that is mentioned above), job history was lost but not critical in my case; b) reverted maven plugin to version 2.1; c) restarted Jenkins.

Dmitry Maslakov added a comment - 2014-04-09 06:53 Finally I found 3 projects with hung SCM pooling, two of them are maven projects. As w/a I have a) applied job renaming and creating new as a copy from old one (something like that is mentioned above), job history was lost but not critical in my case; b) reverted maven plugin to version 2.1; c) restarted Jenkins.

Nathaniel Irons added a comment - 2014-05-29 01:20

I see this same bug, using a Jenkins 1.565 machine with 14 git jobs, about 2/3 of which use randomized ten-minute polling ("H/10 * * * *"). The machine sits behind an inflexible corporate firewall, and the git repo is hosted outside the firewall, so there is no opportunity to use per-commit notifications, much as I'd like to.

I started noticing missed builds late last week, when we were using Jenkins 1.559. Now that I know what I'm looking for, I start seeing stuck jobs reported by hudson.triggers.SCMTrigger within an hour or two of restarting the daemon. We've been adding 1-2 jobs a week for a couple of months, and seem to have hit some kind of tipping point.

The Groovy scripts that people have posted in the past appear to be ineffective. This one:

Thread.getAllStackTraces().keySet().each(){ item ->
if(item.getName().contains("SCM polling") && item.getName().contains("waiting for hudson.remoting")){ println "Interrupting thread " + item.getId() item.interrupt() }
}

... claims to be interrupting the right SCM polling threads, and returns success, but the stuck threads persist, as reported by hudson.triggers.SCMTrigger. The longer script, starting with "Jenkins.instance.getTrigger" fails with "FATAL: No such property: Jenkins for class: Script1".

Jenkins' warning message says, "Check if your polling is hanging, and/or increase the number of threads if necessary", but as far as I can determine there is no way to increase the number of threads in current versions of Jenkins. Is that really the case?

Thanks for your time.

Nathaniel Irons added a comment - 2014-05-29 01:20 I see this same bug, using a Jenkins 1.565 machine with 14 git jobs, about 2/3 of which use randomized ten-minute polling ("H/10 * * * *"). The machine sits behind an inflexible corporate firewall, and the git repo is hosted outside the firewall, so there is no opportunity to use per-commit notifications, much as I'd like to. I started noticing missed builds late last week, when we were using Jenkins 1.559. Now that I know what I'm looking for, I start seeing stuck jobs reported by hudson.triggers.SCMTrigger within an hour or two of restarting the daemon. We've been adding 1-2 jobs a week for a couple of months, and seem to have hit some kind of tipping point. The Groovy scripts that people have posted in the past appear to be ineffective. This one: Thread .getAllStackTraces().keySet().each(){ item -> if (item.getName().contains( "SCM polling" ) && item.getName().contains( "waiting for hudson.remoting" )){ println "Interrupting thread " + item.getId() item.interrupt() } } ... claims to be interrupting the right SCM polling threads, and returns success, but the stuck threads persist, as reported by hudson.triggers.SCMTrigger. The longer script, starting with "Jenkins.instance.getTrigger" fails with "FATAL: No such property: Jenkins for class: Script1". Jenkins' warning message says, "Check if your polling is hanging, and/or increase the number of threads if necessary", but as far as I can determine there is no way to increase the number of threads in current versions of Jenkins. Is that really the case? Thanks for your time.

YenChiu Ku added a comment - 2014-07-01 11:17

Nathaniel,

Add "import jenkins.model.Jenkins" in the beginning. I think it will solve ""FATAL: No such property: Jenkins for class: Script1" issue.

YenChiu Ku added a comment - 2014-07-01 11:17 Nathaniel, Add "import jenkins.model.Jenkins" in the beginning. I think it will solve ""FATAL: No such property: Jenkins for class: Script1" issue.

Nathaniel Irons added a comment - 2014-07-10 01:42

Thanks. Adding that line does fix the execution error. However, the full script, while it also reports successful-looking output, fails to interrupt any threads.

[EnvInject] - Loading node environment variables.
Building in workspace /Users/sbuxagent/Jenkins/Home/jobs/Zap Polling Threads/workspace
android-malaysia
1 day 16 hr
1404809940627
Interrupting thread SCM polling for hudson.matrix.MatrixProject@2e2cb699[android-malaysia]
ios-hongkong
1 day 16 hr
1404810120578
Interrupting thread SCM polling for hudson.matrix.MatrixProject@128e42d[ios-hongkong]
ios-china
1 day 16 hr
1404809940628
Interrupting thread SCM polling for hudson.matrix.MatrixProject@68f60dc8[ios-china]
Script returned: [hudson.triggers.SCMTrigger$Runner@2e2cb699, hudson.triggers.SCMTrigger$Runner@128e42d, hudson.triggers.SCMTrigger$Runner@68f60dc8]
Finished: SUCCESS

I can run the script over and over, and it continues to report those same three thread IDs.

I don't know why, but this stuck-thread problem disappeared for a couple of weeks. Now it's back. I'm going to update to the latest jenkins from 1.565 and see if anything's improved.

Nathaniel Irons added a comment - 2014-07-10 01:42 Thanks. Adding that line does fix the execution error. However, the full script , while it also reports successful-looking output, fails to interrupt any threads. [EnvInject] - Loading node environment variables. Building in workspace /Users/sbuxagent/Jenkins/Home/jobs/Zap Polling Threads/workspace android-malaysia 1 day 16 hr 1404809940627 Interrupting thread SCM polling for hudson.matrix.MatrixProject@2e2cb699[android-malaysia] ios-hongkong 1 day 16 hr 1404810120578 Interrupting thread SCM polling for hudson.matrix.MatrixProject@128e42d[ios-hongkong] ios-china 1 day 16 hr 1404809940628 Interrupting thread SCM polling for hudson.matrix.MatrixProject@68f60dc8[ios-china] Script returned: [hudson.triggers.SCMTrigger$Runner@2e2cb699, hudson.triggers.SCMTrigger$Runner@128e42d, hudson.triggers.SCMTrigger$Runner@68f60dc8] Finished: SUCCESS I can run the script over and over, and it continues to report those same three thread IDs. I don't know why, but this stuck-thread problem disappeared for a couple of weeks. Now it's back. I'm going to update to the latest jenkins from 1.565 and see if anything's improved.

Nathaniel Irons added a comment - 2014-07-10 14:53

In 1.571, about ten hours after updating to Jenkins 1.571 (and increasing the number of polling threads from 4 to 8), I now see four polling threads that have been stuck for a little over eight hours. The big difference is that I used to be able to see which jobs had gotten stuck, but now none of the stuck threads are named: http://cl.ly/image/0W461v33053f

The same thread-interrupter script which was claiming success in 1.565 (but not actually cleaning up any threads) fails to run at all in 1.571:

FATAL: No such property: name for class: jenkins.triggers.SCMTriggerItem$SCMTriggerItems$Bridge

The full stack trace is available at https://gist.github.com/irons/1f804e69c0cd6d0b7f20, and the script, unchanged from last night, is at https://gist.github.com/irons/09090503150e119f7096

The shorter script, posted above on May 29, continues to execute and return success, but doesn't result in a net reduction of stuck threads. Now that I can no longer tell which jobs are affected, this Jenkins upgrade appears to have deepened the problem.

Nathaniel Irons added a comment - 2014-07-10 14:53 In 1.571, about ten hours after updating to Jenkins 1.571 (and increasing the number of polling threads from 4 to 8), I now see four polling threads that have been stuck for a little over eight hours. The big difference is that I used to be able to see which jobs had gotten stuck, but now none of the stuck threads are named: http://cl.ly/image/0W461v33053f The same thread-interrupter script which was claiming success in 1.565 (but not actually cleaning up any threads) fails to run at all in 1.571: FATAL: No such property: name for class: jenkins.triggers.SCMTriggerItem$SCMTriggerItems$Bridge The full stack trace is available at https://gist.github.com/irons/1f804e69c0cd6d0b7f20 , and the script, unchanged from last night, is at https://gist.github.com/irons/09090503150e119f7096 The shorter script, posted above on May 29, continues to execute and return success, but doesn't result in a net reduction of stuck threads. Now that I can no longer tell which jobs are affected, this Jenkins upgrade appears to have deepened the problem.

Daniel Beck added a comment - 2014-07-10 15:34

UI issue described by ndirons likely caused in this commit when the type was changed without adjusting the polling page to make sure to call asItem().

Daniel Beck added a comment - 2014-07-10 15:34 UI issue described by ndirons likely caused in this commit when the type was changed without adjusting the polling page to make sure to call asItem().

Daniel Beck added a comment - 2014-08-08 14:48

Possible solution to issue with SCMTrigger status page described by ndirons proposed: https://github.com/jenkinsci/jenkins/pull/1355

Daniel Beck added a comment - 2014-08-08 14:48 Possible solution to issue with SCMTrigger status page described by ndirons proposed: https://github.com/jenkinsci/jenkins/pull/1355

sharon xia added a comment - 2014-08-27 02:55

We are also seeing this issue.

sharon xia added a comment - 2014-08-27 02:55 We are also seeing this issue.

sharon xia added a comment - 2014-08-27 03:41

02:36:16 Started by upstream project "echidna-patch-quality" build number 335
02:36:16 originally caused by:
02:36:16 Started by command line by xxx
02:36:16 [EnvInject] - Loading node environment variables.
02:36:17 Building remotely on ECHIDNA-QUALITY (6.1 windows-6.1 windows amd64-windows amd64-windows-6.1 amd64) in workspace c:\buildfarm-slave\workspace\echidna-patch-compile
02:36:18 > git rev-parse --is-inside-work-tree
02:36:19 Fetching changes from the remote Git repository
02:36:19 > git config remote.origin.url ssh://*@...:*/ghts/ta
02:36:20 Fetching upstream changes from ssh://*@...:*/ghts/ta
02:36:20 > git --version
02:36:20 > git fetch --tags --progress ssh://*@...:/ghts/ta +refs/heads/:refs/remotes/origin/*
02:56:20 ERROR: Timeout after 20 minutes
02:56:20 FATAL: Failed to fetch from ssh://*@...:*/ghts/ta
02:56:20 hudson.plugins.git.GitException: Failed to fetch from ssh://bmcdiags@10.110.61.117:30000/ghts/ta
02:56:20 at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:623)
02:56:20 at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:855)
02:56:20 at hudson.plugins.git.GitSCM.checkout(GitSCM.java:880)
02:56:20 at hudson.model.AbstractProject.checkout(AbstractProject.java:1414)
02:56:20 at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:671)
02:56:20 at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
02:56:20 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580)
02:56:20 at hudson.model.Run.execute(Run.java:1684)
02:56:20 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
02:56:20 at hudson.model.ResourceController.execute(ResourceController.java:88)
02:56:20 at hudson.model.Executor.run(Executor.java:231)
02:56:20 Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress ssh://*@...:/ghts/ta +refs/heads/:refs/remotes/origin/*" returned status code -1:
02:56:20 stdout:
02:56:20 stderr: Could not create directory 'c/Users/Administrator/.ssh'.
02:56:20
02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1325)
02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1186)
02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$200(CliGitAPIImpl.java:87)
02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:257)
02:56:20 at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
02:56:20 at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
02:56:20 at hudson.remoting.UserRequest.perform(UserRequest.java:118)
02:56:20 at hudson.remoting.UserRequest.perform(UserRequest.java:48)
02:56:20 at hudson.remoting.Request$2.run(Request.java:326)
02:56:20 at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
02:56:20 at java.util.concurrent.FutureTask.run(Unknown Source)
02:56:20 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
02:56:20 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
02:56:20 at hudson.remoting.Engine$1$1.run(Engine.java:63)
02:56:20 at java.lang.Thread.run(Unknown Source)

sharon xia added a comment - 2014-08-27 03:41 02:36:16 Started by upstream project "echidna-patch-quality" build number 335 02:36:16 originally caused by: 02:36:16 Started by command line by xxx 02:36:16 [EnvInject] - Loading node environment variables. 02:36:17 Building remotely on ECHIDNA-QUALITY (6.1 windows-6.1 windows amd64-windows amd64-windows-6.1 amd64) in workspace c:\buildfarm-slave\workspace\echidna-patch-compile 02:36:18 > git rev-parse --is-inside-work-tree 02:36:19 Fetching changes from the remote Git repository 02:36:19 > git config remote.origin.url ssh://* @ . . . : */ghts/ta 02:36:20 Fetching upstream changes from ssh://* @ . . . : */ghts/ta 02:36:20 > git --version 02:36:20 > git fetch --tags --progress ssh://* @ . . . : /ghts/ta +refs/heads/ :refs/remotes/origin/* 02:56:20 ERROR: Timeout after 20 minutes 02:56:20 FATAL: Failed to fetch from ssh://* @ . . . : */ghts/ta 02:56:20 hudson.plugins.git.GitException: Failed to fetch from ssh://bmcdiags@10.110.61.117:30000/ghts/ta 02:56:20 at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:623) 02:56:20 at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:855) 02:56:20 at hudson.plugins.git.GitSCM.checkout(GitSCM.java:880) 02:56:20 at hudson.model.AbstractProject.checkout(AbstractProject.java:1414) 02:56:20 at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:671) 02:56:20 at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88) 02:56:20 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:580) 02:56:20 at hudson.model.Run.execute(Run.java:1684) 02:56:20 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) 02:56:20 at hudson.model.ResourceController.execute(ResourceController.java:88) 02:56:20 at hudson.model.Executor.run(Executor.java:231) 02:56:20 Caused by: hudson.plugins.git.GitException: Command "git fetch --tags --progress ssh://* @ . . . : /ghts/ta +refs/heads/ :refs/remotes/origin/*" returned status code -1: 02:56:20 stdout: 02:56:20 stderr: Could not create directory 'c/Users/Administrator/.ssh'. 02:56:20 02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1325) 02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1186) 02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$200(CliGitAPIImpl.java:87) 02:56:20 at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:257) 02:56:20 at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153) 02:56:20 at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146) 02:56:20 at hudson.remoting.UserRequest.perform(UserRequest.java:118) 02:56:20 at hudson.remoting.UserRequest.perform(UserRequest.java:48) 02:56:20 at hudson.remoting.Request$2.run(Request.java:326) 02:56:20 at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) 02:56:20 at java.util.concurrent.FutureTask.run(Unknown Source) 02:56:20 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 02:56:20 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 02:56:20 at hudson.remoting.Engine$1$1.run(Engine.java:63) 02:56:20 at java.lang.Thread.run(Unknown Source)

Daniel Beck added a comment - 2014-08-27 09:02

sharon_xia: That's a completely different issue. This issue is about polling that NEVER finishes, yours aborts after 20 minutes. It even seems to tell you what the problem is: Could not create directory 'c/Users/Administrator/.ssh'.

To request further assistance, please ask on the jenkinsci-users mailing list or in #jenkins on Freenode. This thread is long enough already.

Daniel Beck added a comment - 2014-08-27 09:02 sharon_xia : That's a completely different issue. This issue is about polling that NEVER finishes, yours aborts after 20 minutes. It even seems to tell you what the problem is: Could not create directory 'c/Users/Administrator/.ssh'. To request further assistance, please ask on the jenkinsci-users mailing list or in #jenkins on Freenode. This thread is long enough already.

Brian Smith added a comment - 2014-09-09 21:21

I haven't had this issue since we started doing weekly reboots of the whole system (master and nodes).

Brian Smith added a comment - 2014-09-09 21:21 I haven't had this issue since we started doing weekly reboots of the whole system (master and nodes).

mark 3000 added a comment - 2014-10-07 14:20

We encountered this issue for the first time (that I'm aware of) after upgrading to 1.583 from 1.578.

mark 3000 added a comment - 2014-10-07 14:20 We encountered this issue for the first time (that I'm aware of) after upgrading to 1.583 from 1.578.

marlene cote added a comment - 2014-11-19 13:29

We are seeing this too! It is having a huge impact on our productivity!! We too upgraded to 1.583.

Please help.

marlene cote added a comment - 2014-11-19 13:29 We are seeing this too! It is having a huge impact on our productivity!! We too upgraded to 1.583. Please help.

Morten Engelhardt Olsen added a comment - 2015-01-13 07:24 - edited

At Atmel we're now managing this issue by having the following system groovy script run every couple of minutes to monitor the processor load:

import java.lang.management.*;

def threadBean = ManagementFactory.getThreadMXBean();
def osBean     = ManagementFactory.getOperatingSystemMXBean();

println "\n\n\n[Checking state of (master)]";

println "Current CPU Time used by Jenkins: " + threadBean.getCurrentThreadCpuTime() + "ns";

double processLoad = (osBean.getProcessCpuLoad() * 100).round(2);
double cpuLoad = (osBean.getSystemCpuLoad() * 100).round(2);
println "Process CPU Load: " + processLoad + "%";
println "CPU Load: " + cpuLoad + "%";

if (processLoad < 90) {
  println "\n\n\n === Load is less than 90%, nothing to do ===\n\n\n";
  println "\n\n\n[Done checking: CPU Load: " + cpuLoad + "%]\n\n\n";
  return;
} else {
  println "\n\n\n === Load is more than 90%, checking for stuck threads! ===\n\n\n";
}


println "\n\n\n[Checking all threads]\n\n\n";
def threadNum = 0;
def killThreadNum = 0;

def stacktraces = Thread.getAllStackTraces();
stacktraces.each { thread, stack ->
  if (thread.getName().contains("trigger/TimerTrigger/check") ) {
    println "=== Interrupting thread " + thread.getName()+ " ===";
    thread.interrupt();
    killThreadNum++;
  }
  threadNum++;
}

println "\n\n\n[Done checking: " + threadNum + " threads, killed " + killThreadNum + "]\n\n\n";

return; // Suppress groovy state dump

Note that we had to check for TimerTrigger, not SCM Polling as the original code did. This is currently running on 1.580.2.

Morten Engelhardt Olsen added a comment - 2015-01-13 07:24 - edited At Atmel we're now managing this issue by having the following system groovy script run every couple of minutes to monitor the processor load: import java.lang.management.*; def threadBean = ManagementFactory.getThreadMXBean(); def osBean = ManagementFactory.getOperatingSystemMXBean(); println "\n\n\n[Checking state of (master)]" ; println "Current CPU Time used by Jenkins: " + threadBean.getCurrentThreadCpuTime() + "ns" ; double processLoad = (osBean.getProcessCpuLoad() * 100).round(2); double cpuLoad = (osBean.getSystemCpuLoad() * 100).round(2); println " Process CPU Load: " + processLoad + "%" ; println "CPU Load: " + cpuLoad + "%" ; if (processLoad < 90) { println "\n\n\n === Load is less than 90%, nothing to do ===\n\n\n" ; println "\n\n\n[Done checking: CPU Load: " + cpuLoad + "%]\n\n\n" ; return ; } else { println "\n\n\n === Load is more than 90%, checking for stuck threads! ===\n\n\n" ; } println "\n\n\n[Checking all threads]\n\n\n" ; def threadNum = 0; def killThreadNum = 0; def stacktraces = Thread .getAllStackTraces(); stacktraces.each { thread, stack -> if (thread.getName().contains( "trigger/TimerTrigger/check" ) ) { println "=== Interrupting thread " + thread.getName()+ " ===" ; thread.interrupt(); killThreadNum++; } threadNum++; } println "\n\n\n[Done checking: " + threadNum + " threads, killed " + killThreadNum + "]\n\n\n" ; return ; // Suppress groovy state dump Note that we had to check for TimerTrigger , not SCM Polling as the original code did. This is currently running on 1.580.2.

Nathaniel Irons added a comment - 2015-01-23 23:34

The script provided on Jan 13 seems to be solving a different problem. On our instance, we see stuck SCM polling threads even when the CPU load is zero. With three SCM polling processes stuck as of this moment, the thread names reported by Thread.getAllStackTraces() are main, Finalizer, Signal Dispatcher, and Reference Handler.

I'm pig-ignorant of groovy, and have yet to figure out where its access to Jenkins thread innards are documented, but previous iterations of scripts that did identify a stuck thread to interrupt were ineffective for us — we've yet to find an effective workaround that doesn't rely on restarting the jenkins daemon.

We're using 1.590, and looking to switch to LTS releases as soon as they pass us by.

Nathaniel Irons added a comment - 2015-01-23 23:34 The script provided on Jan 13 seems to be solving a different problem. On our instance, we see stuck SCM polling threads even when the CPU load is zero. With three SCM polling processes stuck as of this moment, the thread names reported by Thread.getAllStackTraces() are main, Finalizer, Signal Dispatcher, and Reference Handler. I'm pig-ignorant of groovy, and have yet to figure out where its access to Jenkins thread innards are documented, but previous iterations of scripts that did identify a stuck thread to interrupt were ineffective for us — we've yet to find an effective workaround that doesn't rely on restarting the jenkins daemon. We're using 1.590, and looking to switch to LTS releases as soon as they pass us by.

Andrew Hoffmann added a comment - 2015-07-30 19:24

We are experiencing git polling getting hung as well. We have ~15 jobs that poll every 5 minutes. It gets hung roughly 24 hours after a service restart. We also have the BitBucket pull request builder polling every 5 minutes for another ~15 jobs.

Jenkins v1.622
git plugin 2.4.0
git-client plugin 1.18.0
bitbucket-pullrequest-builder plugin 1.4.7

0-30 minutes prior to being hung, I see this exception:

WARNING: Process leaked file descriptors. See http://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information
java.lang.Exception
	at hudson.Proc$LocalProc.join(Proc.java:329)
	at hudson.Proc.joinWithTimeout(Proc.java:168)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1596)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1576)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1572)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1233)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$4.execute(CliGitAPIImpl.java:583)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1310)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1261)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1252)
	at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.getHeadRev(CliGitAPIImpl.java:2336)
	at hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:583)
	at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:527)
	at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:381)
	at hudson.scm.SCM.poll(SCM.java:398)
	at hudson.model.AbstractProject._poll(AbstractProject.java:1461)
	at hudson.model.AbstractProject.poll(AbstractProject.java:1364)
	at jenkins.triggers.SCMTriggerItem$SCMTriggerItems$Bridge.poll(SCMTriggerItem.java:119)
	at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:510)
	at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:539)
	at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

I will be happy to provide more configuration details and logs if requested.

Andrew Hoffmann added a comment - 2015-07-30 19:24 We are experiencing git polling getting hung as well. We have ~15 jobs that poll every 5 minutes. It gets hung roughly 24 hours after a service restart. We also have the BitBucket pull request builder polling every 5 minutes for another ~15 jobs. Jenkins v1.622 git plugin 2.4.0 git-client plugin 1.18.0 bitbucket-pullrequest-builder plugin 1.4.7 0-30 minutes prior to being hung, I see this exception: WARNING: Process leaked file descriptors. See http: //wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build for more information java.lang.Exception at hudson.Proc$LocalProc.join(Proc.java:329) at hudson.Proc.joinWithTimeout(Proc.java:168) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1596) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1576) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1572) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.java:1233) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$4.execute(CliGitAPIImpl.java:583) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1310) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1261) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1252) at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.getHeadRev(CliGitAPIImpl.java:2336) at hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:583) at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:527) at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:381) at hudson.scm.SCM.poll(SCM.java:398) at hudson.model.AbstractProject._poll(AbstractProject.java:1461) at hudson.model.AbstractProject.poll(AbstractProject.java:1364) at jenkins.triggers.SCMTriggerItem$SCMTriggerItems$Bridge.poll(SCMTriggerItem.java:119) at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:510) at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:539) at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:118) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang. Thread .run(Unknown Source) I will be happy to provide more configuration details and logs if requested.

Yves Martin added a comment - 2015-10-28 07:52 - edited

When investigating a Subversion SCM polling issue (~~JENKINS-31192~~), I find out that a global lock hudson.scm.SubversionSCM$ModuleLocation prevents threads to work concurrently. Is that "big lock" really relevant ? Maybe it is possible to reduce the code section when the lock is held.

Yves Martin added a comment - 2015-10-28 07:52 - edited When investigating a Subversion SCM polling issue ( JENKINS-31192 ), I find out that a global lock hudson.scm.SubversionSCM$ModuleLocation prevents threads to work concurrently. Is that "big lock" really relevant ? Maybe it is possible to reduce the code section when the lock is held.

Meng Xin Zhu added a comment - 2016-12-14 02:44

Still happening on Jenkins LTS 2.19.4.

My job is periodically polling from git repo(every 5 minutes). However the scm polling might hang infinitely without timeout. The subsequent manually job building also is blocked by scm polling. It's definitely a critical issue to impact the usability of jenkins.

Meng Xin Zhu added a comment - 2016-12-14 02:44 Still happening on Jenkins LTS 2.19.4. My job is periodically polling from git repo(every 5 minutes). However the scm polling might hang infinitely without timeout. The subsequent manually job building also is blocked by scm polling. It's definitely a critical issue to impact the usability of jenkins.

Jenkins

Details

Description

Attachments

Attachments

Issue Links

Activity

Collapse comment: Hans-Juergen Hafner added a comment - 2010-08-13 04:49

Expand comment: Hans-Juergen Hafner added a comment - 2010-08-13 04:49

Collapse comment: erwan_q added a comment - 2010-09-13 15:23

Expand comment: erwan_q added a comment - 2010-09-13 15:23

Collapse comment: lkishalmi added a comment - 2010-11-04 00:37

Expand comment: lkishalmi added a comment - 2010-11-04 00:37

Collapse comment: Dean Yu added a comment - 2010-11-18 10:53

Expand comment: Dean Yu added a comment - 2010-11-18 10:53

Collapse comment: eguess74 added a comment - 2011-03-09 19:19, Edited by eguess74 - 2011-03-09 19:20

Expand comment: eguess74 added a comment - 2011-03-09 19:19, Edited by eguess74 - 2011-03-09 19:20

Collapse comment: eguess74 added a comment - 2011-03-09 20:33, Edited by eguess74 - 2011-03-09 22:51

Expand comment: eguess74 added a comment - 2011-03-09 20:33, Edited by eguess74 - 2011-03-09 22:51

Collapse comment: eguess74 added a comment - 2011-03-10 15:09

Expand comment: eguess74 added a comment - 2011-03-10 15:09

Collapse comment: Stephen Connolly added a comment - 2011-06-07 08:38

Expand comment: Stephen Connolly added a comment - 2011-06-07 08:38

Collapse comment: Thomas Mielke added a comment - 2011-07-19 16:23, Edited by Thomas Mielke - 2011-07-19 16:32

Expand comment: Thomas Mielke added a comment - 2011-07-19 16:23, Edited by Thomas Mielke - 2011-07-19 16:32

Collapse comment: Frederic Pesquet added a comment - 2011-09-08 07:54

Expand comment: Frederic Pesquet added a comment - 2011-09-08 07:54

Collapse comment: carlspring added a comment - 2011-09-08 08:13

Expand comment: carlspring added a comment - 2011-09-08 08:13

Collapse comment: Frederic Pesquet added a comment - 2011-09-08 14:05

Expand comment: Frederic Pesquet added a comment - 2011-09-08 14:05

Collapse comment: Lars Kruse added a comment - 2011-10-19 09:59

Expand comment: Lars Kruse added a comment - 2011-10-19 09:59

Collapse comment: Jes Struck added a comment - 2011-10-19 10:06

Expand comment: Jes Struck added a comment - 2011-10-19 10:06

Collapse comment: Jes Struck added a comment - 2011-12-05 15:17

Expand comment: Jes Struck added a comment - 2011-12-05 15:17

Collapse comment: Vitalii Grygoruk added a comment - 2011-12-05 15:57

Expand comment: Vitalii Grygoruk added a comment - 2011-12-05 15:57

Collapse comment: Jes Struck added a comment - 2011-12-05 19:12

Expand comment: Jes Struck added a comment - 2011-12-05 19:12

Collapse comment: Christian Wolfgang added a comment - 2011-12-16 21:24

Expand comment: Christian Wolfgang added a comment - 2011-12-16 21:24

Collapse comment: Markus added a comment - 2012-01-12 09:18

Expand comment: Markus added a comment - 2012-01-12 09:18

Collapse comment: Franck Gilliers added a comment - 2012-02-08 10:14

Expand comment: Franck Gilliers added a comment - 2012-02-08 10:14

Collapse comment: Christian Wolfgang added a comment - 2012-02-08 11:21

Expand comment: Christian Wolfgang added a comment - 2012-02-08 11:21

Collapse comment: Guido Serra added a comment - 2012-03-23 11:46

Expand comment: Guido Serra added a comment - 2012-03-23 11:46

Collapse comment: Christian Wolfgang added a comment - 2012-04-16 12:36

Expand comment: Christian Wolfgang added a comment - 2012-04-16 12:36

Collapse comment: Joe Hansche added a comment - 2012-06-08 19:05

Expand comment: Joe Hansche added a comment - 2012-06-08 19:05

Collapse comment: Alex Lorenz added a comment - 2012-06-27 11:34

Expand comment: Alex Lorenz added a comment - 2012-06-27 11:34

Collapse comment: Christian Wolfgang added a comment - 2012-06-27 14:07

Expand comment: Christian Wolfgang added a comment - 2012-06-27 14:07

Collapse comment: Mandeep Rai added a comment - 2012-09-04 17:04, Edited by Mandeep Rai - 2012-09-04 17:05

Expand comment: Mandeep Rai added a comment - 2012-09-04 17:04, Edited by Mandeep Rai - 2012-09-04 17:05

Collapse comment: lacostej added a comment - 2013-01-14 20:15

Expand comment: lacostej added a comment - 2013-01-14 20:15

Collapse comment: Raja Aluri added a comment - 2013-02-08 18:31

Expand comment: Raja Aluri added a comment - 2013-02-08 18:31

Collapse comment: Linards L added a comment - 2013-03-08 09:17, Edited by Linards L - 2013-03-08 09:19

Expand comment: Linards L added a comment - 2013-03-08 09:17, Edited by Linards L - 2013-03-08 09:19

Collapse comment: Derek Seibert added a comment - 2013-03-25 13:37, Edited by Derek Seibert - 2013-03-25 13:44

Expand comment: Derek Seibert added a comment - 2013-03-25 13:37, Edited by Derek Seibert - 2013-03-25 13:44

Collapse comment: lacostej added a comment - 2013-03-25 14:41

Expand comment: lacostej added a comment - 2013-03-25 14:41

Collapse comment: Manuel de la Peña added a comment - 2013-04-16 06:24

Expand comment: Manuel de la Peña added a comment - 2013-04-16 06:24

Collapse comment: Brian Smith added a comment - 2013-10-28 22:07

Expand comment: Brian Smith added a comment - 2013-10-28 22:07

Collapse comment: Dmitry Maslakov added a comment - 2014-04-09 02:40, Edited by Dmitry Maslakov - 2014-04-09 02:41

Expand comment: Dmitry Maslakov added a comment - 2014-04-09 02:40, Edited by Dmitry Maslakov - 2014-04-09 02:41

Collapse comment: Dmitry Maslakov added a comment - 2014-04-09 06:53

Expand comment: Dmitry Maslakov added a comment - 2014-04-09 06:53