• Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Component: remoting
    • Labels: None

      This is to track the problem originally reported here: http://n4.nabble.com/Polling-hung-td1310838.html#a1310838
      The referenced thread is relocated to http://jenkins.361315.n4.nabble.com/Polling-hung-td1310838.html

      What the problem boils down to is that many remote operations are performed synchronously, causing the channel object to be locked until the response returns. When a lengthy remote operation is using the channel, SCM polling can be blocked waiting for the monitor on the channel to be released. In extreme situations, all the polling threads can wind up waiting on the object monitors of the channel objects, preventing further processing of polling tasks.

      Furthermore, if the slave dies, the locked channel object still exists in the master JVM. If no IOException is thrown to indicate that the connection to the pipe has terminated, the channel can never be closed, because Channel.close() is itself a synchronized operation.
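
      To make the failure mode concrete, the following is a minimal, self-contained Java sketch. The FakeChannel class is hypothetical (it is not Hudson's hudson.remoting.Channel); it only illustrates, under that assumption, how a channel whose remote calls and close() share a single object monitor can leave polling threads blocked indefinitely.

      public class ChannelMonitorSketch {
          // Hypothetical stand-in for a remoting channel: remote calls and close()
          // are both synchronized, so they compete for one object monitor.
          static class FakeChannel {
              synchronized void remoteCall() {
                  try {
                      // Simulates a remote operation whose response never arrives,
                      // e.g. the slave died without an IOException reaching the master.
                      Thread.sleep(Long.MAX_VALUE);
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
              }

              synchronized void close() {
                  // Never runs while remoteCall() still holds the monitor.
                  System.out.println("channel closed");
              }
          }

          public static void main(String[] args) throws InterruptedException {
              FakeChannel channel = new FakeChannel();

              // A lengthy remote operation (e.g. a build step) takes the channel monitor...
              daemon(channel::remoteCall, "lengthy-remote-operation").start();
              Thread.sleep(200); // give it time to acquire the lock

              // ...so an SCM polling thread touching the same channel parks on that
              // monitor, and even close() cannot break the logjam.
              Thread poller = daemon(channel::remoteCall, "SCM-polling-thread");
              Thread closer = daemon(channel::close, "channel-closer");
              poller.start();
              closer.start();

              Thread.sleep(500);
              System.out.println("poller state: " + poller.getState()); // BLOCKED
              System.out.println("closer state: " + closer.getState()); // BLOCKED
          }

          private static Thread daemon(Runnable task, String name) {
              Thread t = new Thread(task, name);
              t.setDaemon(true);
              return t;
          }
      }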

        1. DUMP1.txt
          57 kB
        2. hung_scm_pollers_02.PNG
          145 kB
        3. thread_dump_02.txt
          92 kB
        4. threads.vetted.txt
          163 kB

          [JENKINS-5413] SCM polling getting hung

          Dean Yu created issue -

          rsteele added a comment -

          I believe I'm seeing this problem as well and have been for a couple (maybe more?) weeks. One difference: I'm using the ClearCase plugin as my SCM provider, but otherwise the symptoms seem to be the same: one of my slaves, though still "online", seems to get stuck while polling for changes (though I don't see any ClearCase processes running in Process Explorer). Furthermore, killing the slave doesn't seem to do any good and the master doesn't even notice the slave has died.


          Carl Quinn added a comment -

          And I'm seeing it as well with a Perforce SCM.


          mdillon added a comment -

          Here is a stack dump from a Hudson master we were running after all 10 asynchronous polling threads were hung. The job names, executor names, and internal class names have been munged just in case. This thread dump appears to be missing the stack for the main thread for some reason, but I don't think that is a big deal.

          Once our server got itself into this state, we were not able to unstick polling without a restart. Disconnecting the affected executor did not cause these threads to go away and reconnecting the executor did not cause polling to resume.

          Our workaround has been to add a setting to revert the subversion plugin back to master-only polling on the affected installations. FWIW, we seem to see this on high-load Hudson installations.

          mdillon made changes -
          Attachment New: threads.vetted.txt [ 19185 ]

          mdillon added a comment -

          BTW, that thread dump was from a Hudson master running the equivalent of Hudson 1.322. I don't know if anyone else in the company has a thread dump from a more recent Hudson version.


          dshields777 added a comment -

          I'm seeing the same behavior.

          mdillon, you mention reverting to master-only polling as a workaround. How did you do that? Is there a config setting that I'm missing? Or do you mean you went back to an earlier version of the SVN plugin?


          Dean Yu added a comment -

          @dshields777: We build our Hudson installation from source with some modifications. We added a switch to the Subversion plugin to poll from the master. We can certainly contribute this patch upstream so other people can use the same workaround.

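          A hedged sketch of what such a switch could look like follows. The class and property name here are illustrative assumptions, not the actual patch: the idea is simply that the polling code consults a system property and, when it is set, runs the poll in the master JVM instead of over the slave channel.

          public class PollingLocationSwitch {
              // The property name is an assumption for illustration; the real patch may differ.
              private static final String POLL_FROM_MASTER_PROPERTY =
                      "hudson.scm.SubversionSCM.pollFromMaster";

              // Returns true when SCM polling should be executed in the master JVM.
              public static boolean pollFromMaster() {
                  return Boolean.getBoolean(POLL_FROM_MASTER_PROPERTY);
              }

              public static void main(String[] args) {
                  // Example: start the master with -Dhudson.scm.SubversionSCM.pollFromMaster=true
                  System.out.println("Poll from master? " + pollFromMaster());
              }
          }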

          wgracelee added a comment -

          Hi, we have the same problem on Hudson 1.352 using Subversion. Is the patch available for download now?


          daniel_franzen added a comment -

          Hi, I've been seeing this issue too (since late January - I wish I could give you an exact version number). I haven't found any way to reliably reproduce it (it happens at random every other day). It occurred as late as yesterday, running Hudson 1.353 with Subversion Plugin 1.16.

          The typical scenario in our case is as follows
          1) A job hangs.*
          2) The node becomes unusable. A job starting on the node stops at "Building remotely on MySlave".
          3) SVN polling gets stuck.
          4) The node can be made usable by disconnecting and reconnecting in Hudson's node management.
          5) Polling only resumes after a Hudson restart.

          *) Some background on how these job hang-ups manifest themselves:
          There's one particular job that hangs often, but I can't determine what's special about it. It typically hangs at "Recording plot data". In other words, it's not during actual job execution, but just at the end. This occurs on any of our slave nodes - nodes that are running other Hudson jobs without a hitch. When I've removed plotting it hangs at archiving/fingerprinting instead. I suspect one of the code analysis plugins we run at the end (e.g. Findbugs) might be responsible. If you believe this is of interest to the SVN polling issue I'll be happy to provide more detailed information.


            Assignee: Unassigned
            Reporter: Dean Yu (dty)
            Votes: 141
            Watchers: 147
