Type: Bug
Resolution: Fixed
Priority: Critical
None
Platform: All, OS: All
A few days ago we had a Mercurial server outage, with the result that all Hg
processes running at the time hung. (For technical reasons relating to network
config, the connections do not time out - they just hang forever.)
For those jobs running on master, the Hg polling was killed after an hour due to
issue #4461.
But for those jobs running on a slave,
SCMTrigger.DescriptorImpl.queue.inProgress shows them still active, even though
their polling log claims they were killed after an hour. A thread dump on master
confirms this:
"SCM polling for hudson.model.FreeStyleProject@164e3e2[apitest]" prio=10
tid=0xa0e0a400 nid=0x746e in Object.wait() [0xf77ff000..0xf77ff554]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at hudson.remoting.Request$1.get(Request.java:185)
- locked <0x69424868> (a hudson.remoting.UserRequest)
at hudson.remoting.Request$1.get(Request.java:165)
at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
at hudson.Proc$RemoteProc.join(Proc.java:290)
at hudson.plugins.mercurial.MercurialSCM.joinWithTimeout(MercurialSCM.java:233)
at hudson.plugins.mercurial.MercurialSCM.pollChanges(MercurialSCM.java:192)
at hudson.model.AbstractProject.pollSCMChanges(AbstractProject.java:1032)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:317)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:344)
at hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:114)
It seems that even though proc.kill() was called in another thread, proc.join()
is still waiting.
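For context, the general shape of a timed join is to run the blocking join() on a helper thread and give up after a deadline, so a hung remote call cannot block SCM polling forever. The sketch below is illustrative only; it is not the actual MercurialSCM.joinWithTimeout code, and the class name TimedJoin is invented:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch of a join-with-timeout wrapper (TimedJoin is a
// hypothetical name, not part of Hudson/Jenkins). The blocking join runs
// on a helper thread; if it does not finish in time, the caller gets null
// back and can kill the process instead of waiting forever.
class TimedJoin {
    static Integer joinWithTimeout(Callable<Integer> join, long timeout, TimeUnit unit) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            // Wait at most `timeout` for the join to produce an exit code.
            return ex.submit(join).get(timeout, unit);
        } catch (TimeoutException e) {
            return null; // caller should treat this as a hang and kill the process
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            ex.shutdownNow(); // interrupt the helper thread if it is still blocked
        }
    }
}
```

The catch in this bug is that interrupting the helper thread only works if the underlying future honors interruption or cancellation, which is exactly what fails below.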
Looking at the implementation, it is no wonder kill() does not work:
Request.callAsync's Future.cancel just returns false and does nothing!
Shouldn't it call channel.send(new Cancel(id)) or abort(...) or something like that?
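To illustrate the suggested fix, a response future whose cancel() actually does something would mark itself cancelled and wake up any thread blocked in get(). The sketch below is hypothetical (CancellableFuture is an invented name, not the real hudson.remoting API), and a real fix would additionally notify the remote side, e.g. via channel.send(new Cancel(id)):

```java
import java.util.concurrent.CancellationException;

// Hypothetical sketch of a cancellable response future. Unlike the
// Future returned by Request.callAsync in this report, cancel() here
// wakes up waiters instead of returning false and doing nothing.
class CancellableFuture {
    private Object response;   // the response, once it arrives
    private boolean done;      // set when the response arrives
    private boolean cancelled; // set when cancel() is called

    // Called by the channel reader thread when the response arrives.
    public synchronized void complete(Object value) {
        if (done || cancelled) return;
        this.response = value;
        this.done = true;
        notifyAll(); // wake up any thread blocked in get()
    }

    // Marks the future cancelled and wakes up waiters. A real
    // implementation would also cancel the request on the remote side.
    public synchronized boolean cancel() {
        if (done) return false; // too late, the response already arrived
        cancelled = true;
        notifyAll();
        return true;
    }

    // Blocks until the response arrives or the future is cancelled,
    // instead of waiting forever on a response that will never come.
    public synchronized Object get() throws InterruptedException {
        while (!done && !cancelled) {
            wait();
        }
        if (cancelled) throw new CancellationException();
        return response;
    }
}
```

With this shape, the proc.kill() thread could cancel the pending request and the polling thread stuck in get() would unblock with a CancellationException rather than waiting in Object.wait() forever.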
is blocking: JENKINS-4461 Hg polling can hang indefinitely (Closed)