-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Centos 6.6
Jenkins 1.621
See attachment system_info.txt for full dump.
-
This issue was originally opened against the stash-build-plugin, but after further investigation we found that the cron system of core Jenkins stops working as expected.
The cron system now stops working approximately weekly, following the introduction of more cron-based plugins like stash-pullrequest-builder. In the jenkins_debug_10072015.tar attachment, most jobs show a last polling time of around 9:05-9:10am. After that, none of the cron-based polling jobs responded.
Very close to JENKINS-25704; almost a duplicate.
[JENKINS-30558] Cron based jobs are no longer triggered
Adding jenkins_cron_problem-10-7-2015 with all support, lsof dump, sysconfig, etc. This is as much as I could initially gather.
Hit the issue again on 10/8/2015. The last entry written by the logger was at 3:39:00 PM, which correlates with the polling jobs no longer being scheduled.
The only interesting item I found in the Jenkins log was an exception at around 3:37 PM:
Oct 08, 2015 3:37:13 PM org.eclipse.jetty.util.log.JavaUtilLog warn
WARNING:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:479)
at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:293)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
at org.eclipse.jetty.io.nio.SslConnection.process(SslConnection.java:337)
at org.eclipse.jetty.io.nio.SslConnection.access$900(SslConnection.java:48)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.flush(SslConnection.java:738)
at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.shutdownOutput(SslConnection.java:641)
at org.eclipse.jetty.io.nio.SslConnection.onIdleExpired(SslConnection.java:260)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.onIdleExpired(SelectChannelEndPoint.java:349)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:326)
at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Oct 08, 2015 3:38:00 PM stashpullrequestbuilder.stashpullrequestbuilder.StashPullRequestsBuilder run
The issue may be caused by the Stash pull request builder, judging by the jstack trace on the master. I attached 2 jstack traces taken 5 minutes apart; see thread 23311. It's sitting at java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int). I'm not very familiar with the Timer code, but I was wondering whether this single job's socket read not completing inside the timer task could impact the whole system.
Thread 23311: (state = IN_NATIVE)
- java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
- java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Compiled frame)
- java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Compiled frame)
- sun.security.ssl.InputRecord.readFully(java.io.InputStream, byte[], int, int) @bci=21, line=442 (Compiled frame)
- sun.security.ssl.InputRecord.read(java.io.InputStream, java.io.OutputStream) @bci=32, line=480 (Compiled frame)
- sun.security.ssl.SSLSocketImpl.readRecord(sun.security.ssl.InputRecord, boolean) @bci=44, line=934 (Compiled frame)
- sun.security.ssl.SSLSocketImpl.readDataRecord(sun.security.ssl.InputRecord) @bci=15, line=891 (Compiled frame)
- sun.security.ssl.AppInputStream.read(byte[], int, int) @bci=72, line=102 (Compiled frame)
- java.io.BufferedInputStream.fill() @bci=175, line=235 (Interpreted frame)
- java.io.BufferedInputStream.read() @bci=12, line=254 (Compiled frame)
- org.apache.commons.httpclient.HttpParser.readRawLine(java.io.InputStream) @bci=19, line=78 (Compiled frame)
- org.apache.commons.httpclient.HttpParser.readLine(java.io.InputStream, java.lang.String) @bci=11, line=106 (Interpreted frame)
- org.apache.commons.httpclient.HttpConnection.readLine(java.lang.String) @bci=19, line=1116 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodBase.readStatusLine(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=36, line=1973 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodBase.readResponse(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=21, line=1735 (Compiled frame)
- org.apache.commons.httpclient.HttpMethodBase.execute(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=68, line=1098 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(org.apache.commons.httpclient.HttpMethod) @bci=135, line=398 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodDirector.executeMethod(org.apache.commons.httpclient.HttpMethod) @bci=288, line=171 (Interpreted frame)
- org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HostConfiguration, org.apache.commons.httpclient.HttpMethod, org.apache.commons.httpclient.HttpState) @bci=114, line=397 (Interpreted frame)
- org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HttpMethod) @bci=14, line=323 (Interpreted frame)
- stashpullrequestbuilder.stashpullrequestbuilder.stash.StashApiClient.getRequest(java.lang.String) @bci=69, line=140 (Interpreted frame)
- stashpullrequestbuilder.stashpullrequestbuilder.stash.StashApiClient.getPullRequests() @bci=5, line=44 (Interpreted frame)
- stashpullrequestbuilder.stashpullrequestbuilder.StashRepository.getTargetPullRequests() @bci=12, line=57 (Interpreted frame)
- stashpullrequestbuilder.stashpullrequestbuilder.StashPullRequestsBuilder.run() @bci=19, line=30 (Interpreted frame)
- stashpullrequestbuilder.stashpullrequestbuilder.StashBuildTrigger.run() @bci=28, line=168 (Interpreted frame)
- hudson.triggers.Trigger.checkTriggers(java.util.Calendar) @bci=253, line=278 (Compiled frame)
- hudson.triggers.Trigger$Cron.doRun() @bci=43, line=217 (Interpreted frame)
- hudson.triggers.SafeTimerTask.run() @bci=8, line=51 (Interpreted frame)
- java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471 (Compiled frame)
- java.util.concurrent.FutureTask.runAndReset() @bci=47, line=304 (Interpreted frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=178 (Interpreted frame)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=293 (Interpreted frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
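The bottom frames of the dump (FutureTask.runAndReset under ScheduledThreadPoolExecutor) suggest that the cron check runs as a periodic task. A minimal sketch of why one hung run can stall every later run follows; it uses only java.util.concurrent, it is not Jenkins code, and the class and method names are hypothetical. It relies on the documented behaviour of scheduleAtFixedRate that executions of the same periodic task never overlap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; a stand-in for a periodic "check all triggers" task.
class HungTriggerDemo {
    /**
     * Schedules a periodic tick. The tick numbered hangAtTick blocks forever
     * (a stand-in for the thread stuck in socketRead0). Returns how many
     * ticks started during observeMillis.
     */
    static int ticksStarted(int hangAtTick, long periodMillis, long observeMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger started = new AtomicInteger();
        CountDownLatch never = new CountDownLatch(1); // never counted down
        timer.scheduleAtFixedRate(() -> {
            int tick = started.incrementAndGet();
            if (tick == hangAtTick) {
                try {
                    never.await(); // simulated hang inside one trigger
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(observeMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow(); // interrupts the hung tick so the JVM can exit
        }
        return started.get();
    }

    public static void main(String[] args) {
        // No tick hangs: many ticks start. Tick 3 hangs: counting stops at 3.
        System.out.println("healthy: " + ticksStarted(0, 20, 500));
        System.out.println("hung at tick 3: " + ticksStarted(3, 20, 500));
    }
}
```

Adding threads to the pool would not change the outcome in this sketch, because the next run of a periodic task is only scheduled after the previous run returns.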
I still haven't gotten into the heart of the Jenkins core code, but apparently one bad plugin can have a detrimental impact on the whole system if it hangs within its own timer task code, as the debug output below makes fairly obvious. Using jstack on the Jenkins process, I found that the hang happens deep in native socket code, so the Stash trigger never completes. The thread dump looks like this:
Thread 23311: (state = IN_NATIVE)
- java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
- java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Compiled frame)
- java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Compiled frame)
- sun.security.ssl.InputRecord.readFully(java.io.InputStream, byte[], int, int) @bci=21, line=442 (Compiled frame)
- sun.security.ssl.InputRecord.read(java.io.InputStream, java.io.OutputStream) @bci=32, line=480 (Compiled frame)
- sun.security.ssl.SSLSocketImpl.readRecord(sun.security.ssl.InputRecord, boolean) @bci=44, line=934 (Compiled frame)
- sun.security.ssl.SSLSocketImpl.readDataRecord(sun.security.ssl.InputRecord) @bci=15, line=891 (Compiled frame)
- sun.security.ssl.AppInputStream.read(byte[], int, int) @bci=72, line=102 (Compiled frame)
- java.io.BufferedInputStream.fill() @bci=175, line=235 (Interpreted frame)
- java.io.BufferedInputStream.read() @bci=12, line=254 (Compiled frame)
- org.apache.commons.httpclient.HttpParser.readRawLine(java.io.InputStream) @bci=19, line=78 (Compiled frame)
- org.apache.commons.httpclient.HttpParser.readLine(java.io.InputStream, java.lang.String) @bci=11, line=106 (Interpreted frame)
- org.apache.commons.httpclient.HttpConnection.readLine(java.lang.String) @bci=19, line=1116 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodBase.readStatusLine(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=36, line=1973 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodBase.readResponse(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=21, line=1735 (Compiled frame)
- org.apache.commons.httpclient.HttpMethodBase.execute(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=68, line=1098 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(org.apache.commons.httpclient.HttpMethod) @bci=135, line=398 (Interpreted frame)
- org.apache.commons.httpclient.HttpMethodDirector.executeMethod(org.apache.commons.httpclient.HttpMethod) @bci=288, line=171 (Interpreted frame)
- org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HostConfiguration, org.apache.commons.httpclient.HttpMethod, org.apache.commons.httpclient.HttpState) @bci=114, line=397 (Interpreted frame)
- org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HttpMethod) @bci=14, line=323 (Interpreted frame)
- stashpullrequestbuilder.stashpullrequestbuilder.stash.StashApiClient.getRequest(java.lang.String) @bci=69, line=140 (Interpreted frame)
On the last hang-up I was able to kill the open socket and boom.. all cron jobs were back in business:
[root@triad-jenkins ~]# lsof -p 23220 | grep TCP | grep stash
java 23220 jenkins 894u IPv6 40628689 0t0 TCP triad-jenkins.cisco.com:60443->rtp-apl-stash1.cisco.com:https (ESTABLISHED)
echo -e "call close(894)\nquit" > gdb_commands
gdb -p 23220 --batch -x gdb_commands
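For comparison, the same effect can be reproduced in plain Java: closing a socket from another thread makes a blocked read fail with a SocketException, which is documented behaviour of java.net.Socket.close(). The sketch below is only an illustration over a loopback connection, with hypothetical class and method names; it is not what the gdb command does internally.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: a reader blocked on a silent peer, released by close().
class CloseUnblocksReadDemo {
    /** Returns the simple class name of the exception that ended the read, or "none". */
    static String closeWhileReading() {
        AtomicReference<String> outcome = new AtomicReference<>("none");
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // The accepted side never writes and no SO_TIMEOUT is set,
            // so this read would block indefinitely.
            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read();
                } catch (IOException e) {
                    outcome.set(e.getClass().getSimpleName());
                }
            });
            reader.start();
            Thread.sleep(200);  // give the reader time to block
            client.close();     // closing from another thread releases the reader
            reader.join(2000);
        } catch (IOException | InterruptedException e) {
            outcome.set("setup failed: " + e);
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println("read ended with: " + closeWhileReading());
    }
}
```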
After analyzing the code for the stash pull request builder, I found some areas where defensive code could be added to prevent this (https://github.com/jenkinsci/stash-pullrequest-builder-plugin/blob/master/src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java):
1. For the HttpClient, set two parameters:
a. ConnectionTimeout: the maximum time to wait for the connection to be established, i.e. for the server to respond to the connection request.
b. SoTimeout: the maximum period of inactivity between two consecutive data packets arriving at the client after the connection is established.
2. Related to item 1: apparently these two settings may not always work properly under some conditions, according to this bug filed against the JDK, https://bugs.openjdk.java.net/browse/JDK-8049846. As a further defense, I wrapped each API call in a FutureTask and start the task in a new thread with a time limit of 30 seconds. After 30 seconds, a java.util.concurrent.TimeoutException is thrown, allowing the parent thread to abort the task and the HTTP request.
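The time-limit idea in item 2 can be sketched roughly as follows. This is not the code from the plugin change: it uses an ExecutorService rather than a hand-started FutureTask thread, and BoundedCall, callWithTimeout, and the fallback value are hypothetical names chosen for illustration. The 30-second limit from the description would be passed as timeoutMillis.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bound the time the calling (timer) thread waits for a call.
class BoundedCall {
    // Daemon threads so a permanently stuck worker cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-call");
        t.setDaemon(true);
        return t;
    });

    /** Runs call and returns its result, or fallback if it fails or exceeds timeoutMillis. */
    static String callWithTimeout(Callable<String> call, long timeoutMillis, String fallback) {
        Future<String> future = POOL.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker. A thread blocked in a
            // java.net socket read does not react to interruption, so a real
            // HTTP call would also need its request aborted or socket closed.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(() -> "ok", 1000, "timed out"));
        System.out.println(callWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100, "timed out"));
    }
}
```

The point of the design is that the caller gets control back after the limit even if the worker stays stuck; the stuck worker is then a leaked thread rather than a blocked cron system.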
This modification has been running on a private Jenkins server instance in our organization with positive results:
1. The sockets are closed after each request from the client, so fewer resources are used on both client and server.
2. If a socket does hang, we see the concurrent timeout exception and the cron job is not blocked.
I'll be opening a pull request within the next 1-2 days once the changes are properly documented inline in the code.
jwstric2 I believe I'm experiencing the same issue. Can you send me the changes you made to the code to resolve this?
Thanks -james
jcnorman48, sorry for the late reply. See https://github.com/jenkinsci/stash-pullrequest-builder-plugin/pull/9. I have fixed merge conflicts 3 times because of other commits landing in the meantime, but the administrator of the plugin has yet to accept the fix.
jwstric2 thanks so much for the PR. I may just build off this and upload the plugin. This issue has killed cron entirely a few times.
jcnorman48 The Stash Pull Requester plugin was a great initial choice for us to get started. Check out https://marketplace.atlassian.com/plugins/se.bjurr.prnfs.pull-request-notifier-for-stash/server/overview for the push model; this has been serving us well for over a month now (less load on Jenkins and the stash server too). https://christiangalsterer.wordpress.com/2015/04/23/continuous-integration-for-pull-requests-with-jenkins-and-stash/ gives a good walk-through.
Code changed in jenkins
User: Jonathan Strickland
Path:
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
http://jenkins-ci.org/commit/stash-pullrequest-builder-plugin/80431f422dce9ea763e59c28e35fef75113203b1
Log:
JENKINS-30558 - Defensive code to prevent native deadlock and cleanup socket code
Code changed in jenkins
User: Jonathan Strickland
Path:
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
http://jenkins-ci.org/commit/stash-pullrequest-builder-plugin/ed565dc0ea5cf42443c7ed00d802ec81ca2b6add
Log:
JENKINS-30558 - Add some inline comments to code changes
Code changed in jenkins
User: Jonathan Strickland
Path:
README.md
pom.xml
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashBuildTrigger.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashBuilds.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashCause.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashRepository.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashPullRequestResponse.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashPullRequestResponseValueRepository.java
src/main/resources/stashpullrequestbuilder/stashpullrequestbuilder/StashBuildTrigger/config.jelly
src/test/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/AdditionalParameterRegExTest.java
http://jenkins-ci.org/commit/stash-pullrequest-builder-plugin/0a35b20e635a5a445ea4c6b0263216625dc4c719
Log:
Merge remote-tracking branch 'origin/master' into JENKINS-30558
Conflicts:
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
Code changed in jenkins
User: Jonathan Strickland
Path:
README.md
pom.xml
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashRepository.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
src/main/resources/stashpullrequestbuilder/stashpullrequestbuilder/StashBuildTrigger/config.jelly
src/test/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/AdditionalParameterRegExTest.java
http://jenkins-ci.org/commit/stash-pullrequest-builder-plugin/b71dd2cfd9366009a39a28166e0b123e8dc6132f
Log:
Merge remote-tracking branch 'origin/master' into JENKINS-30558
Conflicts:
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
Code changed in jenkins
User: Jonathan Strickland
Path:
README.md
pom.xml
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashBuildTrigger.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashBuilds.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashCause.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/StashRepository.java
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
src/main/resources/stashpullrequestbuilder/stashpullrequestbuilder/StashBuildTrigger/config.jelly
src/test/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashPullRequestResponseValueRepositoryTest.java
http://jenkins-ci.org/commit/stash-pullrequest-builder-plugin/9db4cbb1e24eab66090a80b7e1488f2e2a6453b6
Log:
Merge remote-tracking branch 'origin/master' into JENKINS-30558
Conflicts:
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
Code changed in jenkins
User: Nathan
Path:
src/main/java/stashpullrequestbuilder/stashpullrequestbuilder/stash/StashApiClient.java
http://jenkins-ci.org/commit/stash-pullrequest-builder-plugin/02ddd009ec6a34ac4ca322e7204aef3babdf89ca
Log:
Merge pull request #9 from jwstric2/JENKINS-30558
Jenkins 30558
Compare: https://github.com/jenkinsci/stash-pullrequest-builder-plugin/compare/6424c319aec0...02ddd009ec6a
Resolved with Merge pull request #9 from jwstric2/JENKINS-30558.
Hi, I have updated our plugin to the latest version, and now cron-based jobs can be triggered, but not as I expected: they have become irregular. Can anyone help me?
Kang,
I would recommend opening up a new issue with your problem. If the upgrade of the Stash Trigger Plugin is causing issues, please note this in the new issue (to and from version)
Regards,
Jonathan
kanghao I have the exact same issue as you: the stash pull request plugin causes the cron-based jobs to be triggered randomly. In addition, these jobs also get triggered when Stash becomes unavailable, such as during a backup. As a result, these jobs are triggered every day at the beginning of the Stash backup, and all of them fail, of course. This must have been caused by the fix in the new release. It should be reworked so that it does not introduce a new bug. I think it's better to keep using this same issue ID to complete the resolution. Did you open a new issue for it? If so, can I have the new issue ID?
The issue still persists. The entire Jenkins core trigger system has been affected by the stash pull request builder plugin :|
This issue has been automatically closed because of inactivity. Please reopen it if you think it's still valid
I see that the socket timeout and the request timeout were introduced in the same PR. Does anybody know a reason why request timeouts are needed in the presence of socket timeouts? Running HTTP requests in separate threads adds a lot of complexity. Other plugins don't do it. I assume every request should use a limited number of packets sent over the socket, so the socket timeout should put a limit on the request duration.
I'm going to move requests back to the main thread unless I hear any objections.
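One general consideration when weighing the two timeouts, independent of this plugin: a socket read timeout applies to each blocking read, not to the request as a whole. The sketch below, with hypothetical names and a loopback server that sends one byte at a time, shows a response that takes longer in total than the configured SO_TIMEOUT without ever triggering it. Whether a Stash server would ever behave this way is not something the thread establishes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: SO_TIMEOUT bounds the gap between reads, not the total time.
class SlowDripDemo {
    /**
     * A local server sends `chunks` single bytes, pausing gapMillis before each.
     * The client reads to end of stream with SO_TIMEOUT = soTimeoutMillis.
     * Returns {bytesRead, elapsedMillis}, or {-1, -1} if the read timed out or failed.
     */
    static long[] readSlowResponse(int chunks, long gapMillis, int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread dripper = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    for (int i = 0; i < chunks; i++) {
                        Thread.sleep(gapMillis);
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException ignored) {
                    // the client gave up or the demo is shutting down
                }
            });
            dripper.start();
            long start = System.nanoTime();
            long count = 0;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);
                InputStream in = client.getInputStream();
                while (in.read() != -1) {
                    count++;
                }
            }
            dripper.join();
            return new long[] {count, (System.nanoTime() - start) / 1_000_000};
        } catch (IOException | InterruptedException e) {
            return new long[] {-1, -1}; // includes SocketTimeoutException
        }
    }

    public static void main(String[] args) {
        long[] r = readSlowResponse(5, 100, 300);
        System.out.println("bytes=" + r[0] + " elapsedMs=" + r[1]);
    }
}
```

In this sketch the total duration is bounded only by the number of reads multiplied by the timeout, so a hard upper limit on request time needs either a bounded response size or a separate overall deadline.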
Moving over to core. We have seen this issue 2 more times since then and found that other areas of the system are impacted. The number of jobs plus the introduction of cron polling to Stash is putting a heavier load on the system, so we are seeing the issue almost weekly now.