• Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • None
    • Centos 6.6
      Jenkins 1.621

      See attachment system_info.txt for full dump.

      This issue was originally opened against the stash-build-plugin, but after further investigation we found that the cron system of core Jenkins stops working as expected.

      With the introduction of more cron-based plugins such as stash-pullrequest-builder, the cron system now stops working approximately weekly. In the jenkins_debug_10072015.tar attachment, the system showed a last polling time of around 9:05-9:10am for most jobs. After that, all cron-type polling jobs stopped responding.

      Very close to JENKINS-25704; almost a duplicate.

        1. triggers_fine_log_10_8_2015.txt
          47 kB
        2. system_info.txt
          10 kB
        3. support_2015-09-20_20.18.47.zip
          2.00 MB
        4. screenshot-2.png
          16 kB
        5. screenshot-1.png
          3 kB
        6. jenkins_threaddump_2.txt
          67 kB
        7. jenkins_threaddump_1.txt
          66 kB
        8. jenkins_logs.txt
          75 kB
        9. jenkins_cron_problem-10-7-2015.zip
          3.18 MB
        10. debug_sockets.txt
          14 kB

          [JENKINS-30558] Cron based jobs are no longer triggered

          Jonathan Strickland created issue -

          Jonathan Strickland added a comment - Moving over to core. We have seen this issue twice since then and found that other areas of the system are impacted. The number of jobs plus the introduction of cron polling to Stash is putting a heavier load on the system, so we are now seeing the issue almost weekly.
          Jonathan Strickland made changes -
          Component/s New: core [ 15593 ]
          Component/s Original: stash-pullrequest-builder-plugin [ 20028 ]
          Description Original: This weekend an auth server acted up, causing errors with stash pull request builder. We no longer see log entries from the pull request builder being instantiated from cron, and new pull requests are not triggering Jenkins builds.

          Upon looking at Jenkins, we found many sockets in CLOSE_WAIT (meaning, if I remember correctly, that the remote end has closed the connection but the client has not yet). Running the garbage collector (I'll try after posting) may clean up the handles, since the httpclient handle reference may be out of scope.
          New: This issue was originally opened against the stash-build-plugin, but after further investigation we found that the cron system of core Jenkins stops working as expected.

          With the introduction of more cron-based plugins such as stash-pullrequest-builder, the cron system now stops working approximately weekly. In the jenkins_debug_10072015.tar attachment, the system showed a last polling time of around 9:05-9:10am for most jobs. After that, all cron-type polling jobs stopped responding.

          Very close to JENKINS-25704; almost a duplicate.
          Environment Original: Centos 6.6
          Jenkins 1.621
          stash-pullrequest-builder 1.3.1

          See attachment system_info.txt for full dump
          New: Centos 6.6
          Jenkins 1.621

          See attachment system_info.txt for full dump.
          Priority Original: Major [ 3 ] New: Critical [ 2 ]
          Summary Original: CLOSE_WAIT sockets on failures / Pull request builder no longer working New: Cron based jobs are no longer triggered
          Jonathan Strickland made changes -
          Assignee Original: nathan m [ nemccarthy ]

          Jonathan Strickland added a comment - Adding jenkins_cron_problem-10-7-2015 with all support, lsof dump, sysconfig, etc. This is as much as I could initially gather.
          Jonathan Strickland made changes -
          Attachment New: jenkins_cron_problem-10-7-2015.zip [ 30807 ]
          Jonathan Strickland made changes -
          Link New: This issue is related to JENKINS-25704 [ JENKINS-25704 ]

          Jonathan Strickland added a comment - - edited

          Hit the issue on 10/8/2015 too. The last entry written by the logger was at 3:39:00 PM, which correlates with all the polling jobs no longer being scheduled.

          The only interesting item I found in the Jenkins log was an exception at 3:37:13 PM:

          Oct 08, 2015 3:37:13 PM org.eclipse.jetty.util.log.JavaUtilLog warn
          WARNING:
          java.nio.channels.ClosedChannelException
          at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
          at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:479)
          at org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:293)
          at org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:402)
          at org.eclipse.jetty.io.nio.SslConnection.process(SslConnection.java:337)
          at org.eclipse.jetty.io.nio.SslConnection.access$900(SslConnection.java:48)
          at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.flush(SslConnection.java:738)
          at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.shutdownOutput(SslConnection.java:641)
          at org.eclipse.jetty.io.nio.SslConnection.onIdleExpired(SslConnection.java:260)
          at org.eclipse.jetty.io.nio.SelectChannelEndPoint.onIdleExpired(SelectChannelEndPoint.java:349)
          at org.eclipse.jetty.io.nio.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:326)
          at winstone.BoundedExecutorService$1.run(BoundedExecutorService.java:77)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          at java.lang.Thread.run(Thread.java:745)

          Oct 08, 2015 3:38:00 PM stashpullrequestbuilder.stashpullrequestbuilder.StashPullRequestsBuilder run

          Jonathan Strickland made changes -
          Attachment New: triggers_fine_log_10_8_2015.txt [ 30810 ]

          Jonathan Strickland added a comment - The issue may be caused by the Stash pull request builder, judging from the jstack trace on the master. I attached 2 jstack traces taken 5 minutes apart; see thread 23311. It is sitting at java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int). I'm not very familiar with the Timer code, but I was wondering whether this single job's socket read not completing inside the timer task could impact the whole system.

          Thread 23311: (state = IN_NATIVE)

          • java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
          • java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Compiled frame)
          • java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Compiled frame)
          • sun.security.ssl.InputRecord.readFully(java.io.InputStream, byte[], int, int) @bci=21, line=442 (Compiled frame)
          • sun.security.ssl.InputRecord.read(java.io.InputStream, java.io.OutputStream) @bci=32, line=480 (Compiled frame)
          • sun.security.ssl.SSLSocketImpl.readRecord(sun.security.ssl.InputRecord, boolean) @bci=44, line=934 (Compiled frame)
          • sun.security.ssl.SSLSocketImpl.readDataRecord(sun.security.ssl.InputRecord) @bci=15, line=891 (Compiled frame)
          • sun.security.ssl.AppInputStream.read(byte[], int, int) @bci=72, line=102 (Compiled frame)
          • java.io.BufferedInputStream.fill() @bci=175, line=235 (Interpreted frame)
          • java.io.BufferedInputStream.read() @bci=12, line=254 (Compiled frame)
          • org.apache.commons.httpclient.HttpParser.readRawLine(java.io.InputStream) @bci=19, line=78 (Compiled frame)
          • org.apache.commons.httpclient.HttpParser.readLine(java.io.InputStream, java.lang.String) @bci=11, line=106 (Interpreted frame)
          • org.apache.commons.httpclient.HttpConnection.readLine(java.lang.String) @bci=19, line=1116 (Interpreted frame)
          • org.apache.commons.httpclient.HttpMethodBase.readStatusLine(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=36, line=1973 (Interpreted frame)
          • org.apache.commons.httpclient.HttpMethodBase.readResponse(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=21, line=1735 (Compiled frame)
          • org.apache.commons.httpclient.HttpMethodBase.execute(org.apache.commons.httpclient.HttpState, org.apache.commons.httpclient.HttpConnection) @bci=68, line=1098 (Interpreted frame)
          • org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(org.apache.commons.httpclient.HttpMethod) @bci=135, line=398 (Interpreted frame)
          • org.apache.commons.httpclient.HttpMethodDirector.executeMethod(org.apache.commons.httpclient.HttpMethod) @bci=288, line=171 (Interpreted frame)
          • org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HostConfiguration, org.apache.commons.httpclient.HttpMethod, org.apache.commons.httpclient.HttpState) @bci=114, line=397 (Interpreted frame)
          • org.apache.commons.httpclient.HttpClient.executeMethod(org.apache.commons.httpclient.HttpMethod) @bci=14, line=323 (Interpreted frame)
          • stashpullrequestbuilder.stashpullrequestbuilder.stash.StashApiClient.getRequest(java.lang.String) @bci=69, line=140 (Interpreted frame)
          • stashpullrequestbuilder.stashpullrequestbuilder.stash.StashApiClient.getPullRequests() @bci=5, line=44 (Interpreted frame)
          • stashpullrequestbuilder.stashpullrequestbuilder.StashRepository.getTargetPullRequests() @bci=12, line=57 (Interpreted frame)
          • stashpullrequestbuilder.stashpullrequestbuilder.StashPullRequestsBuilder.run() @bci=19, line=30 (Interpreted frame)
          • stashpullrequestbuilder.stashpullrequestbuilder.StashBuildTrigger.run() @bci=28, line=168 (Interpreted frame)
          • hudson.triggers.Trigger.checkTriggers(java.util.Calendar) @bci=253, line=278 (Compiled frame)
          • hudson.triggers.Trigger$Cron.doRun() @bci=43, line=217 (Interpreted frame)
          • hudson.triggers.SafeTimerTask.run() @bci=8, line=51 (Interpreted frame)
          • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471 (Compiled frame)
          • java.util.concurrent.FutureTask.runAndReset() @bci=47, line=304 (Interpreted frame)
          • java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) @bci=1, line=178 (Interpreted frame)
          • java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() @bci=37, line=293 (Interpreted frame)
          • java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
          • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
          • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
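
          To illustrate the question above, here is a minimal sketch using only the JDK. It assumes, based solely on the frames in this trace (FutureTask.runAndReset under ScheduledThreadPoolExecutor), that the cron check runs as one periodic task and calls each trigger synchronously. The class name StalledCronDemo, the method countTicks, and the latch standing in for a socket read without a timeout are all hypothetical and are not Jenkins code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class StalledCronDemo {

    /**
     * Schedules one periodic "cron tick" every 50 ms and returns how many
     * ticks started within roughly 500 ms. When blockFirstRun is true, the
     * first tick waits on a latch that is never released, standing in for
     * a socket read with no timeout (an assumption for illustration).
     */
    static int countTicks(boolean blockFirstRun) throws InterruptedException {
        // Several worker threads on purpose: spare threads do not help,
        // because a periodic task never runs concurrently with itself.
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);
        AtomicInteger ticks = new AtomicInteger();
        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            timer.scheduleAtFixedRate(() -> {
                int n = ticks.incrementAndGet();
                if (blockFirstRun && n == 1) {
                    try {
                        neverReleased.await(); // hypothetical hung remote call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, 0, 50, TimeUnit.MILLISECONDS);
            Thread.sleep(500);
            return ticks.get();
        } finally {
            // Interrupts the blocked tick so the demo can exit cleanly.
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("ticks when every run returns: " + countTicks(false));
        System.out.println("ticks when the first run hangs: " + countTicks(true));
    }
}
```

          If the real scheduling works the same way, a single trigger blocked in socketRead0 would keep later cron ticks from ever starting, which would match the symptom of all polling jobs stopping at once. A read timeout on the plugin's HTTP client would be one possible mitigation, though that is untested here.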


            jwstric2 Jonathan Strickland
            jwstric2 Jonathan Strickland
            Votes: 3
            Watchers: 12