  Jenkins / JENKINS-19123

ghprb prevents Jenkins cron thread from running


      Description

      The ghprb trigger blocks other Jenkins scheduled tasks (such as the NodeProvisionerInvoker) from running.

      https://gist.github.com/recampbell/ce903e95d846af8ce390

      There appears to be no timeout on these GitHub API requests, so if they block (for example, because of API quota limits), all of Jenkins is effectively blocked.
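A minimal sketch of what a timeout buys, using only the JDK's HttpURLConnection rather than the github-api or ghprb code. A hypothetical local socket, standing in for an unresponsive GitHub endpoint, accepts the TCP connection but never replies. With the default read timeout of 0 (wait forever) the request would hang; with setConnectTimeout and setReadTimeout set, it fails with a SocketTimeoutException instead. The class name, the /rate_limit path, and the 500 ms value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

final class TimeoutSketch {

    /**
     * Sends a GET to a local socket that never answers and returns true if the
     * request failed with a SocketTimeoutException instead of hanging.
     */
    static boolean requestTimesOut(int timeoutMillis) {
        // Hypothetical stand-in for an unresponsive GitHub endpoint: the OS completes
        // the TCP handshake via the listen backlog, but nothing ever writes a response.
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            HttpURLConnection conn = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + silent.getLocalPort() + "/rate_limit")
                    .toURL()
                    .openConnection();
            conn.setConnectTimeout(timeoutMillis); // bounds the TCP connect
            conn.setReadTimeout(timeoutMillis);    // bounds the wait for response bytes
            try (InputStream in = conn.getInputStream()) {
                return false; // a response arrived (not expected with this server)
            } catch (SocketTimeoutException expected) {
                return true;  // with the default timeout of 0 this call would block forever
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = requestTimesOut(500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

The same idea applies to whatever connector the plugin uses: both the connect and the read need an upper bound, because the log below shows the hang can happen in either phase.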

            Activity

            rsennewald Ray Sennewald added a comment -

            I've seen the same behavior on our Jenkins servers running GHPRB. It only happens occasionally, and the temporary fix is to restart Jenkins when it does. It would be nice to have a timeout for these requests, as recampbell suggested.

            nnutter Nathan Nutter added a comment -

            I believe I'm having the same problem, so I'm attaching a thread dump in case it helps. I don't know much about Java or thread dumps, but it looks like several timer threads are blocked and one timer thread is stuck indefinitely, which matches this problem.
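To check which timer threads are stuck without reading a full thread dump, a small Java sketch can list the live threads whose names start with jenkins.util.Timer, together with their state and top stack frame. Thread.getAllStackTraces() is a standard JDK call and can also be used from the Groovy script console; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jenkins or ghprb.
final class TimerThreadReport {

    /** Returns one summary line per live thread whose name starts with the given prefix. */
    static List<String> describe(String namePrefix) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().startsWith(namePrefix)) {
                continue;
            }
            StackTraceElement[] frames = e.getValue();
            // The top frame shows where the thread is currently running, blocked, or parked.
            String top = frames.length > 0 ? frames[0].toString() : "<no frames>";
            lines.add(t.getName() + " [" + t.getState() + "] at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Thread names like "jenkins.util.Timer [#10]" appear later in this thread.
        describe("jenkins.util.Timer").forEach(System.out::println);
    }
}
```

A timer thread that stays RUNNABLE with a top frame in socketConnect or socketRead across several runs is a likely candidate for the hang described here.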

            nnutter Nathan Nutter added a comment -

            This snippet from logs may be relevant too:

            Oct 20, 2014 10:11:22 PM org.jenkinsci.plugins.ghprb.GhprbRepository initGhRepository
            SEVERE: Error while accessing rate limit API
            java.net.ConnectException: Connection timed out
                    at java.net.PlainSocketImpl.socketConnect(Native Method)
                    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
                    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
                    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
                    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
                    at java.net.Socket.connect(Socket.java:546)
                    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:590)
                    at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:160)
                    at sun.net.NetworkClient.doConnect(NetworkClient.java:178)
                    at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
                    at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
                    at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
                    at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:332)
                    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
                    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
                    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
                    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
                    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
                    at org.kohsuke.github.Requester.parse(Requester.java:383)
                    at org.kohsuke.github.Requester._to(Requester.java:185)
                    at org.kohsuke.github.Requester.to(Requester.java:160)
                    at org.kohsuke.github.GitHub.getRateLimit(GitHub.java:247)
                    at org.jenkinsci.plugins.ghprb.GhprbRepository.initGhRepository(GhprbRepository.java:50)
                    at org.jenkinsci.plugins.ghprb.GhprbRepository.check(GhprbRepository.java:72)
                    at org.jenkinsci.plugins.ghprb.Ghprb.run(Ghprb.java:97)
                    at org.jenkinsci.plugins.ghprb.GhprbTrigger.run(GhprbTrigger.java:143)
                    at hudson.triggers.Trigger.checkTriggers(Trigger.java:266)
                    at hudson.triggers.Trigger$Cron.doRun(Trigger.java:214)
                    at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
                    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
                    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                    at java.lang.Thread.run(Thread.java:701)
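The stack trace shows the GitHub request running synchronously inside Trigger$Cron.doRun on a ScheduledThreadPoolExecutor thread, and the runAndReset frames suggest the cron task is scheduled periodically. The simplified model below (not Jenkins code; pool size 4 and a 10 ms period are arbitrary assumptions) shows one consequence: executions of a fixed-rate task never overlap, so once one run blocks, the task stops firing even though the pool could run other work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of a periodic "cron" task, not Jenkins code.
final class CronStarvationSketch {

    /**
     * Schedules a fixed-rate task whose first run hangs (standing in for a socket
     * connect with no timeout) and returns how many times it started during the
     * observation window.
     */
    static int cronRunsWhileBlocked(long observeMillis) {
        // A pool with spare capacity: the result below is not caused by a lack of threads.
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(4);
        CountDownLatch releaseHungCall = new CountDownLatch(1);
        AtomicInteger cronRuns = new AtomicInteger();
        try {
            timer.scheduleAtFixedRate(() -> {
                if (cronRuns.incrementAndGet() == 1) {
                    try {
                        releaseHungCall.await(); // the first pass never returns on its own
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, 0, 10, TimeUnit.MILLISECONDS);
            Thread.sleep(observeMillis);
            // Fixed-rate executions never run concurrently, so the count stays at 1.
            return cronRuns.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        } finally {
            releaseHungCall.countDown();
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cron runs in 500 ms: " + cronRunsWhileBlocked(500));
    }
}
```

In this model every later tick is simply skipped, which matches the symptom that all cron-driven work stops until the blocked call returns.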
            
            nnutter Nathan Nutter added a comment - edited

            I'm pretty confident that ghprb is causing the lockup. Previously I would just restart Jenkins to fix the problem, but when it happened today I wanted to find a less disruptive workaround.

            First I tried interrupting the thread from the script console, which did not work because the thread was blocked on I/O:

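            // Find the stuck timer thread by name and interrupt it.
            // The thread name comes from this instance's thread dump; adjust as needed.
            // Note: interrupt() does not unblock a thread stuck in classic java.net socket I/O;
            // for such a thread it only sets the interrupt flag.
            Thread.getAllStackTraces().keySet().each { item ->
              if (item.getName().contains("jenkins.util.Timer [#10]")) {
                item.interrupt()
                if (item.isInterrupted()) {
                  println "Interrupted thread: " + item.getId() + " '" + item.getName() + "'"
                }
              }
            }
            println "Done."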
            

            Next I tried tcpkill, which also did not work. Finally, I attached GDB to the Jenkins process and closed the relevant sockets:

            # see file descriptors 531, 537, and 548 below
            $ lsof -p 28099 | grep github | grep TCP
            java    28099 apipe-tester  531u  IPv6          601603888      0t0       TCP vm86.gsc.wustl.edu:37938->github.com:https (ESTABLISHED)
            java    28099 apipe-tester  537u  IPv6          601586303      0t0       TCP vm86.gsc.wustl.edu:37894->github.com:https (ESTABLISHED)
            java    28099 apipe-tester  548u  IPv6          601613629      0t0       TCP vm86.gsc.wustl.edu:58180->api.github.com:https (ESTABLISHED)
            
            $ gdb -p 28099
            ...
            (gdb) call close(531)
            [New Thread 0x7f1af2293700 (LWP 22332)]
            $1 = 0
            (gdb) call close(537)
            $2 = 0
            (gdb) call close(548)
            $3 = 0
            (gdb) call close(547)
            [New Thread 0x7f1af4bc5700 (LWP 22460)]
            $4 = 0
            (gdb) quit
            A debugging session is active.
            
                    Inferior 1 [process 28099] will be detached.
            
            Quit anyway? (y or n) y
            Detaching from program: /usr/lib/jvm/java-6-openjdk/jre/bin/java, process 28099
            

            After that, the cron tasks started running again!

            EDIT: FD 547 was a retry opened after I closed the other sockets. I probably didn't need to close it.
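The observation that interrupt() had no effect while closing the file descriptors did can be reproduced in plain Java. In this sketch (hypothetical class, loopback sockets only), an ordinary platform thread blocks in a socket read with no timeout; interrupting it leaves it blocked, while closing the socket from another thread makes the read fail with a SocketException. Socket.close() is documented to unblock threads blocked on the socket; the interrupt behavior shown is that of platform threads, not virtual threads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demonstration class, not Jenkins or ghprb code.
final class UnblockSocketReadSketch {

    /** What happened to the reader thread at each step. */
    static final class Outcome {
        final boolean aliveAfterInterrupt;
        final boolean endedAfterClose;
        final Throwable readFailure;

        Outcome(boolean aliveAfterInterrupt, boolean endedAfterClose, Throwable readFailure) {
            this.aliveAfterInterrupt = aliveAfterInterrupt;
            this.endedAfterClose = endedAfterClose;
            this.readFailure = readFailure;
        }
    }

    static Outcome run() {
        try {
            InetAddress loopback = InetAddress.getByName("127.0.0.1");
            try (ServerSocket server = new ServerSocket(0, 50, loopback);
                 Socket client = new Socket(loopback, server.getLocalPort());
                 Socket silentPeer = server.accept()) { // never writes, so client reads block
                AtomicReference<Throwable> failure = new AtomicReference<>();
                Thread reader = new Thread(() -> {
                    try {
                        InputStream in = client.getInputStream();
                        in.read(); // no data and no SO_TIMEOUT: blocks indefinitely
                    } catch (IOException e) {
                        failure.set(e);
                    }
                }, "blocked-reader");
                reader.setDaemon(true); // never keep the JVM alive if something goes wrong
                reader.start();
                Thread.sleep(200); // let the reader enter read()

                reader.interrupt(); // like the script-console attempt
                reader.join(500);
                boolean aliveAfterInterrupt = reader.isAlive();

                client.close(); // like closing the file descriptor from GDB
                reader.join(2000);
                return new Outcome(aliveAfterInterrupt, !reader.isAlive(), failure.get());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Outcome o = run();
        System.out.println("still blocked after interrupt: " + o.aliveAfterInterrupt);
        System.out.println("unblocked after close: " + o.endedAfterClose + " (" + o.readFailure + ")");
    }
}
```

This is also why a read timeout inside the plugin is preferable to an external fix: the only in-process way to free such a thread is to close its socket.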

            pajasoft Pavel Janoušek added a comment - edited

            Based on my investigation, ghprb relies on GitHubBuilder's default behavior, which means the RateLimitHandler.WAIT rate-limit handler is used, even though the plugin code expects an IOException to be thrown and appears to handle it correctly.

            I think the fix could go in the GhprbGitHubAuth class, where GitHubBuilder is instantiated; the initialization there should look like this:

                    GitHubBuilder builder = new GitHubBuilder()
                            .withEndpoint(serverAPIUrl)
                            .withConnector(new HttpConnectorWithJenkinsProxy())
                            // Throw an IOException instead of blocking until the rate limit resets
                            .withRateLimitHandler(RateLimitHandler.FAIL);
            

            I'm not sure whether an earlier change in GitHub-Api altered that behavior, but it seems we now have to declare RateLimitHandler.FAIL explicitly.
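To illustrate why the handler choice matters, here is a self-contained model, not the github-api API: a hypothetical RateLimitPolicy interface with WAIT-like and FAIL-like implementations. A WAIT-style policy holds the calling timer thread until the quota resets, while a FAIL-style policy throws an IOException at once, so the caller's existing IOException handling logs the error and the check returns.

```java
import java.io.IOException;

final class RateLimitPolicySketch {

    /** Hypothetical stand-in for a rate-limit handler; not the github-api class. */
    interface RateLimitPolicy {
        void onRateLimited(long resetAtMillis) throws IOException;
    }

    /** WAIT-like: holds the calling thread until the quota resets. */
    static final RateLimitPolicy WAIT = resetAtMillis -> {
        long remaining = resetAtMillis - System.currentTimeMillis();
        if (remaining > 0) {
            try {
                Thread.sleep(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for the rate limit to reset", e);
            }
        }
    };

    /** FAIL-like: throws right away so the caller's IOException handling runs. */
    static final RateLimitPolicy FAIL = resetAtMillis -> {
        throw new IOException("API rate limit reached");
    };

    /**
     * Models one periodic trigger check that hits the rate limit. IOException is
     * logged rather than propagated. Returns how long the calling thread was held, in ms.
     */
    static long checkOnce(RateLimitPolicy policy, long resetInMillis) {
        long start = System.nanoTime();
        try {
            policy.onRateLimited(System.currentTimeMillis() + resetInMillis);
        } catch (IOException e) {
            System.err.println("check failed: " + e.getMessage());
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // A reset far in the future: FAIL still returns immediately.
        System.out.println("FAIL held the thread ~" + checkOnce(FAIL, 3_600_000) + " ms");
        // A short reset so the demo ends; a real reset could be much further away.
        System.out.println("WAIT held the thread ~" + checkOnce(WAIT, 300) + " ms");
    }
}
```

This covers only the rate-limit path; a stalled connection, such as the ConnectException in the log above, is bounded by socket timeouts rather than by the rate-limit handler.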

            pajasoft Pavel Janoušek added a comment -

            I have created and submitted a PR to fix this issue.


              People

              Assignee:
              pajasoft Pavel Janoušek
              Reporter:
              recampbell Ryan Campbell
              Votes:
              4
              Watchers:
              7
