• Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • ghprb-plugin
    • None

      The ghprb trigger is blocking the running of other Jenkins scheduled jobs (such as the NodeProvisionerInvoker).

      https://gist.github.com/recampbell/ce903e95d846af8ce390

      It seems as though there is no timeout for these GitHub API requests, so if they are being blocked due to API quota issues, the entire Jenkins is effectively blocked.
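For illustration, the stack trace in this issue goes through `HttpURLConnection`, whose connect and read timeouts both default to 0, which is interpreted as "wait indefinitely". The following is only a minimal sketch of setting finite timeouts at that level; the class name, the helper `openWithTimeouts`, and the 10 s / 30 s values are assumptions made for this example and are not taken from the plugin or from github-api.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutSketch {

    // Hypothetical helper: creates a connection object with finite timeouts.
    // No network traffic happens here; it only starts on connect() or
    // getInputStream().
    public static HttpURLConnection openWithTimeouts(String url, int connectMs, int readMs) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectMs); // max time to establish the TCP connection
            conn.setReadTimeout(readMs);       // max time to wait for data once connected
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Timeout values are illustrative assumptions only.
        HttpURLConnection conn =
                openWithTimeouts("https://api.github.com/rate_limit", 10_000, 30_000);
        System.out.println("connect timeout ms: " + conn.getConnectTimeout());
        System.out.println("read timeout ms: " + conn.getReadTimeout());
    }
}
```

With finite values, a stalled request would end in a `SocketTimeoutException` instead of holding the calling thread for an unbounded time.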

          [JENKINS-19123] ghprb prevents Jenkins cron thread from running

          Ray Sennewald added a comment -

          I've seen the same behavior as well on our Jenkins servers running GHPRB. This only happens from time to time, and the temporary fix is to reboot Jenkins when this occurs. Would be nice if we could have a timeout for these as recampbell mentioned.

          Nathan Nutter added a comment -

          I believe I'm having the same problem. I'm attaching a threadDump in case it helps. I really know nothing about Java or thread dumps, but it looks like a bunch of timer threads are blocked and one timer thread is persisting indefinitely, which happens to look like this problem.

          Nathan Nutter added a comment -

          This snippet from logs may be relevant too:

          Oct 20, 2014 10:11:22 PM org.jenkinsci.plugins.ghprb.GhprbRepository initGhRepository
          SEVERE: Error while accessing rate limit API
          java.net.ConnectException: Connection timed out
                  at java.net.PlainSocketImpl.socketConnect(Native Method)
                  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:327)
                  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:193)
                  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:180)
                  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385)
                  at java.net.Socket.connect(Socket.java:546)
                  at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:590)
                  at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:160)
                  at sun.net.NetworkClient.doConnect(NetworkClient.java:178)
                  at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
                  at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
                  at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
                  at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:332)
                  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
                  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
                  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
                  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
                  at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
                  at org.kohsuke.github.Requester.parse(Requester.java:383)
                  at org.kohsuke.github.Requester._to(Requester.java:185)
                  at org.kohsuke.github.Requester.to(Requester.java:160)
                  at org.kohsuke.github.GitHub.getRateLimit(GitHub.java:247)
                  at org.jenkinsci.plugins.ghprb.GhprbRepository.initGhRepository(GhprbRepository.java:50)
                  at org.jenkinsci.plugins.ghprb.GhprbRepository.check(GhprbRepository.java:72)
                  at org.jenkinsci.plugins.ghprb.Ghprb.run(Ghprb.java:97)
                  at org.jenkinsci.plugins.ghprb.GhprbTrigger.run(GhprbTrigger.java:143)
                  at hudson.triggers.Trigger.checkTriggers(Trigger.java:266)
                  at hudson.triggers.Trigger$Cron.doRun(Trigger.java:214)
                  at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:54)
                  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                  at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
                  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
                  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                  at java.lang.Thread.run(Thread.java:701)
          

          Nathan Nutter added a comment - - edited

          I'm pretty confident it is ghprb that is causing the lock up. Previously I would just restart Jenkins to fix the problem but when it happened today I wanted to find a less disruptive workaround.

          First I tried sending an interrupt to the thread (which did not work since it was blocking on I/O) using the script console:

          Thread.getAllStackTraces().keySet().each() { item ->
            if (item.getName().contains("jenkins.util.Timer [#10]")) {
              item.interrupt()
              if (item.isInterrupted()) {
                println "Interrupted thread: " + item.getId() + " '" + item.getName() + "'";
              }
            }
          }
          println "Done."
          

          Then I tried tcpkill which did not work. Then I connected to the Jenkins process with GDB and closed the relevant sockets.

          # see file descriptors 531, 537, and 548 below
          $ lsof -p 28099 | grep github | grep TCP
          java    28099 apipe-tester  531u  IPv6          601603888      0t0       TCP vm86.gsc.wustl.edu:37938->github.com:https (ESTABLISHED)
          java    28099 apipe-tester  537u  IPv6          601586303      0t0       TCP vm86.gsc.wustl.edu:37894->github.com:https (ESTABLISHED)
          java    28099 apipe-tester  548u  IPv6          601613629      0t0       TCP vm86.gsc.wustl.edu:58180->api.github.com:https (ESTABLISHED)
          
          $ gdb -p 28099
          ...
          (gdb) call close(531)
          [New Thread 0x7f1af2293700 (LWP 22332)]
          $1 = 0
          (gdb) call close(537)
          $2 = 0
          (gdb) call close(548)
          $3 = 0
          (gdb) call close(547)
          [New Thread 0x7f1af4bc5700 (LWP 22460)]
          $4 = 0
          (gdb) quit
          A debugging session is active.
          
                  Inferior 1 [process 28099] will be detached.
          
          Quit anyway? (y or n) y
          Detaching from program: /usr/lib/jvm/java-6-openjdk/jre/bin/java, process 28099
          

          After which the crons started running again!

          EDIT: FD 547 was a retry after closing the other sockets. I probably didn't need to kill it.
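The behavior described in this comment (interrupt has no effect on a thread blocked in socket I/O, while closing the socket releases it) can be reproduced in plain Java. The sketch below is only a local model using a loopback socket; it closes the socket from Java rather than through GDB, and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class InterruptVsClose {

    // Returns {aliveAfterInterrupt, aliveAfterClose} for a thread that is
    // blocked reading from a socket on which no data ever arrives.
    public static boolean[] demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, server.getLocalPort());
            Socket peer = server.accept(); // accepted, but never written to

            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read(); // blocks: nothing will be sent
                } catch (IOException e) {
                    // reached once the socket is closed from another thread
                }
            });
            reader.start();
            Thread.sleep(200);             // give the reader time to block in read()

            reader.interrupt();            // classic blocking socket I/O ignores this
            reader.join(500);
            boolean aliveAfterInterrupt = reader.isAlive();

            client.close();                // the blocked read() now fails with a SocketException
            reader.join(2000);
            boolean aliveAfterClose = reader.isAlive();

            peer.close();
            return new boolean[] {aliveAfterInterrupt, aliveAfterClose};
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("alive after interrupt: " + result[0]);
        System.out.println("alive after close: " + result[1]);
    }
}
```

On a regular (platform) thread the reader should still be alive after the interrupt and gone after the close, which matches what happened once the file descriptors were closed.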

          Pavel Janoušek added a comment - - edited

          Based on my investigation, ghprb relies on the default behavior of GitHubBuilder, which means the RateLimitHandler.WAIT rate limit handler is used, even though the plug-in code expects that an IOException can be thrown and seems to handle it correctly.

          I think the fix belongs in the GhprbGitHubAuth class, where GitHubBuilder is instantiated, so the initialization there should look like:

                  GitHubBuilder builder = new GitHubBuilder()
                          .withEndpoint(serverAPIUrl)
                          .withConnector(new HttpConnectorWithJenkinsProxy())
                          .withRateLimitHandler(RateLimitHandler.FAIL);
          

          I'm not sure if there was a change in GitHub-Api in the past that changed that behavior, but it seems we have to explicitly declare RateLimitHandler.FAIL now.
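To make the difference between the two strategies concrete, here is a toy model of "wait until the limit resets" versus "fail immediately". It does not use the github-api classes: the `Policy` interface, the `WAIT` and `FAIL` constants in this class, and `blockedMillis` are all hypothetical names that only mimic the idea of a handler that sleeps versus one that throws an `IOException`.

```java
import java.io.IOException;

public class RateLimitPolicySketch {

    // Hypothetical stand-in for a rate limit handler; not the library's interface.
    public interface Policy {
        void onLimitExceeded(long resetInMillis) throws IOException;
    }

    // "Wait" strategy: the calling thread sleeps until the limit resets.
    public static final Policy WAIT = resetInMillis -> {
        try {
            Thread.sleep(resetInMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("interrupted while waiting for rate limit reset", e);
        }
    };

    // "Fail" strategy: report the problem right away and return control to the caller.
    public static final Policy FAIL = resetInMillis -> {
        throw new IOException("API rate limit reached");
    };

    // Measures how long the calling thread is tied up by the given policy.
    public static long blockedMillis(Policy policy, long resetInMillis) {
        long start = System.nanoTime();
        try {
            policy.onLimitExceeded(resetInMillis);
        } catch (IOException e) {
            // a caller that already handles IOException would log it and move on
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("FAIL blocked for ms: " + blockedMillis(FAIL, 500));
        System.out.println("WAIT blocked for ms: " + blockedMillis(WAIT, 500));
    }
}
```

In this model the failing policy returns almost immediately, while the waiting policy holds the thread for the whole reset interval, which on a shared timer thread would delay whatever is scheduled behind it.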

          Pavel Janoušek added a comment -

          PR to fix this issue created and sent.

            pajasoft Pavel Janoušek
            recampbell Ryan Campbell
            Votes: 4
            Watchers: 7