JENKINS-28175: Config change deadlocks Jenkins when PircBotX.shutdown() is invoked

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Component/s: ircbot-plugin
    • Labels: None
    • Environment: jenkins 1.596.2, ircbot-plugin 2.26

      We recently upgraded the IRC plugin from 2.25 to 2.26. On a configuration change, the ircbot plugin invokes PircBotX.shutdown(). For some reason that call never finishes, and the configuration change stalls.

      A side effect is that jobs sending notifications end up blocked waiting for an instance of the IRC connection provider. The only fix is to restart Jenkins entirely.

      Our bug report has more details at https://phabricator.wikimedia.org/T96183 and the full thread dump is attached at https://phabricator.wikimedia.org/P584

      Here are the blocked threads:

      Two jobs are blocked:

      "Executor #2 for integration-slave-trusty-1016 : executing browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce #234" prio=5 BLOCKED
      "Executor #1 for integration-slave-trusty-1012 : executing browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce #494" prio=5 BLOCKED
      
      	hudson.plugins.ircbot.v2.IRCConnectionProvider.getInstance(IRCConnectionProvider.java:14)
      	hudson.plugins.ircbot.IrcPublisher.getIMConnection(IrcPublisher.java:102)
      	hudson.plugins.im.IMPublisher.sendNotification(IMPublisher.java:374)
      	hudson.plugins.im.IMPublisher.notifyChatsOnBuildEnd(IMPublisher.java:585)
      	hudson.plugins.im.IMPublisher.notifyOnBuildEnd(IMPublisher.java:304)
      	hudson.plugins.im.IMPublisher.perform(IMPublisher.java:291)
              ...
      

      A configuration submit change is blocked as well:

      "Handling POST /ci/configSubmit from X.X.X.X : RequestHandlerThread[#1683]" daemon prio=5 WAITING
      	sun.misc.Unsafe.park(Native Method)
      	java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
      	java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
      	java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
      	org.pircbotx.Channel.getMode(Channel.java:127)
      	org.pircbotx.Channel.getModeArgument(Channel.java:182)
      	org.pircbotx.Channel.getChannelKey(Channel.java:239)
      	org.pircbotx.PircBotX.shutdown(PircBotX.java:2872)
      	hudson.plugins.ircbot.v2.IRCConnection.close(IRCConnection.java:102)
      	hudson.plugins.im.IMConnectionProvider.releaseConnection(IMConnectionProvider.java:92)
      	hudson.plugins.ircbot.v2.IRCConnectionProvider.setDesc(IRCConnectionProvider.java:19)
      	hudson.plugins.ircbot.IrcPublisher$DescriptorImpl.configure(IrcPublisher.java:336)
      	jenkins.model.Jenkins.configureDescriptor(Jenkins.java:2915)
      	jenkins.model.Jenkins.doConfigSubmit(Jenkins.java:2878)
              ...
      

      Some other related threads:

      "JenkinsIsBusyListener-thread" daemon prio=5 BLOCKED
       hudson.plugins.im.IMConnectionProvider.currentConnection(IMConnectionProvider.java:83)
       hudson.plugins.im.JenkinsIsBusyListener.setStatus(JenkinsIsBusyListener.java:118)
       hudson.plugins.im.JenkinsIsBusyListener.updateIMStatus(JenkinsIsBusyListener.java:109)
       hudson.plugins.im.JenkinsIsBusyListener.access$000(JenkinsIsBusyListener.java:20)
       hudson.plugins.im.JenkinsIsBusyListener$3.run(JenkinsIsBusyListener.java:98)
       java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
       java.util.concurrent.FutureTask.run(FutureTask.java:262)
       java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
       java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
      
      "IM-Reconnector-Thread" daemon prio=5 BLOCKED
       hudson.plugins.im.IMConnectionProvider$ConnectorRunnable.run(IMConnectionProvider.java:175)
       java.lang.Thread.run(Thread.java:745)
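The traces above fit a familiar pattern: one thread holds a monitor while it waits, with no timeout, for an event that never arrives, and every other thread that needs the same monitor piles up in BLOCKED state. The sketch below reproduces that pattern with the JDK alone. It assumes, as the BLOCKED frames in IMConnectionProvider suggest but do not prove, that releaseConnection() and currentConnection() synchronize on the same lock; ProviderSketch, replyArrives() and reproduce() are hypothetical names, not the plugin's code.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for IMConnectionProvider. Assumption: both methods
// synchronize on the same provider instance, as the BLOCKED frames suggest.
public class ProviderSketch {
    // Stands in for the latch in Channel.getMode() that waits for a MODE reply.
    private final CountDownLatch modeReply = new CountDownLatch(1);

    // Like releaseConnection() -> close() -> shutdown(): holds the monitor
    // while waiting, with no timeout, for a reply that may never arrive.
    public synchronized void releaseConnection() throws InterruptedException {
        modeReply.await();
    }

    // Like currentConnection() or getInstance(): needs the monitor only briefly.
    public synchronized String currentConnection() {
        return "connection";
    }

    // Simulates the awaited reply finally arriving.
    public void replyArrives() {
        modeReply.countDown();
    }

    // Polls until the thread reaches the given state or the timeout expires.
    static boolean reachesState(Thread t, Thread.State state, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != state) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    // Returns {configSubmit seen WAITING, busyListener seen BLOCKED}.
    public static boolean[] reproduce() throws InterruptedException {
        ProviderSketch provider = new ProviderSketch();
        Thread configSubmit = new Thread(() -> {
            try {
                provider.releaseConnection();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "configSubmit");
        Thread busyListener = new Thread(provider::currentConnection, "busyListener");

        configSubmit.start();
        // Parked inside await() while holding the monitor, like the POST handler.
        boolean waiting = reachesState(configSubmit, Thread.State.WAITING, 2000);
        busyListener.start();
        // Cannot enter currentConnection(), like JenkinsIsBusyListener-thread.
        boolean blocked = reachesState(busyListener, Thread.State.BLOCKED, 2000);

        // Only the reply frees the monitor; here the reply is simulated.
        provider.replyArrives();
        configSubmit.join();
        busyListener.join();
        return new boolean[] {waiting, blocked};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = reproduce();
        System.out.println("configSubmit WAITING: " + r[0] + ", busyListener BLOCKED: " + r[1]);
    }
}
```

Nothing in the sketch times out on its own: only the simulated reply releases the monitor, which is consistent with the report that restarting Jenkins was the only way out.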
      

      Looking at the git changelog, I noticed https://github.com/jenkinsci/ircbot-plugin/commit/98b0105a743d062abf957c285cdded06fedd3fa3 which makes this change in IRCConnection.close:

      - this.pircConnection.disconnect();
      + this.pircConnection.shutdown(true);
      

      This was done to fix a leak (JENKINS-25349).

          Leon Blakey added a comment -

          ERROR :Closing link: (webqatestbo@nat-qa.scl3.mozilla.com) [Registration timeout]

          I think that's causing the issue. The server says it doesn't have enough information to finish connecting the bot, so the bot doesn't dispatch ConnectEvent, and the ircbot plugin blocks and then times out after 2 minutes waiting for that ConnectEvent (https://github.com/TheLQ/ircbot-plugin/blob/master/src/main/java/hudson/plugins/ircbot/v2/IRCConnection.java#L208).

          I tried to reproduce this with my Jenkins server on irc.mozilla.org and even registered it with NickServ, but didn't run into any issues.

          Dave, do you see any errors from the IRC server before that line? If not, can you paste the IRC logs from the "- Starting Connect attempt 1/5 -" line up to the "ERROR :Closing link" line you posted?
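The connect path Leon describes can be pictured as a latch that a listener counts down when ConnectEvent fires, and that the connecting thread awaits with an upper bound. The sketch below is a hypothetical illustration using only java.util.concurrent; ConnectWaitSketch, onConnect() and awaitConnect() are assumed names, not the plugin's or PircBotX's API, and the two-minute bound from the comment is passed in as a parameter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of waiting for a "connected" event with an upper bound.
public class ConnectWaitSketch {
    private final CountDownLatch connected = new CountDownLatch(1);

    // Would be called from the IRC library's listener when ConnectEvent fires.
    public void onConnect() {
        connected.countDown();
    }

    // Returns true if the event arrived in time, false on timeout (for example
    // when the server closes the link with "Registration timeout").
    public boolean awaitConnect(long timeout, TimeUnit unit) throws InterruptedException {
        return connected.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        // Registration never completes: the wait gives up instead of hanging.
        ConnectWaitSketch neverConnects = new ConnectWaitSketch();
        System.out.println(neverConnects.awaitConnect(100, TimeUnit.MILLISECONDS)); // prints false

        // Event delivered from another thread: the wait returns promptly.
        ConnectWaitSketch connects = new ConnectWaitSketch();
        new Thread(connects::onConnect).start();
        System.out.println(connects.awaitConnect(2, TimeUnit.MINUTES)); // prints true
    }
}
```

Because CountDownLatch.await(long, TimeUnit) returns false on timeout, a caller can log the failure and retry rather than hang, which matches the bounded wait the comment describes for the connect path.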

          Dave Hunt added a comment -

          thelq: I'm unable to replicate this unless some time has passed since the initial IRC connection. The first comment mentions something about a stale mode, which might be related. The log is pretty verbose, as the version I have seems to echo all IRC conversation. I'm not sure if that's expected, but it's not ideal.
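The "stale mode" hint lines up with the configSubmit trace in the report, where PircBotX.shutdown() calls Channel.getChannelKey(), which reaches Channel.getMode() and parks in CountDownLatch.await(). One plausible reading, which is an assumption and not PircBotX's actual source, is that the channel caches its mode, marks it stale, and waits for the server's MODE reply before answering. The hypothetical StaleModeChannel below contrasts an unbounded wait with a bounded one that falls back to the last known value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical channel that caches its mode string and refreshes it lazily.
public class StaleModeChannel {
    private volatile String mode = "+nt";
    private volatile boolean modeStale = false;
    private volatile CountDownLatch modeLatch = new CountDownLatch(0);

    // Marks the cached mode as outdated. The new latch is installed before the
    // stale flag is published, so a reader that sees modeStale also sees it.
    // A real client would also send "MODE #channel" to the server here.
    public synchronized void markStale() {
        modeLatch = new CountDownLatch(1);
        modeStale = true;
    }

    // Called by the input thread when the server's mode reply arrives.
    public synchronized void setMode(String newMode) {
        mode = newMode;
        modeStale = false;
        modeLatch.countDown();
    }

    // Unbounded variant: if the reply never arrives (dead or half-closed
    // connection), this blocks forever, like the configSubmit thread.
    public String getMode() throws InterruptedException {
        if (modeStale) {
            modeLatch.await();
        }
        return mode;
    }

    // Bounded variant: gives up after the timeout and returns the last known
    // (possibly stale) mode, so a caller such as a shutdown routine can proceed.
    public String getMode(long timeout, TimeUnit unit) throws InterruptedException {
        if (modeStale) {
            modeLatch.await(timeout, unit);
        }
        return mode;
    }

    public static void main(String[] args) throws InterruptedException {
        StaleModeChannel channel = new StaleModeChannel();
        channel.markStale();
        // No reply will come: the bounded call returns the cached value.
        System.out.println(channel.getMode(100, TimeUnit.MILLISECONDS)); // prints +nt

        // A reply arrives: the mode is fresh again and no wait is needed.
        channel.setMode("+ntk secret");
        System.out.println(channel.getMode()); // prints +ntk secret
    }
}
```

Had main called the unbounded getMode() right after markStale(), it would never return; on a connection that is already going away nothing counts the latch down, which is why an unbounded wait is risky on a shutdown path.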

          Dave Hunt added a comment -

          Any updates on this? We're still seeing it on our main Jenkins instance.

          Leon Blakey added a comment -

          PircBotX 2.1 was released last night.

          Dave Hunt added a comment -

          kutzi, could we get a new plugin release to see if it addresses this issue?

          kutzi added a comment -

          Sorry, I missed the last updates here.

          thelq: since you already made the needed changes on your fork, can you update your fork to use the released version 2.1 and open a pull request?

          Antoine Musso added a comment -

          From the discussion on https://github.com/jenkinsci/ircbot-plugin/commit/d18cc7b617155100f8afadb73b324f378c5661da (which bumps PircBotX to 2.0.1), the deadlock might be solved by PircBotX 2.1.

          It would be nice to have a commit that bumps the dependency.

          Steven Christou added a comment -

          Please make sure to update to the latest version of the ircbot plugin, which uses version 2.1 of the PircBotX library. If this does not solve your issue, please attach a new thread dump so the deadlock can be diagnosed.

          Antoine Musso added a comment -

          I don't think I have seen that issue again with 2.27 / PircBotX 2.0.1, so maybe that fixed it.

          Antoine Musso added a comment -

          It has not happened again since I upgraded to 2.27 / PircBotX 2.0.1.

            Assignee: kutzi
            Reporter: Antoine Musso (hashar)
            Votes: 3
            Watchers: 8