
config change deadlock Jenkins when pircx.shutdown() is invoked

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • ircbot-plugin
    • None
    • jenkins 1.596.2
      ircbot-plugin 2.26

      We recently upgraded the IRC plugin from 2.25 to 2.26. On a configuration change, the ircbot plugin invokes PircBotX.shutdown(). For some reason the call never finishes and the configuration change stalls.

      As a side effect, jobs that send notifications end up blocked waiting for an instance of the IRC connection provider. The only fix is to restart Jenkins entirely.

      Our bug report has more details at https://phabricator.wikimedia.org/T96183 and a full thread dump is attached at https://phabricator.wikimedia.org/P584

      Here are the blocked threads:

      Two jobs are blocked:

      "Executor #2 for integration-slave-trusty-1016 : executing browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-10-sauce #234" prio=5 BLOCKED
      "Executor #1 for integration-slave-trusty-1012 : executing browsertests-PdfHandler-test2.wikipedia.org-linux-firefox-sauce #494" prio=5 BLOCKED
      
      	hudson.plugins.ircbot.v2.IRCConnectionProvider.getInstance(IRCConnectionProvider.java:14)
      	hudson.plugins.ircbot.IrcPublisher.getIMConnection(IrcPublisher.java:102)
      	hudson.plugins.im.IMPublisher.sendNotification(IMPublisher.java:374)
      	hudson.plugins.im.IMPublisher.notifyChatsOnBuildEnd(IMPublisher.java:585)
      	hudson.plugins.im.IMPublisher.notifyOnBuildEnd(IMPublisher.java:304)
      	hudson.plugins.im.IMPublisher.perform(IMPublisher.java:291)
              ...
      

      A configuration submit request is blocked as well:

      "Handling POST /ci/configSubmit from X.X.X.X : RequestHandlerThread[#1683]" daemon prio=5 WAITING
      	sun.misc.Unsafe.park(Native Method)
      	java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
      	java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
      	java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
      	java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
      	java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
      	org.pircbotx.Channel.getMode(Channel.java:127)
      	org.pircbotx.Channel.getModeArgument(Channel.java:182)
      	org.pircbotx.Channel.getChannelKey(Channel.java:239)
      	org.pircbotx.PircBotX.shutdown(PircBotX.java:2872)
      	hudson.plugins.ircbot.v2.IRCConnection.close(IRCConnection.java:102)
      	hudson.plugins.im.IMConnectionProvider.releaseConnection(IMConnectionProvider.java:92)
      	hudson.plugins.ircbot.v2.IRCConnectionProvider.setDesc(IRCConnectionProvider.java:19)
      	hudson.plugins.ircbot.IrcPublisher$DescriptorImpl.configure(IrcPublisher.java:336)
      	jenkins.model.Jenkins.configureDescriptor(Jenkins.java:2915)
      	jenkins.model.Jenkins.doConfigSubmit(Jenkins.java:2878)
              ...
      

      Some other related threads:

      "JenkinsIsBusyListener-thread" daemon prio=5 BLOCKED
       hudson.plugins.im.IMConnectionProvider.currentConnection(IMConnectionProvider.java:83)
       hudson.plugins.im.JenkinsIsBusyListener.setStatus(JenkinsIsBusyListener.java:118)
       hudson.plugins.im.JenkinsIsBusyListener.updateIMStatus(JenkinsIsBusyListener.java:109)
       hudson.plugins.im.JenkinsIsBusyListener.access$000(JenkinsIsBusyListener.java:20)
       hudson.plugins.im.JenkinsIsBusyListener$3.run(JenkinsIsBusyListener.java:98)
       java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
       java.util.concurrent.FutureTask.run(FutureTask.java:262)
       java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
       java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       java.lang.Thread.run(Thread.java:745)
      
      "IM-Reconnector-Thread" daemon prio=5 BLOCKED
       hudson.plugins.im.IMConnectionProvider$ConnectorRunnable.run(IMConnectionProvider.java:175)
       java.lang.Thread.run(Thread.java:745)
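
      The pattern in the dumps above (one thread WAITING on a CountDownLatch, the others BLOCKED) can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: the Provider class and its methods are hypothetical stand-ins, and it assumes the provider methods share a single monitor, which the BLOCKED states suggest but the excerpts do not prove.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchUnderLockDemo {
    /** Hypothetical stand-in for a connection provider; not the plugin's real class. */
    static class Provider {
        // Models a server reply that may never arrive.
        private final CountDownLatch modeReply = new CountDownLatch(1);

        // Waits for the reply while still holding the provider's monitor.
        synchronized void releaseConnection() throws InterruptedException {
            modeReply.await();
        }

        // Any other synchronized method must wait for the monitor.
        synchronized String currentConnection() {
            return "connection";
        }

        // Deliberately not synchronized, so the demo can unblock itself.
        void serverReplied() {
            modeReply.countDown();
        }
    }

    /** Returns the states of the "config" thread and the "job" thread while stuck. */
    static Thread.State[] observeStates() throws InterruptedException {
        Provider provider = new Provider();
        Thread config = new Thread(() -> {
            try {
                provider.releaseConnection();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "config-submit");
        Thread job = new Thread(provider::currentConnection, "executor");

        config.start();
        waitForState(config, Thread.State.WAITING);
        job.start();
        waitForState(job, Thread.State.BLOCKED);

        Thread.State[] states = { config.getState(), job.getState() };

        provider.serverReplied(); // let both threads finish
        config.join();
        job.join();
        return states;
    }

    // Polls until the thread reaches the wanted state or five seconds pass.
    static void waitForState(Thread t, Thread.State wanted) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != wanted && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = observeStates();
        System.out.println("config-submit: " + states[0] + ", executor: " + states[1]);
    }
}
```

      In this sketch nothing recovers until the latch is counted down, which matches the observation that only a Jenkins restart cleared the blocked threads.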
      

      Looking at the git changelog, I noticed https://github.com/jenkinsci/ircbot-plugin/commit/98b0105a743d062abf957c285cdded06fedd3fa3 which makes the following change in IRCConnection.close:

      - this.pircConnection.disconnect();
      + this.pircConnection.shutdown(true);
      

      That change was made to fix a leak (JENKINS-25349).
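
      One possible caller-side guard, independent of what the library does, is to run the blocking close on a helper thread and give up after a deadline. This is only a sketch under assumptions: GuardedClose and closeWithTimeout are hypothetical names, and the plugin is not known to do this. It relies on the wait being interruptible, which the acquireSharedInterruptibly frame in the stack trace indicates.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedClose {
    /**
     * Runs a possibly blocking close action on a daemon thread.
     * Returns true if it finished in time, false if it was abandoned.
     * Hypothetical helper; not part of the ircbot plugin or PircBotX.
     */
    static boolean closeWithTimeout(Runnable closeAction, long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "irc-close");
            t.setDaemon(true); // never keep the JVM alive for a stuck close
            return t;
        });
        Future<?> pending = executor.submit(closeAction);
        try {
            pending.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // interrupts the worker thread
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("close action failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

      A caller would pass something like `() -> connection.close()` with a timeout of a few seconds and log a warning when the method returns false. The trade-off is that an abandoned close may leave resources behind, which is the kind of leak the original change was trying to fix.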

          [JENKINS-28175] config change deadlock Jenkins when pircx.shutdown() is invoked

          Antoine Musso created issue -
          kutzi made changes -
          Link New: This issue is related to JENKINS-25349 [ JENKINS-25349 ]

          kutzi added a comment -

          This is quite difficult to reproduce. It seems to happen only when the channel mode is stale while PircBotX is shut down, and the mode seems to become stale only when the server returns something for the mode that PircBotX cannot parse.
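
          To illustrate the failure mode described in this comment, here is a minimal sketch of a lazily refreshed channel mode guarded by a latch. ChannelModel, onModeReply, and the timeout parameters are hypothetical and are not PircBotX's API. The point is that a wait with no deadline hangs forever when the reply is never parsed, whereas a bounded wait returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedModeLookup {
    /** Hypothetical channel model; the names do not come from PircBotX. */
    static class ChannelModel {
        private volatile String mode; // null means stale, a refresh is pending
        private final CountDownLatch modeLatch = new CountDownLatch(1);

        // Called by the reply parser. If the server's reply cannot be parsed,
        // this is never called and the latch is never released.
        void onModeReply(String parsedMode) {
            mode = parsedMode;
            modeLatch.countDown();
        }

        // Returns the mode, or null if no parsable reply arrived in time.
        String getMode(long timeout, TimeUnit unit) throws InterruptedException {
            if (mode != null) {
                return mode;
            }
            // A real client would send a MODE request for the channel here.
            if (!modeLatch.await(timeout, unit)) {
                return null; // give up instead of parking the caller forever
            }
            return mode;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChannelModel channel = new ChannelModel();
        System.out.println(channel.getMode(50, TimeUnit.MILLISECONDS)); // no reply yet
        channel.onModeReply("+nt");
        System.out.println(channel.getMode(50, TimeUnit.MILLISECONDS));
    }
}
```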


          kutzi added a comment -

          Opened an issue against PircBotX: https://code.google.com/p/pircbotx/issues/detail?id=240
          Not sure if the issue persists in PircBotX 2.x as a lot has changed there.
          Trying to update to PircBotX 2.0.1

          kutzi made changes -
          Link New: This issue is related to JENKINS-22042 [ JENKINS-22042 ]

          kutzi added a comment -

          I've created a plugin build that uses PircBotX 2.0.1, which should fix the issue. It would be great if you could test it before I build a new release:

          https://dl.dropboxusercontent.com/u/25863594/ircbot.hpi


          kutzi added a comment -

          Antoine, any chance you can try the unreleased IRC plugin?


          Dave Hunt added a comment -

          We are seeing this too, but I'd like to avoid trying an unreleased plugin on our production Jenkins instance. We do have a staging instance that until today did not use the IRC plugin. I have it running now and if I manage to replicate the issue I'll try the unreleased plugin to see if that resolves it.


          Dave Hunt added a comment -

          I was able to replicate this on our staging instance, so I've installed the unreleased plugin, and will see if this appears to fix the issue.


          Antoine Musso added a comment -

          kutzi wrote:
          > Antoine, any chance you can try the unreleased IRC plugin?

          Unlikely, the issue caused much havoc on the system and I can reproduce it on my dev instance. Moreover I have no idea how to reproduce it on the prod instance :-/

          That being said, I found out that when stopping Jenkins it ends up waiting for org.pircbotx.Channel.getMode(). That might be the same root cause. You can find a full stack trace at https://phabricator.wikimedia.org/T98976 ; I can file it as another bug if it is unrelated.

          kutzi wrote:
          > https://dl.dropboxusercontent.com/u/25863594/ircbot.hpi

          I guess I can just build it from https://github.com/jenkinsci/ircbot-plugin/commit/d18cc7b617155100f8afadb73b324f378c5661da and maybe also grab the latest instant-messaging-plugin https://github.com/jenkinsci/ircbot-plugin/commit/a78e066fc57001168a8ffc1893a3de2c91a1d518 :-}


            kutzi kutzi
            hashar Antoine Musso
            Votes: 3
            Watchers: 8
