• Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • durable-task-plugin
    • None
    • RHEL 6.x
      Jenkins LTS 1.596.1
      Durable Task Plugin 1.4

      Our Jenkins instance is locking up every day.
      This seems to be caused by the Durable Task plugin.

      Using JConsole to connect to the running Java process, I find deadlocks and get this stack trace:

      Name: Computer.threadPoolForRemoting [#179]
      State: BLOCKED on hudson.slaves.RetentionStrategy$Demand@1455cecd owned by: jenkins.util.Timer [#1]
      Total blocked: 26  Total waited: 522
      
      Stack trace: 
      hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:212)
      hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:172)
      hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:661)
      hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
      hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
         - locked java.lang.Object@68ed76f9
      jenkins.model.Jenkins.updateComputerList(Jenkins.java:1218)
      jenkins.model.Jenkins.setNodes(Jenkins.java:1714)
      jenkins.model.Jenkins.removeNode(Jenkins.java:1709)
         - locked hudson.model.Hudson@794217b7
      hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:65)
      org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$1.run(OnceRetentionStrategy.java:125)
         - locked hudson.model.Queue@5a25192e
      jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      java.util.concurrent.FutureTask.run(FutureTask.java:166)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:722)
      

      We are running Jenkins LTS 1.596.1 with Durable Task Plugin 1.4.
      We also had this problem with Durable Task Plugin 1.3.

      Running Durable Task Plugin 1.2 on Jenkins LTS 1.580.3 seemed to work OK.
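The deadlock check described above can also be run from code rather than from JConsole. The following is a minimal sketch, assuming a standard HotSpot JVM where `ThreadMXBean.findDeadlockedThreads()` is supported; the class name `DeadlockProbe` and its method are hypothetical and not part of Jenkins. As written it inspects its own JVM, so to examine a Jenkins process the same calls would have to run inside that process or over a JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Jenkins: reports deadlocked threads in the current JVM.
public class DeadlockProbe {

    /** Returns one description per deadlocked thread, or an empty array if there is no deadlock. */
    public static String[] describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Covers both monitor (synchronized) and ownable-synchronizer deadlocks; null means none found.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            return new String[0];
        }
        // Request locked monitors and synchronizers so the output shows who owns what.
        ThreadInfo[] infos = mx.getThreadInfo(ids, true, true);
        String[] out = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            // A thread may have exited between the two calls, in which case its entry is null.
            out[i] = infos[i] == null ? "<thread no longer alive>" : infos[i].toString();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] reports = describeDeadlocks();
        if (reports.length == 0) {
            System.out.println("No deadlocked threads detected.");
        }
        for (String report : reports) {
            System.out.print(report);
        }
    }
}
```

Note that `ThreadInfo.toString()` prints only a limited number of stack frames, so a full thread dump may still be needed to see a trace as deep as the one above.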

          [JENKINS-27476] Plugin causes deadlock on Jenkins LTS 1.596.1

          Per Arnold Blaasmo created issue -

          Jesse Glick added a comment -

          stephenconnolly these are your changes; any idea?

          Jesse Glick made changes -
          Assignee Original: Jesse Glick [ jglick ] New: Stephen Connolly [ stephenconnolly ]

          Stephen Connolly added a comment -

          Ultimately fixing this is part of https://github.com/jenkinsci/jenkins/pull/1596

          If this is a CloudBees customer, we have two hotfixes that seem to work around the deadlock, with the side effect of degrading UI performance.

          Per Arnold Blaasmo added a comment -

          Thanks for looking into this.
          Just for the record, my company is not currently a CloudBees customer.

          Some more info:
          To try to cope with the problem, I downgraded to version 1.2 of the Durable Task Plugin.
          This seems to make things much more stable. We might still get a deadlock, but less often.

          Stephen Connolly added a comment -

          You are awaiting this change in Jenkins core: https://github.com/stephenc/jenkins/blob/threadsafe-node-queue/core/src/main/java/hudson/slaves/ComputerRetentionWork.java

          You can work around it with a bit of Groovy script...

          Basically, you need to create a subclass of ComputerRetentionWork whose doRun method wraps a call to its super.doRun, and then modify the extension list for PeriodicWork, removing the old ComputerRetentionWork instance and adding an instance of your subclass.
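Why wrapping the call helps can be illustrated without any Jenkins classes. The sketch below is one reading of the stack trace, not a confirmed diagnosis: one thread takes the Queue and Jenkins locks and then needs the retention strategy monitor, while the periodic task takes them in the opposite order. All names here (`LockOrderDemo`, `outer`, `inner`) are hypothetical stand-ins. Taking the outer lock first in the wrapper gives both threads the same acquisition order, so the two loops always finish.

```java
// Hypothetical stand-alone illustration of consistent lock ordering; not Jenkins code.
public class LockOrderDemo {
    private static final Object outer = new Object(); // stand-in for the Queue/Jenkins locks
    private static final Object inner = new Object(); // stand-in for the retention strategy monitor
    private static int counter = 0;                   // only modified while holding outer

    // Shape of the node-removal path in the trace: outer lock first, then inner.
    static void cleanupPath() {
        synchronized (outer) {
            synchronized (inner) {
                counter++;
            }
        }
    }

    // Assumed shape of the unwrapped periodic check: inner first, then outer (the risky order).
    static void periodicCheck() {
        synchronized (inner) {
            synchronized (outer) {
                counter++;
            }
        }
    }

    // Wrapped variant: take outer first, then delegate. The nested outer acquisition is re-entrant.
    static void periodicCheckWrapped() {
        synchronized (outer) {
            periodicCheck();
        }
    }

    /** Runs both paths concurrently and returns the number of completed iterations. */
    static int runBoth(int rounds) {
        counter = 0;
        Thread a = new Thread(() -> { for (int i = 0; i < rounds; i++) cleanupPath(); });
        Thread b = new Thread(() -> { for (int i = 0; i < rounds; i++) periodicCheckWrapped(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("completed iterations: " + runBoth(1000));
    }
}
```

Replacing `periodicCheckWrapped()` with `periodicCheck()` in thread `b` reintroduces the opposite ordering, and the program may then hang.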

          Stephen Connolly added a comment -

          For CloudBees customers, the hotfix you want is hotfix-zd-23541
          Per Arnold Blaasmo made changes -
          Link New: This issue depends on JENKINS-27565 [ JENKINS-27565 ]

          Per Arnold Blaasmo added a comment -

          We still have deadlocks even though we downgraded the plugin. Seemingly not so often, but...
          The workaround you suggested seems a little "hairy" to me.

          I guess I need to wait for the fix in https://github.com/jenkinsci/jenkins/pull/1596 to be finished.

          Stephen Connolly added a comment -

          Basically this is what you want the retention work that you are running to look like:

          public class SynchronizedComputerRetentionWork extends ComputerRetentionWork {

              @Override
              protected void doRun() {
                  Queue.withLock(new Runnable() {
                      @Override
                      public void run() {
                          synchronized (Jenkins.getInstance()) {
                              SynchronizedComputerRetentionWork.super.doRun();
                          }
                      }
                  });
              }

          }

          and then you do something like

          Jenkins.getInstance().getExtensionList(PeriodicWork.class).remove(ComputerRetentionWork.class);
          Jenkins.getInstance().getExtensionList(PeriodicWork.class).add(new SynchronizedComputerRetentionWork());

          Now, all of the above is more Java than Groovy, so it would need translating into Groovy. Then you just put it in your init.groovy and you are fine.

          As all this does is delegate back to the base method, it would be "safe" if you forgot to remove it after upgrading to something with PR#1596 merged: the thread would simply acquire the two locks twice, and they are re-entrant locks.
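The re-entrancy point in the last paragraph can be checked in isolation. This is a minimal sketch with hypothetical stand-ins: `queueLock` plays the role of the Queue lock (assumed here to be a `ReentrantLock`) and `monitor` plays the role of the Jenkins instance monitor. It shows that a thread already holding both can acquire them again without blocking, which is what would happen if the wrapper stayed in place after the core fix.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-alone illustration of re-entrant locking; not Jenkins code.
public class ReentrantDemo {
    private static final ReentrantLock queueLock = new ReentrantLock(); // stand-in for the Queue lock
    private static final Object monitor = new Object();                 // stand-in for the Jenkins monitor

    /** Stand-in for a base method that already takes both locks itself. */
    static int innerWork() {
        queueLock.lock();
        try {
            synchronized (monitor) {
                // Number of times the current thread holds queueLock at this point.
                return queueLock.getHoldCount();
            }
        } finally {
            queueLock.unlock();
        }
    }

    /** Stand-in for the wrapper: takes both locks, then delegates to the base method. */
    static int wrappedWork() {
        queueLock.lock();
        try {
            synchronized (monitor) {
                return innerWork(); // re-acquires both locks on the same thread without blocking
            }
        } finally {
            queueLock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("hold count when called directly: " + innerWork());
        System.out.println("hold count when wrapped: " + wrappedWork());
        System.out.println("still locked afterwards: " + queueLock.isLocked());
    }
}
```

The hold count is 1 for the direct call and 2 for the wrapped call, and the lock is fully released afterwards, so the redundant wrapper costs an extra acquisition but does not change behavior.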

            Assignee: Stephen Connolly (stephenconnolly)
            Reporter: Per Arnold Blaasmo (pablaasmo)
            Votes: 2
            Watchers: 6
