• Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • durable-task-plugin
    • None
    • RHEL 6.x
      Jenkins LTS 1.596.1
      Durable Task Plugin 1.4

      Our Jenkins instance is locking up every day.
      It seems like this is due to the durable task plugin.

      Using JConsole to connect to the running Java process, I find deadlocks and get this stack trace:

      Name: Computer.threadPoolForRemoting [#179]
      State: BLOCKED on hudson.slaves.RetentionStrategy$Demand@1455cecd owned by: jenkins.util.Timer [#1]
      Total blocked: 26  Total waited: 522
      
      Stack trace: 
      hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:212)
      hudson.slaves.RetentionStrategy$Demand.check(RetentionStrategy.java:172)
      hudson.slaves.SlaveComputer.setNode(SlaveComputer.java:661)
      hudson.model.AbstractCIBase.updateComputer(AbstractCIBase.java:120)
      hudson.model.AbstractCIBase.updateComputerList(AbstractCIBase.java:180)
         - locked java.lang.Object@68ed76f9
      jenkins.model.Jenkins.updateComputerList(Jenkins.java:1218)
      jenkins.model.Jenkins.setNodes(Jenkins.java:1714)
      jenkins.model.Jenkins.removeNode(Jenkins.java:1709)
         - locked hudson.model.Hudson@794217b7
      hudson.slaves.AbstractCloudSlave.terminate(AbstractCloudSlave.java:65)
      org.jenkinsci.plugins.durabletask.executors.OnceRetentionStrategy$1.run(OnceRetentionStrategy.java:125)
         - locked hudson.model.Queue@5a25192e
      jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
      java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      java.util.concurrent.FutureTask.run(FutureTask.java:166)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:722)
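
The deadlock check that JConsole performs can also be run from inside the JVM. The following is a minimal sketch, assuming it executes in the same process as the threads being inspected; the class name `DeadlockProbe` and the method `describeDeadlocks` are hypothetical, while `ThreadMXBean.findDeadlockedThreads()` is the standard `java.lang.management` call. It produces one summary line per deadlocked thread, similar in shape to the "BLOCKED on ... owned by ..." line above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jenkins or the durable-task plugin.
final class DeadlockProbe {

    /** Returns one summary line per deadlocked thread, or an empty list if none are found. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers both object monitors and java.util.concurrent locks; null means no deadlock.
        long[] ids = threads.findDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        // Request locked monitors and synchronizers so the owner information is populated.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread terminated between the two calls
            }
            report.add(info.getThreadName()
                    + " is " + info.getThreadState()
                    + " on " + info.getLockName()
                    + " owned by " + info.getLockOwnerName());
        }
        return report;
    }
}
```

An empty result only means that no deadlock exists at the moment of the call, so a probe like this would need to run periodically to catch an intermittent lock-up.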
      

      We are running the Jenkins LTS version 1.596.1 and Durable Task Plugin 1.4.
      We also had this problem with Durable Task plugin 1.3.

      Running Durable Task plugin 1.2 on Jenkins LTS 1.580.3 seemed to work OK.

          [JENKINS-27476] Plugin causes deadlock on Jenkins LTS 1.596.1

          Per Arnold Blaasmo added a comment -

          We still get deadlocks even though we downgraded the plugin, though seemingly less often.
          The workaround you suggested seems a little "hairy" to me.

          I guess I need to wait for the fix in https://github.com/jenkinsci/jenkins/pull/1596 to be finished.

          Stephen Connolly added a comment -

          Basically, this is what you want the retention work that you are running to look like:

          import hudson.model.Queue;
          import hudson.slaves.ComputerRetentionWork;
          import jenkins.model.Jenkins;

          public class SynchronizedComputerRetentionWork extends ComputerRetentionWork {
          
              @Override
              protected void doRun() {
                  // Take the Queue lock first, then the Jenkins monitor,
                  // before delegating to the stock retention checks.
                  Queue.withLock(new Runnable() {
                      @Override
                      public void run() {
                          synchronized (Jenkins.getInstance()) {
                              SynchronizedComputerRetentionWork.super.doRun();
                          }
                      }
                  });
              }
          
          }
          

          and then you do something like

          Jenkins.getInstance().getExtensionList(PeriodicWork.class).remove(ComputerRetentionWork.class);
          Jenkins.getInstance().getExtensionList(PeriodicWork.class).add(new SynchronizedComputerRetentionWork());
          

          Now all the above is more Java than Groovy, so it would need translating into Groovy. Then you just put it in your init.groovy and you are fine.

          As all this does is delegate back to the base method, it would be "safe" if you forgot to remove it after upgrading to a version with PR #1596 merged: the thread would simply acquire the two locks twice, and both are re-entrant locks.
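
The re-entrancy argument above can be illustrated in isolation. This is a minimal sketch, not Jenkins code: `queueLock` and `instanceMonitor` are hypothetical stand-ins for the Queue lock and the Jenkins instance monitor, and the class name `ReentrantDemo` is invented for the example. It shows a single thread acquiring the same two locks a second time without blocking itself.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins; no Jenkins classes are involved.
final class ReentrantDemo {

    private static final ReentrantLock queueLock = new ReentrantLock();
    private static final Object instanceMonitor = new Object();

    /** Acquires both locks twice on the calling thread and returns the hold count at the innermost point. */
    static int nestedHoldCount() {
        queueLock.lock();                      // first acquisition (the wrapper)
        try {
            synchronized (instanceMonitor) {
                queueLock.lock();              // second acquisition (the wrapped code)
                try {
                    synchronized (instanceMonitor) {
                        // Same thread, same locks: no blocking, the hold count just increases.
                        return queueLock.getHoldCount();
                    }
                } finally {
                    queueLock.unlock();
                }
            }
        } finally {
            queueLock.unlock();
        }
    }
}
```

Each acquisition is paired with a release, so the locks are fully free again once the outermost block exits.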

          Per Arnold Blaasmo added a comment -

          @stephenconnolly, thanks. I will see if I can do this
          and report back with the result.

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          core/src/main/java/hudson/Functions.java
          core/src/main/java/hudson/model/AbstractCIBase.java
          core/src/main/java/hudson/model/Computer.java
          core/src/main/java/hudson/model/Executor.java
          core/src/main/java/hudson/model/Hudson.java
          core/src/main/java/hudson/model/Node.java
          core/src/main/java/hudson/model/Queue.java
          core/src/main/java/hudson/model/ResourceController.java
          core/src/main/java/hudson/slaves/AbstractCloudSlave.java
          core/src/main/java/hudson/slaves/ComputerRetentionWork.java
          core/src/main/java/hudson/slaves/NodeProvisioner.java
          core/src/main/java/hudson/slaves/RetentionStrategy.java
          core/src/main/java/hudson/slaves/SlaveComputer.java
          core/src/main/java/jenkins/model/Jenkins.java
          core/src/main/java/jenkins/model/Nodes.java
          core/src/main/java/jenkins/util/AtmostOneTaskExecutor.java
          core/src/main/resources/hudson/model/Messages.properties
          core/src/main/resources/lib/hudson/executors.jelly
          core/src/main/resources/lib/layout/layout.jelly
          http://jenkins-ci.org/commit/jenkins/92147c3597308bc05e6448ccc41409fcc7c05fd7
          Log:
          [FIXED JENKINS-27565] Refactor the Queue and Nodes to use a consistent locking strategy

          The test system I set up to verify resolution of customer(s)' issues driving this change, required
          additional changes in order to fully resolve the issues at hand. As a result I am bundling these
          changes:

          • Moves nodes to being stored in separate config files outside of the main config file (improves performance) [FIXED JENKINS-27562]
          • Makes the Jenkins is loading screen not block on the extensions loading lock [FIXED JENKINS-27563]
          • Removes race condition rendering the list of executors [FIXED JENKINS-27564] [FIXED JENKINS-15355]
          • Tidy up the locks that were causing deadlocks with the once retention strategy in durable tasks [FIXED JENKINS-27476]
          • Remove any requirement from Jenkins Core to lock on the Queue when rendering the Jenkins UI [FIXED-JENKINS-27566]
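
The idea behind a "consistent locking strategy" can be sketched independently of Jenkins. In the example below, every thread that needs both locks takes them in the same order, so no thread can hold one lock while waiting for a thread that holds the other. The names `LockOrderDemo`, `queueLock` and `nodesLock` are hypothetical and only loosely modelled on the Queue and node locks in the stack trace; this is an illustration of the general technique, not the code from the commit.

```java
// Hypothetical illustration of consistent lock ordering; not taken from Jenkins core.
final class LockOrderDemo {

    private static final Object queueLock = new Object();
    private static final Object nodesLock = new Object();
    private static int updates;

    /** Every caller takes queueLock first and nodesLock second, never the reverse. */
    static void updateBoth() {
        synchronized (queueLock) {
            synchronized (nodesLock) {
                updates++;
            }
        }
    }

    /** Runs updateBoth() from several threads and returns the total number of updates applied. */
    static int runConcurrently(int threads, int iterations) {
        updates = 0; // safe: no worker has been started yet
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < iterations; n++) {
                    updateBoth();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // returns for every worker because no deadlock is possible
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return updates;
    }
}
```

If one code path took `nodesLock` before `queueLock`, two threads could each hold one lock and wait forever for the other, which is the shape of the deadlock reported in this issue.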

          Per Arnold Blaasmo added a comment -

          As promised, I am reporting back on the result of using the workaround.

          I put this code in the 'init.groovy' file, based on the tips in this issue:

          import jenkins.model.Jenkins
          import java.util.logging.LogManager
          import hudson.model.PeriodicWork
          import hudson.model.Queue // without this, Groovy resolves Queue to java.util.Queue
          import hudson.slaves.ComputerRetentionWork
          
          def logger = LogManager.getLogManager().getLogger("")
          
          /* JENKINS_HOME environment variable is not reliable */
          def jenkinsHome = Jenkins.instance.getRootDir().absolutePath
          logger.info("RUNNING init.groovy from ${jenkinsHome}")
          
          logger.info("--> workaround for deadlock in durable task plugin")
          
          public class SynchronizedComputerRetentionWork extends ComputerRetentionWork {
          
              @Override
              protected void doRun() {
                  Queue.withLock(new Runnable() {
                      @Override
                      public void run() {
                          synchronized (Jenkins.getInstance()) {
                              SynchronizedComputerRetentionWork.super.doRun();
                          }
                      }
                  });
              }
          
          }
          
          Jenkins.getInstance().getExtensionList(PeriodicWork.class).remove(ComputerRetentionWork.class);
          Jenkins.getInstance().getExtensionList(PeriodicWork.class).add(new SynchronizedComputerRetentionWork());
          

          The result is that I have not had any deadlocks in the last 24 hours.

          I also see that JENKINS-27565 is fixed, so I will await a new LTS version with that bugfix included.

          Thank you for your help, Stephen!

          SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Stephen Connolly
          Path:
          changelog.html
          http://jenkins-ci.org/commit/jenkins/46dc6850edb1d7ef52592794b15e69db7dfbed1a
          Log:
          Noting merges JENKINS-15355 JENKINS-21618 JENKINS-22941 JENKINS-25938 JENKINS-26391 JENKINS-26900 JENKINS-27476 JENKINS-27563 JENKINS-27564 JENKINS-27565 JENKINS-27566

          Fixing link text for JENKINS-6167

          Jesse Glick added a comment -

          This is filed in a plugin and so by definition cannot be lts-candidate. stephenconnolly what is its status?


          Daniel Beck added a comment -

          pablaasmo Does this issue still occur in Jenkins 1.607 or higher, or can it be considered resolved?


          Jesse Glick added a comment -

          Closing as covered by the core fix unless I hear information to the contrary.


          Per Arnold Blaasmo added a comment -

          I have not seen this issue again, so I think it is OK to close.

            stephenconnolly Stephen Connolly
            pablaasmo Per Arnold Blaasmo
            Votes: 2
            Watchers: 6
