Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-48725

Jenkins startup workflow can deadlock if exceptions occur during extensions lookup

      Underlying issue

      Reactor run method (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115) only catches TunnelException.

      If another exception occurs in the task run method,

      • done is set to true
      • fatal is not set
      • but pending variable is not decremented

      This makes Reactor execute method wait for ever (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266) as pending will never reach 0.


      How can task.run throw something else than a TunnelException?

      task run method (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207) catches exceptions from the runTask method and wraps them in a TunnelException.

      But listener methods are not protected by try/catch blocks (except onTaskCompleted for technical reasons)

      So if an exception occurs in any of the onTaskStarted or onTaskFailed, it won't be wrapped in a TunnelException, making the problem arise.

      For milestones, another type of run method is used (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176), calling the listener's onAttained method, unprotected by try/catch.

       Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?


      Where does a listener throw an exception?

      InitReactorRunner provides a listener implementation calling InitReactorRunner's onInitMilestoneAttained method, when onAttained is called on a milestone of type InitMilestone.

      https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83

      onInitMilestoneAttained implementation in Jenkins class (https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092) can throw if ExtensionList.lookup(ExtensionFinder.class).getComponents(); throws.

      Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason

       Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63, and they might provide implementations which throw exception.


      How to correct?

      I would have proposed a fix, but I'm not sure of the direction to take:

      • enhance the Reactor component
        • by protecting listeners calls ?
        • by adding a way for listener methods to mark the task as fatal for milestones?
      • work around the limitations on listeners error-handling in Jenkins core code 
        • by try catching, logging, and swallowing the exception (not sure of the consequences) ?
        • or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
      • just fix all plugins ?
      • some other way?

          [JENKINS-48725] Jenkins startup workflow can deadlock if exceptions occur during extensions lookup

          Oleg Nenashev added a comment -

          The issue description looks correct to me. I think the best way would be to fix the Task Reactor Component (at least by wrapping exceptions in listeners).

          I have some changes pending there (component refresh + extra API enhancements for SECURITY-667). If you submit a PR, please ping me there

          Oleg Nenashev added a comment - The issue description looks correct to me. I think the best way would be to fix the Task Reactor Component (at least by wrapping exceptions in listeners). I have some changes pending there (component refresh + extra API enhancements for SECURITY-667). If you submit a PR, please ping me there

          Oleg Nenashev added a comment -

          greybird have you seen it in the wild? If yes, it is definitely an LTS candidate

          Oleg Nenashev added a comment - greybird have you seen it in the wild? If yes, it is definitely an LTS candidate

          Nope, only in unit tests as of now.

          As the issue originates from plugin dependencies mismatch, this problem might not surface very often?

           

          Arnaud TAMAILLON added a comment - Nope, only in unit tests as of now. As the issue originates from plugin dependencies mismatch, this problem might not surface very often?  

          Oleg Nenashev added a comment -

          Plugin mismatch definitely happens more often than I would like

          Oleg Nenashev added a comment - Plugin mismatch definitely happens more often than I would like

          Code changed in jenkins
          User: Arnaud
          Path:
          src/main/java/org/jvnet/hudson/reactor/Reactor.java
          src/test/java/org/jvnet/hudson/reactor/SessionTest.java
          http://jenkins-ci.org/commit/lib-task-reactor/fe40d4f703e0d487549ee722f49603fa9de1e593
          Log:
          JENKINS-48725 fix listener exceptions not being handled properly

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Arnaud Path: src/main/java/org/jvnet/hudson/reactor/Reactor.java src/test/java/org/jvnet/hudson/reactor/SessionTest.java http://jenkins-ci.org/commit/lib-task-reactor/fe40d4f703e0d487549ee722f49603fa9de1e593 Log: JENKINS-48725 fix listener exceptions not being handled properly

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          pom.xml
          src/main/java/org/jvnet/hudson/reactor/Reactor.java
          src/main/java/org/jvnet/hudson/reactor/ReactorListener.java
          src/test/java/org/jvnet/hudson/reactor/SessionTest.java
          http://jenkins-ci.org/commit/lib-task-reactor/32147b876e5f92551e32429a4a93445c6889cc90
          Log:
          Merge pull request #3 from Greybird/JENKINS-48725

          JENKINS-48725 fix listener exceptions not being handled properly

          Compare: https://github.com/jenkinsci/lib-task-reactor/compare/8d7e39fe811a...32147b876e5f

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: pom.xml src/main/java/org/jvnet/hudson/reactor/Reactor.java src/main/java/org/jvnet/hudson/reactor/ReactorListener.java src/test/java/org/jvnet/hudson/reactor/SessionTest.java http://jenkins-ci.org/commit/lib-task-reactor/32147b876e5f92551e32429a4a93445c6889cc90 Log: Merge pull request #3 from Greybird/ JENKINS-48725 JENKINS-48725 fix listener exceptions not being handled properly Compare: https://github.com/jenkinsci/lib-task-reactor/compare/8d7e39fe811a...32147b876e5f

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/pom.xml
          http://jenkins-ci.org/commit/jenkins/720db118a808c04f2ae04dc36582ce7dd5031b9b
          Log:
          JENKINS-48725 - Update Lib Task Reactor to 1.5

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/pom.xml http://jenkins-ci.org/commit/jenkins/720db118a808c04f2ae04dc36582ce7dd5031b9b Log: JENKINS-48725 - Update Lib Task Reactor to 1.5

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/pom.xml
          http://jenkins-ci.org/commit/jenkins/e86e6f9e428787de244e45ce1d7581498880ad56
          Log:
          Merge pull request #3270 from oleg-nenashev/task-reactor/1.5

          JENKINS-48725 - Update Lib Task Reactor to 1.5

          Compare: https://github.com/jenkinsci/jenkins/compare/1a7881ce60cb...e86e6f9e4287

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/pom.xml http://jenkins-ci.org/commit/jenkins/e86e6f9e428787de244e45ce1d7581498880ad56 Log: Merge pull request #3270 from oleg-nenashev/task-reactor/1.5 JENKINS-48725 - Update Lib Task Reactor to 1.5 Compare: https://github.com/jenkinsci/jenkins/compare/1a7881ce60cb...e86e6f9e4287

          Oleg Nenashev added a comment -

          It has been released in Jenkins 2.105

          Oleg Nenashev added a comment - It has been released in Jenkins 2.105

            greybird Arnaud TAMAILLON
            greybird Arnaud TAMAILLON
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: