Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-48725

Jenkins startup workflow can deadlock if exceptions occur during extensions lookup

      Underlying issue

      Reactor run method (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115) only catches TunnelException.

      If another exception occurs in the task run method,

      • done is set to true
      • fatal is not set
      • but pending variable is not decremented

      This makes Reactor execute method wait for ever (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266) as pending will never reach 0.


      How can task.run throw something else than a TunnelException?

      task run method (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207) catches exceptions from the runTask method and wraps them in a TunnelException.

      But listener methods are not protected by try/catch blocks (except onTaskCompleted for technical reasons)

      So if an exception occurs in any of the onTaskStarted or onTaskFailed, it won't be wrapped in a TunnelException, making the problem arise.

      For milestones, another type of run method is used (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176), calling the listener's onAttained method, unprotected by try/catch.

       Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?


      Where does a listener throw an exception?

      InitReactorRunner provides a listener implementation calling InitReactorRunner's onInitMilestoneAttained method, when onAttained is called on a milestone of type InitMilestone.

      https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83

      onInitMilestoneAttained implementation in Jenkins class (https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092) can throw if ExtensionList.lookup(ExtensionFinder.class).getComponents(); throws.

      Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason

       Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63, and they might provide implementations which throw exception.


      How to correct?

      I would have proposed a fix, but I'm not sure of the direction to take:

      • enhance the Reactor component
        • by protecting listeners calls ?
        • by adding a way for listener methods to mark the task as fatal for milestones?
      • work around the limitations on listeners error-handling in Jenkins core code 
        • by try catching, logging, and swallowing the exception (not sure of the consequences) ?
        • or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
      • just fix all plugins ?
      • some other way?

          [JENKINS-48725] Jenkins startup workflow can deadlock if exceptions occur during extensions lookup

          Arnaud TAMAILLON created issue -
          Arnaud TAMAILLON made changes -
          Description Original: *Underlying issue*

          _Reactor_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115]) only catches _TunnelException_.

          If another exception occurs in the task run method,
           * _done_ is set to true
           * _fatal_ is not set
           * but _pending_ variable +is not decremented+

          This makes _Reactor_ _execute_ method wait for ever ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266)] as +_pending_ will never reach 0+.
          ----
          *How can task.run throw something else than a TunnelException?*

          _task_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207)] catches exceptions from the _runTask_ method and wraps them in a _TunnelException_.

          But _listener_ methods are not protected by try/catch blocks (except _onTaskCompleted_ for technical reasons)

          So if an exception occurs in any of the _onTaskStarted_ or _onTaskFailed_, it won't be wrapped in a _TunnelException_, making the problem arise.

          For milestones, another type of _run_ method is used ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176)], calling the listener's _onAttained_ method, unprotected by try/catch.

          (?) Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?
          ----
          *Where does a listener throw an exception?*

          _InitReactorRunner_ provides a listener implementation calling _InitReactorRunner_'s _onInitMilestoneAttained_ method, when _onAttained_ is called on a milestone of type _InitMilestone_.

          [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83]

          _onInitMilestoneAttained_ implementation in Jenkins class ([https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092)] can throw if _ExtensionList.lookup(ExtensionFinder.class).getComponents();_ throws.

          Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason
           * example of this use case can be found here: [https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83|https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83)], which throws a AssertionException when plugin is null in the static field initialization.
           * Symptom (in unit tests) can be seen there: [https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fclaim-plugin/detail/PR-39/1/] => InjectedTests never run. Note that this was not the case with Jenkins 2.7.3.

          (?) Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63,] and they might be unprotected also.
          ----
          *How to correct?*

          I would have proposed a fix, but I'm not sure of the direction to take:
           * enhance the Reactor component
           ** by protecting listeners calls ?
           ** by adding a way for listener methods to mark the task as fatal for milestones?
           * work around the limitations on listeners error-handling in Jenkins core code 
           ** by try catching, logging, and swallowing the exception (not sure of the consequences) ?
           ** or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
           * just fix all plugins ?
           * some other way?
          New: *Underlying issue*

          _Reactor_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115]) only catches _TunnelException_.

          If another exception occurs in the task run method,
           * _done_ is set to true
           * _fatal_ is not set
           * but _pending_ variable +is not decremented+

          This makes _Reactor_ _execute_ method wait for ever ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266)] as +_pending_ will never reach 0+.
          ----
          *How can task.run throw something else than a TunnelException?*

          _task_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207)] catches exceptions from the _runTask_ method and wraps them in a _TunnelException_.

          But _listener_ methods are not protected by try/catch blocks (except _onTaskCompleted_ for technical reasons)

          So if an exception occurs in any of the _onTaskStarted_ or _onTaskFailed_, it won't be wrapped in a _TunnelException_, making the problem arise.

          For milestones, another type of _run_ method is used ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176)], calling the listener's _onAttained_ method, unprotected by try/catch.

          (?) Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?
          ----
          *Where does a listener throw an exception?*

          _InitReactorRunner_ provides a listener implementation calling _InitReactorRunner_'s _onInitMilestoneAttained_ method, when _onAttained_ is called on a milestone of type _InitMilestone_.

          [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83]

          _onInitMilestoneAttained_ implementation in Jenkins class ([https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092)] can throw if _ExtensionList.lookup(ExtensionFinder.class).getComponents();_ throws.

          Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason
           * example of this use case can be found here: [https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83|https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83)], which throws a AssertionException when plugin is null in the static field initialization.
           * Symptom (in unit tests) can be seen there: [https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fclaim-plugin/detail/PR-39/1/] => InjectedTests never run. Note that this was not the case with Jenkins 2.7.3.

          (?) Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63,] and they might provide implementations which throw exception.
          ----
          *How to correct?*

          I would have proposed a fix, but I'm not sure of the direction to take:
           * enhance the Reactor component
           ** by protecting listeners calls ?
           ** by adding a way for listener methods to mark the task as fatal for milestones?
           * work around the limitations on listeners error-handling in Jenkins core code 
           ** by try catching, logging, and swallowing the exception (not sure of the consequences) ?
           ** or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
           * just fix all plugins ?
           * some other way?

          Oleg Nenashev added a comment -

          The issue description looks correct to me. I think the best way would be to fix the Task Reactor Component (at least by wrapping exceptions in listeners).

          I have some changes pending there (component refresh + extra API enhancements for SECURITY-667). If you submit a PR, please ping me there

          Oleg Nenashev added a comment - The issue description looks correct to me. I think the best way would be to fix the Task Reactor Component (at least by wrapping exceptions in listeners). I have some changes pending there (component refresh + extra API enhancements for SECURITY-667). If you submit a PR, please ping me there
          Oleg Nenashev made changes -
          Issue Type Original: Improvement [ 4 ] New: Bug [ 1 ]
          Oleg Nenashev made changes -
          Labels Original: exception robustness New: exception lib-task-reactor robustness
          Arnaud TAMAILLON made changes -
          Assignee New: Arnaud TAMAILLON [ greybird ]
          Arnaud TAMAILLON made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]

          Oleg Nenashev added a comment -

          greybird have you seen it in the wild? If yes, it is definitely an LTS candidate

          Oleg Nenashev added a comment - greybird have you seen it in the wild? If yes, it is definitely an LTS candidate

          Nope, only in unit tests as of now.

          As the issue originates from plugin dependencies mismatch, this problem might not surface very often?

           

          Arnaud TAMAILLON added a comment - Nope, only in unit tests as of now. As the issue originates from plugin dependencies mismatch, this problem might not surface very often?  

          Oleg Nenashev added a comment -

          Plugin mismatch definitely happens more often than I would like

          Oleg Nenashev added a comment - Plugin mismatch definitely happens more often than I would like

            greybird Arnaud TAMAILLON
            greybird Arnaud TAMAILLON
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: