Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-48725

Jenkins startup workflow can deadlock if exceptions occur during extensions lookup

    XMLWordPrintable

    Details

    • Similar Issues:

      Description

      Underlying issue

      Reactor run method (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115) only catches TunnelException.

      If another exception occurs in the task run method,

      • done is set to true
      • fatal is not set
      • but pending variable is not decremented

      This makes Reactor execute method wait for ever (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266) as pending will never reach 0.


      How can task.run throw something else than a TunnelException?

      task run method (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207) catches exceptions from the runTask method and wraps them in a TunnelException.

      But listener methods are not protected by try/catch blocks (except onTaskCompleted for technical reasons)

      So if an exception occurs in any of the onTaskStarted or onTaskFailed, it won't be wrapped in a TunnelException, making the problem arise.

      For milestones, another type of run method is used (https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176), calling the listener's onAttained method, unprotected by try/catch.

       Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?


      Where does a listener throw an exception?

      InitReactorRunner provides a listener implementation calling InitReactorRunner's onInitMilestoneAttained method, when onAttained is called on a milestone of type InitMilestone.

      https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83

      onInitMilestoneAttained implementation in Jenkins class (https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092) can throw if ExtensionList.lookup(ExtensionFinder.class).getComponents(); throws.

      Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason

       Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63, and they might provide implementations which throw exception.


      How to correct?

      I would have proposed a fix, but I'm not sure of the direction to take:

      • enhance the Reactor component
        • by protecting listeners calls ?
        • by adding a way for listener methods to mark the task as fatal for milestones?
      • work around the limitations on listeners error-handling in Jenkins core code 
        • by try catching, logging, and swallowing the exception (not sure of the consequences) ?
        • or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
      • just fix all plugins ?
      • some other way?

        Attachments

          Activity

          greybird Arnaud TAMAILLON created issue -
          greybird Arnaud TAMAILLON made changes -
          Field Original Value New Value
          Description *Underlying issue*

          _Reactor_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115]) only catches _TunnelException_.

          If another exception occurs in the task run method,
           * _done_ is set to true
           * _fatal_ is not set
           * but _pending_ variable +is not decremented+

          This makes _Reactor_ _execute_ method wait for ever ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266)] as +_pending_ will never reach 0+.
          ----
          *How can task.run throw something else than a TunnelException?*

          _task_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207)] catches exceptions from the _runTask_ method and wraps them in a _TunnelException_.

          But _listener_ methods are not protected by try/catch blocks (except _onTaskCompleted_ for technical reasons)

          So if an exception occurs in any of the _onTaskStarted_ or _onTaskFailed_, it won't be wrapped in a _TunnelException_, making the problem arise.

          For milestones, another type of _run_ method is used ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176)], calling the listener's _onAttained_ method, unprotected by try/catch.

          (?) Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?
          ----
          *Where does a listener throw an exception?*

          _InitReactorRunner_ provides a listener implementation calling _InitReactorRunner_'s _onInitMilestoneAttained_ method, when _onAttained_ is called on a milestone of type _InitMilestone_.

          [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83]

          _onInitMilestoneAttained_ implementation in Jenkins class ([https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092)] can throw if _ExtensionList.lookup(ExtensionFinder.class).getComponents();_ throws.

          Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason
           * example of this use case can be found here: [https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83|https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83)], which throws a AssertionException when plugin is null in the static field initialization.
           * Symptom (in unit tests) can be seen there: [https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fclaim-plugin/detail/PR-39/1/] => InjectedTests never run. Note that this was not the case with Jenkins 2.7.3.

          (?) Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63,] and they might be unprotected also.
          ----
          *How to correct?*

          I would have proposed a fix, but I'm not sure of the direction to take:
           * enhance the Reactor component
           ** by protecting listeners calls ?
           ** by adding a way for listener methods to mark the task as fatal for milestones?
           * work around the limitations on listeners error-handling in Jenkins core code 
           ** by try catching, logging, and swallowing the exception (not sure of the consequences) ?
           ** or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
           * just fix all plugins ?
           * some other way?
          *Underlying issue*

          _Reactor_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L115]) only catches _TunnelException_.

          If another exception occurs in the task run method,
           * _done_ is set to true
           * _fatal_ is not set
           * but _pending_ variable +is not decremented+

          This makes _Reactor_ _execute_ method wait for ever ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L266)] as +_pending_ will never reach 0+.
          ----
          *How can task.run throw something else than a TunnelException?*

          _task_ _run_ method ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L207)] catches exceptions from the _runTask_ method and wraps them in a _TunnelException_.

          But _listener_ methods are not protected by try/catch blocks (except _onTaskCompleted_ for technical reasons)

          So if an exception occurs in any of the _onTaskStarted_ or _onTaskFailed_, it won't be wrapped in a _TunnelException_, making the problem arise.

          For milestones, another type of _run_ method is used ([https://github.com/jenkinsci/lib-task-reactor/blob/master/src/main/java/org/jvnet/hudson/reactor/Reactor.java#L176)], calling the listener's _onAttained_ method, unprotected by try/catch.

          (?) Even if it can make sense not to protect listener methods, I am wondering if this is a good idea to have different behaviors depending on the listener method as one of them is in fact protected?
          ----
          *Where does a listener throw an exception?*

          _InitReactorRunner_ provides a listener implementation calling _InitReactorRunner_'s _onInitMilestoneAttained_ method, when _onAttained_ is called on a milestone of type _InitMilestone_.

          [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L83]

          _onInitMilestoneAttained_ implementation in Jenkins class ([https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/model/Jenkins.java#L1092)] can throw if _ExtensionList.lookup(ExtensionFinder.class).getComponents();_ throws.

          Exceptions can occur through a static call to a plugin class in an exception point if this same plugin will not start for a dependency reason
           * example of this use case can be found here: [https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83|https://github.com/jenkinsci/build-failure-analyzer-plugin/pull/80/commits/343a9b21c60a411b6cc7e48b05a9936e6f0c937b#diff-475044ef0a13791f000702f5f3e09874L83)], which throws a AssertionException when plugin is null in the static field initialization.
           * Symptom (in unit tests) can be seen there: [https://ci.jenkins.io/blue/organizations/jenkins/Plugins%2Fclaim-plugin/detail/PR-39/1/] => InjectedTests never run. Note that this was not the case with Jenkins 2.7.3.

          (?) Note that even if this seems an advanced scenario, other implementations of _ReactorListener_s could be provided through [https://github.com/jenkinsci/jenkins/blob/a82a47a8f2a90727d873c637c99aa0d087500dfe/core/src/main/java/jenkins/InitReactorRunner.java#L63,] and they might provide implementations which throw exception.
          ----
          *How to correct?*

          I would have proposed a fix, but I'm not sure of the direction to take:
           * enhance the Reactor component
           ** by protecting listeners calls ?
           ** by adding a way for listener methods to mark the task as fatal for milestones?
           * work around the limitations on listeners error-handling in Jenkins core code 
           ** by try catching, logging, and swallowing the exception (not sure of the consequences) ?
           ** or by adding back to the reactor a task designed to throw the exception in case of errors (will occur at some unknown point in the future, but at least will stop Jenkins startup) ?
           * just fix all plugins ?
           * some other way?
          Hide
          oleg_nenashev Oleg Nenashev added a comment -

          The issue description looks correct to me. I think the best way would be to fix the Task Reactor Component (at least by wrapping exceptions in listeners).

          I have some changes pending there (component refresh + extra API enhancements for SECURITY-667). If you submit a PR, please ping me there

          Show
          oleg_nenashev Oleg Nenashev added a comment - The issue description looks correct to me. I think the best way would be to fix the Task Reactor Component (at least by wrapping exceptions in listeners). I have some changes pending there (component refresh + extra API enhancements for SECURITY-667). If you submit a PR, please ping me there
          oleg_nenashev Oleg Nenashev made changes -
          Issue Type Improvement [ 4 ] Bug [ 1 ]
          oleg_nenashev Oleg Nenashev made changes -
          Labels exception robustness exception lib-task-reactor robustness
          greybird Arnaud TAMAILLON made changes -
          Assignee Arnaud TAMAILLON [ greybird ]
          greybird Arnaud TAMAILLON made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hide
          oleg_nenashev Oleg Nenashev added a comment -

          Arnaud TAMAILLON have you seen it in the wild? If yes, it is definitely an LTS candidate

          Show
          oleg_nenashev Oleg Nenashev added a comment - Arnaud TAMAILLON have you seen it in the wild? If yes, it is definitely an LTS candidate
          Hide
          greybird Arnaud TAMAILLON added a comment -

          Nope, only in unit tests as of now.

          As the issue originates from plugin dependencies mismatch, this problem might not surface very often?

           

          Show
          greybird Arnaud TAMAILLON added a comment - Nope, only in unit tests as of now. As the issue originates from plugin dependencies mismatch, this problem might not surface very often?  
          Hide
          oleg_nenashev Oleg Nenashev added a comment -

          Plugin mismatch definitely happens more often than I would like

          Show
          oleg_nenashev Oleg Nenashev added a comment - Plugin mismatch definitely happens more often than I would like
          greybird Arnaud TAMAILLON made changes -
          Status In Progress [ 3 ] In Review [ 10005 ]
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Arnaud
          Path:
          src/main/java/org/jvnet/hudson/reactor/Reactor.java
          src/test/java/org/jvnet/hudson/reactor/SessionTest.java
          http://jenkins-ci.org/commit/lib-task-reactor/fe40d4f703e0d487549ee722f49603fa9de1e593
          Log:
          JENKINS-48725 fix listener exceptions not being handled properly

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Arnaud Path: src/main/java/org/jvnet/hudson/reactor/Reactor.java src/test/java/org/jvnet/hudson/reactor/SessionTest.java http://jenkins-ci.org/commit/lib-task-reactor/fe40d4f703e0d487549ee722f49603fa9de1e593 Log: JENKINS-48725 fix listener exceptions not being handled properly
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          pom.xml
          src/main/java/org/jvnet/hudson/reactor/Reactor.java
          src/main/java/org/jvnet/hudson/reactor/ReactorListener.java
          src/test/java/org/jvnet/hudson/reactor/SessionTest.java
          http://jenkins-ci.org/commit/lib-task-reactor/32147b876e5f92551e32429a4a93445c6889cc90
          Log:
          Merge pull request #3 from Greybird/JENKINS-48725

          JENKINS-48725 fix listener exceptions not being handled properly

          Compare: https://github.com/jenkinsci/lib-task-reactor/compare/8d7e39fe811a...32147b876e5f

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: pom.xml src/main/java/org/jvnet/hudson/reactor/Reactor.java src/main/java/org/jvnet/hudson/reactor/ReactorListener.java src/test/java/org/jvnet/hudson/reactor/SessionTest.java http://jenkins-ci.org/commit/lib-task-reactor/32147b876e5f92551e32429a4a93445c6889cc90 Log: Merge pull request #3 from Greybird/ JENKINS-48725 JENKINS-48725 fix listener exceptions not being handled properly Compare: https://github.com/jenkinsci/lib-task-reactor/compare/8d7e39fe811a...32147b876e5f
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/pom.xml
          http://jenkins-ci.org/commit/jenkins/720db118a808c04f2ae04dc36582ce7dd5031b9b
          Log:
          JENKINS-48725 - Update Lib Task Reactor to 1.5

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/pom.xml http://jenkins-ci.org/commit/jenkins/720db118a808c04f2ae04dc36582ce7dd5031b9b Log: JENKINS-48725 - Update Lib Task Reactor to 1.5
          Hide
          scm_issue_link SCM/JIRA link daemon added a comment -

          Code changed in jenkins
          User: Oleg Nenashev
          Path:
          core/pom.xml
          http://jenkins-ci.org/commit/jenkins/e86e6f9e428787de244e45ce1d7581498880ad56
          Log:
          Merge pull request #3270 from oleg-nenashev/task-reactor/1.5

          JENKINS-48725 - Update Lib Task Reactor to 1.5

          Compare: https://github.com/jenkinsci/jenkins/compare/1a7881ce60cb...e86e6f9e4287

          Show
          scm_issue_link SCM/JIRA link daemon added a comment - Code changed in jenkins User: Oleg Nenashev Path: core/pom.xml http://jenkins-ci.org/commit/jenkins/e86e6f9e428787de244e45ce1d7581498880ad56 Log: Merge pull request #3270 from oleg-nenashev/task-reactor/1.5 JENKINS-48725 - Update Lib Task Reactor to 1.5 Compare: https://github.com/jenkinsci/jenkins/compare/1a7881ce60cb...e86e6f9e4287
          Hide
          oleg_nenashev Oleg Nenashev added a comment -

          It has been released in Jenkins 2.105

          Show
          oleg_nenashev Oleg Nenashev added a comment - It has been released in Jenkins 2.105
          oleg_nenashev Oleg Nenashev made changes -
          Resolution Fixed [ 1 ]
          Status In Review [ 10005 ] Resolved [ 5 ]

            People

            Assignee:
            greybird Arnaud TAMAILLON
            Reporter:
            greybird Arnaud TAMAILLON
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: