[JENKINS-67164] Pipelines missing from FlowExecutionList hang forever after resuming

      Pipeline builds that are missing from FlowExecutionList, but which are still in progress, may hang forever after a Jenkins restart.

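      Normally, FlowExecutionList is responsible for resuming running Pipeline builds after a restart, although anything that causes the build to be loaded will also make it resume. However, if the Pipeline is missing from FlowExecutionList and resumes because it is loaded directly, then the code in FlowExecutionList that resumes step executions (https://github.com/jenkinsci/workflow-api-plugin/blob/b922745a12d0a7816c74028cfed232b73b531767/src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionList.java#L177-L197) is skipped, and any step executions in that build are not resumed. This can result in the Pipeline hanging forever.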

      I ran into this issue while backing up and restoring a large Jenkins controller using a file-based backup system while Jenkins was running. Since Jenkins was running, the serialized state of FlowExecutionList and the build itself did not match in the backup. I am not sure if it is possible to run into this issue in non-backup scenarios.

      That said, we can harden against this issue by having Pipelines resume their step executions directly when they are loaded, rather than relying on FlowExecutionList to do so. That way, the build still resumes correctly even if the serialized state of FlowExecutionList is incorrect and something else causes the Pipeline to load. See jenkinsci/workflow-api-plugin#178 (https://github.com/jenkinsci/workflow-api-plugin/pull/178).
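The failure mode and the proposed hardening can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `ToyStep`, `ToyExecution`, `ToyExecutionList`, and every method on them are hypothetical stand-ins invented for illustration, not the real Jenkins or workflow-api classes, and the sketch does not claim to reflect how the linked pull request is implemented. It assumes the registry is stale (the build was never recorded in it) and compares a build that leaves step resumption to the registry with one that resumes its own steps when loaded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model only: none of these types are real Jenkins or workflow-api classes. */
public class ResumeOnLoadSketch {

    /** Hypothetical stand-in for a step execution that must be told to resume. */
    static final class ToyStep {
        boolean resumed;
        void onResume() { resumed = true; }
    }

    /** Hypothetical stand-in for the execution of a running Pipeline build. */
    static final class ToyExecution {
        final String id;
        final List<ToyStep> steps = new ArrayList<>();
        private final AtomicBoolean stepsResumed = new AtomicBoolean();

        ToyExecution(String id, int stepCount) {
            this.id = id;
            for (int i = 0; i < stepCount; i++) {
                steps.add(new ToyStep());
            }
        }

        /** Fragile variant: loading leaves step resumption to the registry. */
        void onLoadFragile() { }

        /** Hardened variant: loading resumes the steps itself. */
        void onLoadHardened() { resumeStepsOnce(); }

        /** Idempotent, so a direct load and the registry can both call it safely. */
        void resumeStepsOnce() {
            if (stepsResumed.compareAndSet(false, true)) {
                steps.forEach(ToyStep::onResume);
            }
        }

        /** A build "hangs" in this model if any step was never resumed. */
        boolean isHung() {
            return steps.stream().anyMatch(s -> !s.resumed);
        }
    }

    /** Hypothetical stand-in for the persisted list of running executions. */
    static final class ToyExecutionList {
        final Set<String> persistedIds = new HashSet<>();

        /** Resumes steps only for executions that the list knows about. */
        void resumeAll(List<ToyExecution> loaded) {
            for (ToyExecution e : loaded) {
                if (persistedIds.contains(e.id)) {
                    e.resumeStepsOnce();
                }
            }
        }
    }

    /** Simulates a restart where the registry is stale and the build is loaded directly. */
    static boolean hangsAfterRestart(boolean hardened) {
        ToyExecutionList list = new ToyExecutionList(); // stale: "build-1" was never recorded
        ToyExecution build = new ToyExecution("build-1", 2);
        if (hardened) {
            build.onLoadHardened();
        } else {
            build.onLoadFragile();
        }
        list.resumeAll(List.of(build));
        return build.isHung();
    }

    public static void main(String[] args) {
        System.out.println("fragile hangs: " + hangsAfterRestart(false));  // true
        System.out.println("hardened hangs: " + hangsAfterRestart(true));  // false
    }
}
```

In this model the resume call is guarded by a flag so that it runs at most once per execution. That matters because, in the normal case where the registry is correct, both the load path and the registry would otherwise try to resume the same steps.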

          Devin Nusbaum created issue -
          Devin Nusbaum made changes -
          Description: link markup changed from Markdown syntax to Jira wiki syntax; the text is otherwise unchanged. Links affected: "this code" (https://github.com/jenkinsci/workflow-api-plugin/blob/b922745a12d0a7816c74028cfed232b73b531767/src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionList.java#L177-L197) and jenkinsci/workflow-api-plugin#178 (https://github.com/jenkinsci/workflow-api-plugin/pull/178).
          Devin Nusbaum made changes -
          Link New: This issue relates to JENKINS-43587 [ JENKINS-43587 ]
          Devin Nusbaum made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Devin Nusbaum made changes -
          Status Original: In Progress [ 3 ] New: In Review [ 10005 ]
          Devin Nusbaum made changes -
          Remote Link New: This issue links to "jenkinsci/workflow-api-plugin#178 (Web Link)" [ 27229 ]
          Devin Nusbaum made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: In Review [ 10005 ] New: Fixed but Unreleased [ 10203 ]
          Devin Nusbaum made changes -
          Released As New: 1105.v3de5e2efac97
          Status Original: Fixed but Unreleased [ 10203 ] New: Resolved [ 5 ]
          Devin Nusbaum made changes -
          Released As Original: 1105.v3de5e2efac97 New: workflow-api 1105.v3de5e2efac97
          Devin Nusbaum made changes -
          Link New: This issue causes JENKINS-67351 [ JENKINS-67351 ]
          Jesse Glick made changes -
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]

            Assignee: Jesse Glick (jglick)
            Reporter: Devin Nusbaum (dnusbaum)