Jenkins / JENKINS-67164

Pipelines missing from FlowExecutionList hang forever after resuming


      Pipeline builds that are missing from FlowExecutionList, but which are still in progress, may hang forever after a Jenkins restart.

      Normally, FlowExecutionList is responsible for resuming running Pipeline builds after a restart, although anything that causes the build to be loaded will also make it resume. However, if the Pipeline is missing from FlowExecutionList and resumes only because it was loaded directly, the code in FlowExecutionList that resumes step executions is skipped, so the step executions in that build are never resumed. This can result in the Pipeline hanging forever.

      I ran into this issue while backing up and restoring a large Jenkins controller using a file-based backup system while Jenkins was running. Since Jenkins was running, the serialized state of FlowExecutionList and the build itself did not match in the backup. I am not sure if it is possible to run into this issue in non-backup scenarios.

      That said, we can harden against this issue by having Pipelines resume their step executions directly when they are loaded, rather than relying on FlowExecutionList to do so. This way it does not matter if the serialized state of FlowExecutionList is somehow incorrect and something else causes a Pipeline to resume. See jenkinsci/workflow-api-plugin#178.
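The failure mode and the proposed hardening can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyStep`, `ToyBuild`, `ToyExecutionList`, and `simulateRestart` are hypothetical stand-ins invented for illustration. They are not the real Jenkins or workflow-api-plugin classes, and the sketch does not reproduce the actual change in the linked pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every class here is a hypothetical stand-in,
// not the real Jenkins or workflow-api-plugin API.
class ResumeSketch {

    /** Stand-in for a step execution that must be told to resume after a restart. */
    static class ToyStep {
        boolean resumed = false;

        void onResume() {
            resumed = true;
        }
    }

    /** Stand-in for an in-progress Pipeline build with pending step executions. */
    static class ToyBuild {
        final List<ToyStep> steps = new ArrayList<>();
        final boolean resumeStepsOnLoad; // true models the hardened behavior
        boolean loaded = false;

        ToyBuild(int stepCount, boolean resumeStepsOnLoad) {
            this.resumeStepsOnLoad = resumeStepsOnLoad;
            for (int i = 0; i < stepCount; i++) {
                steps.add(new ToyStep());
            }
        }

        /** Called by whatever loads the build; does nothing on repeated calls. */
        void onLoad() {
            if (loaded) {
                return;
            }
            loaded = true;
            if (resumeStepsOnLoad) {
                resumeSteps(); // hardened: the build resumes its own steps
            }
        }

        void resumeSteps() {
            for (ToyStep step : steps) {
                if (!step.resumed) {
                    step.onResume();
                }
            }
        }

        /** Number of steps that were never resumed, i.e. would hang forever. */
        long hungSteps() {
            return steps.stream().filter(step -> !step.resumed).count();
        }
    }

    /** Stand-in for the persisted list of running builds. */
    static class ToyExecutionList {
        final List<ToyBuild> builds = new ArrayList<>();

        /** Startup path: load each listed build, then resume its steps. */
        void resumeAll() {
            for (ToyBuild build : builds) {
                build.onLoad();
                build.resumeSteps();
            }
        }
    }

    /** Simulates one restart and returns how many steps are left hanging. */
    static long simulateRestart(boolean listedInRegistry, boolean resumeStepsOnLoad) {
        ToyBuild build = new ToyBuild(2, resumeStepsOnLoad);
        ToyExecutionList list = new ToyExecutionList();
        if (listedInRegistry) {
            list.builds.add(build);
        }
        list.resumeAll(); // startup: resume everything the list knows about
        build.onLoad();   // later, something else loads the build directly
        return build.hungSteps();
    }

    public static void main(String[] args) {
        System.out.println("listed, not hardened:  " + simulateRestart(true, false));
        System.out.println("missing, not hardened: " + simulateRestart(false, false));
        System.out.println("missing, hardened:     " + simulateRestart(false, true));
    }
}
```

In this model, a build that is missing from the list and does not resume its own steps on load is the only case that leaves steps hanging. Moving the step resumption into the load path makes the outcome independent of whether the list's serialized state is accurate.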

            jglick Jesse Glick
            dnusbaum Devin Nusbaum