Jenkins / JENKINS-51539

A paused Workflow job does not resume after safeExit when parallel step is wrapped by a node step


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • workflow-cps 2.56

      Hi,

      We run a Jenkins instance with many pipeline jobs that wait for promotion using an input step at the end of the pipeline.

      The input step is NOT tied to a node.

      We also use a Docker-based cloud plugin to spawn slaves and terminate them based on labels. No executor is defined on the master, so a node('label'){} declaration simply spawns a slave, runs its body there, and terminates the slave.
       
      When we upgrade or restart Jenkins from time to time, our jobs that wait for promotion normally resume correctly, and deployment to prod and staging can continue from where we left off.

      However, several jobs do not resume correctly. During startup, their logs look like this:

      Waiting to resume part of #JOB-NAME: There are no nodes with the label ‘<UNIQUE-DOCKER-LABEL>’
      ...
      

      After about 6 minutes, they fail.

      While trying to understand the issue, I managed to create a VERY simple pipeline that behaves the same way:

      stage('build'){
          node('generic'){
              parallel (
                  a: {
                      echo 'inside a'
                  },
                  b: {
                      echo 'inside b'
                  }
              )
          }
      }
      stage('wait'){
          input message: 'wait??', id: 'wait-for-job'
      }
      

      The 'generic' label is defined in docker-plugin with the jnlp-slave Docker image, but I managed to reproduce the issue with ecs-plugin and kubernetes-plugin as well.

      To reproduce the issue, run this pipeline and wait until it reaches the wait stage. Then, while it is waiting for input, restart the Jenkins instance gracefully. You will see that the job cannot resume.

      If you move the parallel outside the node, it will resume correctly.
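
      For reference, here is a sketch of what the rearranged pipeline might look like. It assumes that "outside the node" means each parallel branch allocates its own node; the label, branch names, and input step are unchanged from the pipeline above. This is one possible reading, not necessarily the only arrangement that resumes correctly.

      ```groovy
      stage('build'){
          // parallel is now the outer step; no node block wraps it
          parallel (
              a: {
                  // each branch requests its own 'generic' node (assumption)
                  node('generic'){
                      echo 'inside a'
                  }
              },
              b: {
                  node('generic'){
                      echo 'inside b'
                  }
              }
          )
      }
      stage('wait'){
          // input still runs outside any node, as in the original pipeline
          input message: 'wait??', id: 'wait-for-job'
      }
      ```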

            Assignee: Unassigned
            Reporter: Ohad David (odavid)
            Votes: 1
            Watchers: 4