Parallel blocks in node blocks cause executors to be persisted outside of the node block

    • Pipeline Groovy 2.56

      When a parallel step is nested in a node step, the executor associated with the node appears to outlive both the parallel and node steps. As a result, the executor is rehydrated when the pipeline resumes after a restart, even if execution has already left the node block.

      Reproduction test case:

      @Test public void shouldNotLeakExecutorsViaContextVars() {
          // First session: run parallel inside node, then pause on a semaphore outside the node block.
          story.then(r -> {
              DumbSlave s = r.createOnlineSlave();
              WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "demo");
              p.setDefinition(new CpsFlowDefinition("node('" + s.getNodeName() + "') {\n" +
                      "  parallel one: {\n" +
                      "    echo '" + s.getNodeName() + "'\n" +
                      "  }\n" +
                      "}\n" +
                      "semaphore 'wait'\n", false));
              WorkflowRun b = p.scheduleBuild2(0).waitForStart();
              SemaphoreStep.waitForStart("wait/1", b);
              // Remove the agent so that any attempt to reacquire its executor after the restart shows up in the build log.
              r.jenkins.removeNode(s);
          });
          // Second session (after the restart): the build should resume and finish without waiting for the removed agent.
          story.then(r -> {
              WorkflowRun b = r.jenkins.getItemByFullName("demo", WorkflowJob.class).getBuildByNumber(1);
              SemaphoreStep.waitForStart("wait/1", b);
              SemaphoreStep.success("wait/1", null);
              while (b.isBuilding()) {
                  // This message would indicate that the build is still looking for the removed agent.
                  r.assertLogNotContains("Jenkins doesn’t have label", b);
                  Thread.sleep(100);
              }
              r.assertBuildStatusSuccess(b);
          });
      }
      

      This test currently fails because, after the restart, the pipeline waits for the 'Test' agent to become available even though execution is no longer inside a node block.
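
The general mechanism described above (something that belongs to the node block stays reachable from the saved program and is restored on resume) can be illustrated with a small toy model. This is only a sketch built on plain Java serialization: every class and method name below (`FakeExecutor`, `FakeStepExecution`, `FakeResultHandler`, `FakeProgram`, `runNodeWithParallel`, `saveAndReload`, `needsAgentOnResume`) is hypothetical and does not exist in Jenkins or in the Pipeline Groovy plugin, and the `clearOnCompletion` flag is just one possible way to drop the reference, not a description of the fix that was actually released.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Toy model only: none of these types exist in Jenkins. */
public class ExecutorLeakSketch {

    /** Hypothetical stand-in for the executor that a node block adds to the step context. */
    static class FakeExecutor implements Serializable {
        final String agentName;
        FakeExecutor(String agentName) { this.agentName = agentName; }
    }

    /** Hypothetical stand-in for a step execution that captured the enclosing context. */
    static class FakeStepExecution implements Serializable {
        final FakeExecutor executor;
        FakeStepExecution(FakeExecutor executor) { this.executor = executor; }
    }

    /** Hypothetical long-lived object that keeps a reference to the step execution. */
    static class FakeResultHandler implements Serializable {
        FakeStepExecution stepExecution;
    }

    /** Hypothetical persisted program state of a running build. */
    static class FakeProgram implements Serializable {
        final FakeResultHandler handler = new FakeResultHandler();
    }

    /** Simulates the node and parallel steps finishing, optionally clearing the retained reference. */
    static FakeProgram runNodeWithParallel(boolean clearOnCompletion) {
        FakeProgram program = new FakeProgram();
        program.handler.stepExecution = new FakeStepExecution(new FakeExecutor("agent-1"));
        if (clearOnCompletion) {
            program.handler.stepExecution = null;
        }
        return program;
    }

    /** Serializes and deserializes the program, analogous to saving it and resuming after a restart. */
    static FakeProgram saveAndReload(FakeProgram program) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(program);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FakeProgram) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the reloaded program still carries an executor that would have to be restored. */
    static boolean needsAgentOnResume(FakeProgram program) {
        return program.handler.stepExecution != null && program.handler.stepExecution.executor != null;
    }

    public static void main(String[] args) {
        // Reference retained after the block ends: the executor is part of the saved state.
        System.out.println(needsAgentOnResume(saveAndReload(runNodeWithParallel(false)))); // prints "true"
        // Reference cleared when the block ends: nothing agent-related is saved.
        System.out.println(needsAgentOnResume(saveAndReload(runNodeWithParallel(true))));  // prints "false"
    }
}
```

In the toy model, the executor is written out with the program purely because it is still reachable from a surviving object, which matches the symptom in the reproduction test: the build has left the node block, yet resuming it still involves the agent.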

          [JENKINS-53709] Parallel blocks in node blocks cause executors to be persisted outside of the node block

          Devin Nusbaum created issue -
          Devin Nusbaum made changes -
          Status Original: Open [ 1 ] New: In Progress [ 3 ]
          Devin Nusbaum made changes -
          Description Original: When a {{parallel}} step is nested in a {{node}} step, the executor associated with the node appears to outlive both the {{parallel}} and {{node}} steps. This leads to the executor being rehydrated when a pipeline is restarted, even if the pipeline is outside of the node block.

          Reproduction test case:

          {code}
              @Test public void shouldNotLeakExecutorsViaContextVars() {
                  story.then(r -> {
                      DumbSlave s = r.createOnlineSlave();
                      WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "demo");
                      p.setDefinition(new CpsFlowDefinition("node('" + s.getNodeName() + "') {\n" +
                              " parallel one: {\n" +
                              " echo '" + s.getNodeName() + "'\n" +
                              " }\n" +
                              "}\n" +
                              "semaphore 'wait'\n", false));
                      WorkflowRun b = p.scheduleBuild2(0).waitForStart();
                      SemaphoreStep.waitForStart("wait/1", b);
                      r.disconnectSlave(s);
                  });
                  story.then(r -> {
                      WorkflowRun b = r.jenkins.getItemByFullName("demo", WorkflowJob.class).getBuildByNumber(1);
                      SemaphoreStep.waitForStart("wait/1", b);
                      SemaphoreStep.success("wait/1", null);
                      while (b.isBuilding()) {
                          r.assertLogNotContains(" is offline", b);
                          Thread.sleep(100);
                      }
                      r.assertBuildStatusSuccess(b);
                  });
              }
          {code}

          This test currently fails because the pipeline waits for the 'Test' agent to become available after restarting even though we are not in a node block.

          From a quick investigation, I think this may have been introduced by JENKINS-26034 ([commit|https://github.com/jenkinsci/workflow-cps-plugin/commit/c8c668f2b60a19c33add92e2b14345f23f58aabc]), because if I remove [ResultHandler.stepExecution|https://github.com/jenkinsci/workflow-cps-plugin/blob/54d2f4fe8069fde53789bfe21229ce8e545300bb/src/main/java/org/jenkinsci/plugins/workflow/cps/steps/ParallelStep.java#L70], the test case passes successfully. I'm not sure if we shouldn't be persisting the execution there, or if we need to clear it out after the step completes, or if the persistence is fine and the root problem is somewhere else.
          New: When a {{parallel}} step is nested in a {{node}} step, the executor associated with the node appears to outlive both the {{parallel}} and {{node}} steps. This leads to the executor being rehydrated when a pipeline is restarted, even if the pipeline is outside of the node block.

          Reproduction test case:

          {code}
          @Test public void shouldNotLeakExecutorsViaContextVars() {
              story.then(r -> {
                  DumbSlave s = r.createOnlineSlave();
                  WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "demo");
                  p.setDefinition(new CpsFlowDefinition("node('" + s.getNodeName() + "') {\n" +
                          " parallel one: {\n" +
                          " echo '" + s.getNodeName() + "'\n" +
                          " }\n" +
                          "}\n" +
                          "semaphore 'wait'\n", false));
                  WorkflowRun b = p.scheduleBuild2(0).waitForStart();
                  SemaphoreStep.waitForStart("wait/1", b);
                  r.jenkins.removeNode(s);
              });
              story.then(r -> {
                  WorkflowRun b = r.jenkins.getItemByFullName("demo", WorkflowJob.class).getBuildByNumber(1);
                  SemaphoreStep.waitForStart("wait/1", b);
                  SemaphoreStep.success("wait/1", null);
                  while (b.isBuilding()) {
                      r.assertLogNotContains("Jenkins doesn’t have label", b);
                      Thread.sleep(100);
                  }
                  r.assertBuildStatusSuccess(b);
              });
          }
          {code}

          This test currently fails because the pipeline waits for the 'Test' agent to become available after restarting even though we are not in a node block.

          From a quick investigation, I think this may have been introduced by JENKINS-26034 ([commit|https://github.com/jenkinsci/workflow-cps-plugin/commit/c8c668f2b60a19c33add92e2b14345f23f58aabc]), because if I remove [ResultHandler.stepExecution|https://github.com/jenkinsci/workflow-cps-plugin/blob/54d2f4fe8069fde53789bfe21229ce8e545300bb/src/main/java/org/jenkinsci/plugins/workflow/cps/steps/ParallelStep.java#L70], the test case passes successfully. I'm not sure if we shouldn't be persisting the execution there, or if we need to clear it out after the step completes, or if the persistence is fine and the root problem is somewhere else.
          Devin Nusbaum made changes -
          Status Original: In Progress [ 3 ] New: In Review [ 10005 ]
          Devin Nusbaum made changes -
          Remote Link New: This issue links to "jenkinsci/workflow-cps-plugin#245 (Web Link)" [ 21829 ]
          Devin Nusbaum made changes -
          Released As New: Pipeline Groovy 2.56
          Resolution New: Fixed [ 1 ]
          Status Original: In Review [ 10005 ] New: Resolved [ 5 ]
          Devin Nusbaum made changes -
          Link New: This issue is duplicated by JENKINS-51539 [ JENKINS-51539 ]
          Jesse Glick made changes -
          Link New: This issue relates to JENKINS-41791 [ JENKINS-41791 ]
          Devin Nusbaum made changes -
          Link New: This issue relates to JENKINS-63164 [ JENKINS-63164 ]
          Devin Nusbaum made changes -
          Link New: This issue is duplicated by JENKINS-39552 [ JENKINS-39552 ]

            dnusbaum Devin Nusbaum
            dnusbaum Devin Nusbaum