Uploaded image for project: 'Jenkins'
  1. Jenkins
  2. JENKINS-25504

Failing a Step with a body while the body is running breaks FlowNodeGraph

    • Icon: Bug Bug
    • Resolution: Fixed
    • Icon: Major Major
    • pipeline
    • None

      Reported by jglick

      I found a StepEndNode of an ExecutorStep with an ErrorAction encoding

      java.io.NotSerializableException: hudson.model.Executor
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:890)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:584)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:679)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
      	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:128)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:320)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:304)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:278)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:68)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:168)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:166)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      

      However its descendant StepEndNode did not have an ErrorAction, nor did that node’s descendant StepEndNode, nor the final FlowEndNode; so FlowExecution.getCauseOfFailure was null and there was no stack trace in the log.

      I assumed from the stack trace that the exception would have been caught in saveProgram, and that propagateErrorToWorkflow was therefore called, but the log contained no message about program state save failed.

      Pretty well reproducible: just run a flow allocating a docker-plugin slave, let it run a slow shell step, and restart in the middle. The exact set of errors seems to differ from run to run, but they are never printed to the log.

          [JENKINS-25504] Failing a Step with a body while the body is running breaks FlowNodeGraph

          Jesse feels we need to do something (perhaps even as simple as throwing IllegalStateException) before 1.0, that the current behaviour of silently constructing illegal flow graph is not acceptable.

          Kohsuke Kawaguchi added a comment - Jesse feels we need to do something (perhaps even as simple as throwing IllegalStateException ) before 1.0, that the current behaviour of silently constructing illegal flow graph is not acceptable.

          Jesse Glick added a comment -

          The evaluation here has omitted any analysis of the NotSerializableException. Perhaps this is a distinct problem, I am not sure. The first comment, about setting a new Object() to a local variable, is missing the point: the program state should never be trying to serialize an Executor, because we have ExecutorPickle to replace it. So if we get this exception, it means that something in the serialization code is broken.

          Jesse Glick added a comment - The evaluation here has omitted any analysis of the NotSerializableException . Perhaps this is a distinct problem, I am not sure. The first comment, about setting a new Object() to a local variable, is missing the point: the program state should never be trying to serialize an Executor , because we have ExecutorPickle to replace it. So if we get this exception, it means that something in the serialization code is broken.

          Code changed in jenkins
          User: Jesse Glick
          Path:
          support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
          http://jenkins-ci.org/commit/workflow-plugin/6fd47a3bfa55663c42db14ac6f4c86cf31b5317f
          Log:
          JENKINS-25504 Trying to at least remove the trigger for the exceptions encountered when shutting down Jenkins.
          Does not address the general possibility of the bogus NotSerializableException on Executor.
          Does not have anything to do with actual interruption of an executor step
          (as opposed to the InterruptedException thrown during shutdown).

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Jesse Glick Path: support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java http://jenkins-ci.org/commit/workflow-plugin/6fd47a3bfa55663c42db14ac6f4c86cf31b5317f Log: JENKINS-25504 Trying to at least remove the trigger for the exceptions encountered when shutting down Jenkins. Does not address the general possibility of the bogus NotSerializableException on Executor. Does not have anything to do with actual interruption of an executor step (as opposed to the InterruptedException thrown during shutdown).

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          basic-steps/src/main/java/org/jenkinsci/plugins/workflow/steps/PushdStep.java
          basic-steps/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodySubContext.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/steps/ParallelStep.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/steps/ParallelStepExecution.java
          step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/BodyExecution.java
          step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/BodyExecutionCallback.java
          step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/BodyInvoker.java
          support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java
          support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/WorkspaceStepExecution.java
          support/src/test/java/org/jenkinsci/plugins/workflow/test/steps/BlockSemaphoreStep.java
          support/src/test/java/org/jenkinsci/plugins/workflow/test/steps/TmpDirStepExecution.java
          http://jenkins-ci.org/commit/workflow-plugin/62b92d7a563fcb7a20d5b0aa69058de5be120b8c
          Log:
          JENKINS-25504

          Introducing BodyExecutionCallback as a beefed up version of
          FutureCallback, so that the step implementation can interact with the
          start body block / end body block, such as adding log messages.

          We do this by creating a different StepContext objects that are only
          valid during the body start and body end. This in turn allow Steps to
          re-inject the context to access correct TaskListeners and things like
          that (which is not in this commit.)

          Copying my email to jenkinsci-dev for the motivation:
          ------
          I'm working on this card, where we want to be able to write to console
          log and do stuff at the end of the body invocation.

          The challenge here is that the Step API has no dependency to the
          workflow Flow Node API (it goes the other way around), so we need to do
          this in a way that doesn't force Flow API into the likes of RetryStep.
          That pretty much means we need to do the context injection of
          TaskListener like StepContext does.

          I thought about two ways to do this.

          One is to reuse the existing StepContext instance. We could just say
          that at the point of the callback from the body execution,
          StepContet.get() would return the context objects for the end of the
          body (TaskListener connecte to the right FlowNode, etc.) We could then
          provide some convenience base class that performs @StepContextParameter
          injection, so that everything gets reinjected.

          This works, but I also feel that it's bit brittle, especially for
          ParallelStep that invokes multiple bodies at the same time, and thus it
          wants multiple different post body execution context. So I started
          thinking about another way, where the end of the body execution is
          notified to FutureCallback<BodyExecutionResult> instead of
          FutureCallback<Object>, and the newly added BodyExecutionResult class
          would have properties like outcome, the same get(Class) method that
          returns contextual object, and so on.

          That also makes it more explicit that the context injection works
          differently post body execution.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: basic-steps/src/main/java/org/jenkinsci/plugins/workflow/steps/PushdStep.java basic-steps/src/main/java/org/jenkinsci/plugins/workflow/steps/TimeoutStepExecution.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodySubContext.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/DSL.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/steps/ParallelStep.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/steps/ParallelStepExecution.java step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/BodyExecution.java step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/BodyExecutionCallback.java step-api/src/main/java/org/jenkinsci/plugins/workflow/steps/BodyInvoker.java support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepExecution.java support/src/main/java/org/jenkinsci/plugins/workflow/support/steps/WorkspaceStepExecution.java support/src/test/java/org/jenkinsci/plugins/workflow/test/steps/BlockSemaphoreStep.java support/src/test/java/org/jenkinsci/plugins/workflow/test/steps/TmpDirStepExecution.java http://jenkins-ci.org/commit/workflow-plugin/62b92d7a563fcb7a20d5b0aa69058de5be120b8c Log: JENKINS-25504 Introducing BodyExecutionCallback as a beefed up version of FutureCallback, so that the step implementation can interact with the start body block / end body block, such as adding log messages. We do this by creating a different StepContext objects that are only valid during the body start and body end. This in turn allow Steps to re-inject the context to access correct TaskListeners and things like that (which is not in this commit.) Copying my email to jenkinsci-dev for the motivation: ------ I'm working on this card, where we want to be able to write to console log and do stuff at the end of the body invocation. The challenge here is that the Step API has no dependency to the workflow Flow Node API (it goes the other way around), so we need to do this in a way that doesn't force Flow API into the likes of RetryStep. That pretty much means we need to do the context injection of TaskListener like StepContext does. I thought about two ways to do this. One is to reuse the existing StepContext instance. We could just say that at the point of the callback from the body execution, StepContet.get() would return the context objects for the end of the body (TaskListener connecte to the right FlowNode, etc.) We could then provide some convenience base class that performs @StepContextParameter injection, so that everything gets reinjected. This works, but I also feel that it's bit brittle, especially for ParallelStep that invokes multiple bodies at the same time, and thus it wants multiple different post body execution context. So I started thinking about another way, where the end of the body execution is notified to FutureCallback<BodyExecutionResult> instead of FutureCallback<Object>, and the newly added BodyExecutionResult class would have properties like outcome, the same get(Class) method that returns contextual object, and so on. That also makes it more explicit that the context injection works differently post body execution.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
          http://jenkins-ci.org/commit/workflow-plugin/bd51b04ebeec4ece43f396626ae7e6f94a25e96f
          Log:
          JENKINS-25504

          CPS engine side of the change to fire newly added callbacks of
          BodyExecutionCallback

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java http://jenkins-ci.org/commit/workflow-plugin/bd51b04ebeec4ece43f396626ae7e6f94a25e96f Log: JENKINS-25504 CPS engine side of the change to fire newly added callbacks of BodyExecutionCallback

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
          http://jenkins-ci.org/commit/workflow-plugin/1e843f10aa189d7f14f218ea642286438fce08d5
          Log:
          JENKINS-25504 Async step should continue executing until all the bodies are done and the outcome is set.

          Previously, as soon as an outcome is set the step was considered done, even when the body was running.
          This corrupts the flow graph as multiple CpsThreads collide on trying to update the same FlowHead.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyInvoker.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java http://jenkins-ci.org/commit/workflow-plugin/1e843f10aa189d7f14f218ea642286438fce08d5 Log: JENKINS-25504 Async step should continue executing until all the bodies are done and the outcome is set. Previously, as soon as an outcome is set the step was considered done, even when the body was running. This corrupts the flow graph as multiple CpsThreads collide on trying to update the same FlowHead.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
          http://jenkins-ci.org/commit/workflow-plugin/17206dcbadc892231e43b8e11962b80cff28ff15
          Log:
          JENKINS-25504

          If the step is marked as failed while the body is still running, try to interrupt the body execution.
          This would speed up the step as a whole completing as a failure.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java http://jenkins-ci.org/commit/workflow-plugin/17206dcbadc892231e43b8e11962b80cff28ff15 Log: JENKINS-25504 If the step is marked as failed while the body is still running, try to interrupt the body execution. This would speed up the step as a whole completing as a failure.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
          http://jenkins-ci.org/commit/workflow-plugin/68de87c231d01605f61344cc3eccbafcbaf71470
          Log:
          JENKINS-25504

          The previous attempt failed because CpsStepContext can get duplicated because of persistence.

          This is a weaker fix that only waits for the "primary" body execution. That is, this doesn't work correctly if a Step is like parallel step that executes multiple bodies at the same time.

          That said, the current ParallelStep implementation never reports itself completed until all the bodies check in, so this code should work correctly with all the known Step impls.

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java http://jenkins-ci.org/commit/workflow-plugin/68de87c231d01605f61344cc3eccbafcbaf71470 Log: JENKINS-25504 The previous attempt failed because CpsStepContext can get duplicated because of persistence. This is a weaker fix that only waits for the "primary" body execution. That is, this doesn't work correctly if a Step is like parallel step that executes multiple bodies at the same time. That said, the current ParallelStep implementation never reports itself completed until all the bodies check in, so this code should work correctly with all the known Step impls.

          Problem sufficiently patched.

          Kohsuke Kawaguchi added a comment - Problem sufficiently patched.

          Code changed in jenkins
          User: Kohsuke Kawaguchi
          Path:
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java
          cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java
          http://jenkins-ci.org/commit/workflow-cps-plugin/e22ae99a0994358fdb10ca66ca77c1f47ed73460
          Log:
          JENKINS-25504

          The previous attempt failed because CpsStepContext can get duplicated because of persistence.

          This is a weaker fix that only waits for the "primary" body execution. That is, this doesn't work correctly if a Step is like parallel step that executes multiple bodies at the same time.

          That said, the current ParallelStep implementation never reports itself completed until all the bodies check in, so this code should work correctly with all the known Step impls.

          Originally-Committed-As: 68de87c231d01605f61344cc3eccbafcbaf71470

          SCM/JIRA link daemon added a comment - Code changed in jenkins User: Kohsuke Kawaguchi Path: cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsStepContext.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThread.java cps/src/main/java/org/jenkinsci/plugins/workflow/cps/CpsThreadGroup.java http://jenkins-ci.org/commit/workflow-cps-plugin/e22ae99a0994358fdb10ca66ca77c1f47ed73460 Log: JENKINS-25504 The previous attempt failed because CpsStepContext can get duplicated because of persistence. This is a weaker fix that only waits for the "primary" body execution. That is, this doesn't work correctly if a Step is like parallel step that executes multiple bodies at the same time. That said, the current ParallelStep implementation never reports itself completed until all the bodies check in, so this code should work correctly with all the known Step impls. Originally-Committed-As: 68de87c231d01605f61344cc3eccbafcbaf71470

            kohsuke Kohsuke Kawaguchi
            kohsuke Kohsuke Kawaguchi
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: