[JENKINS-25504] Failing a Step with a body while the body is running breaks FlowNodeGraph

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • pipeline
    • None

      Reported by jglick

      I found a StepEndNode of an ExecutorStep with an ErrorAction encoding the following exception:

      java.io.NotSerializableException: hudson.model.Executor
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:890)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:584)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:679)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1062)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1018)
      	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:884)
      	at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
      	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:111)
      	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:128)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:320)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:304)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:278)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$000(CpsThreadGroup.java:68)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:168)
      	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:166)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      

      However, its descendant StepEndNode did not have an ErrorAction, nor did that node’s descendant StepEndNode, nor the final FlowEndNode; so FlowExecution.getCauseOfFailure was null and there was no stack trace in the log.

      I assumed from the stack trace that the exception would have been caught in saveProgram, and that propagateErrorToWorkflow was therefore called, but the log contained no message about the program state save having failed.

      Pretty well reproducible: just run a flow allocating a docker-plugin slave, let it run a slow shell step, and restart in the middle. The exact set of errors seems to differ from run to run, but they are never printed to the log.


          Kohsuke Kawaguchi created issue -
          Kohsuke Kawaguchi made changes -
          Remote Link New: This issue links to "Original report (Web Link)" [ 11905 ]

          Kohsuke Kawaguchi added a comment:

          It's unclear to me how this has anything to do with restart. The problem appears to be that an unserializable object is a part of the program state, so I used the following workflow script to induce NotSerializableException:

          node {
            def x = new Object(); // some unserializable object
            sh 'sleep 60'
          }

          This failed the workflow instantly as it tried to execute the sh step, and the stack trace was visible in the console output.
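
As a rough illustration of the failure mode described in the comment above, the following Java sketch shows how serializing a state object that references a plain java.lang.Object fails. It uses the JDK's ObjectOutputStream rather than the JBoss Marshalling classes seen in the stack trace, and ProgramState and trySave are hypothetical names invented for this example, so treat it as an analogy under those assumptions rather than as what the workflow plugin does internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationDemo {
    // Hypothetical stand-in for saved program state: the holder itself is
    // Serializable, but the value it references may not be.
    static class ProgramState implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object local;

        ProgramState(Object local) {
            this.local = local;
        }
    }

    // Tries to serialize a state holding the given local variable.
    // Returns null on success, or the exception message (the name of the
    // offending class) if a NotSerializableException is thrown.
    static String trySave(Object local) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new ProgramState(local));
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            // Not expected with an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("String local: " + trySave("a string"));
        System.out.println("Object local: " + trySave(new Object()));
    }
}
```

In this sketch a String local serializes without complaint, while the `new Object()` local makes the whole save fail, which mirrors why the script above failed as soon as the program state had to be saved.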

          Kohsuke Kawaguchi added a comment:

          I've also run the following script and restarted the workflow in the middle, but this didn't result in any failure, and a breakpoint on NotSerializableException was never hit:

          node {
            sh 'sleep 60'
          }

          Kohsuke Kawaguchi added a comment:

          My suggestion is to close this with "Cannot Reproduce", but I'll let you do that in case you think I missed the critical steps to reproduce the problem.

          Kohsuke Kawaguchi added a comment (edited):

          (deleted; comment meant for another issue.)
          Kohsuke Kawaguchi made changes -
          Assignee Original: Jesse Glick [ jglick ] New: Kohsuke Kawaguchi [ kohsuke ]

          Kohsuke Kawaguchi added a comment:

          Scratch that, I just found a way to reproduce this problem.

          Kohsuke Kawaguchi added a comment:

          This problem happens when a step with a body (such as ExecutorStep) reports a failure to CpsStepContext.onFailure() while the body is still running.

          CpsStepContext.onFailure() is probably expected to do something about the running body, but it does nothing and leaves the body running. The failed node method call returns in the CPS code, which ends up running in parallel with the body as two CpsThread objects.

          This breaks the flow graph badly. In a program like the following, the last node in the flow graph results from the successful completion of the sh step, so the FlowEndNode will not have any ErrorAction.

          node {
            sh 'sleep 60'
          }
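
The sequence described in this comment can be modelled with a small, deterministic Java sketch. Everything in it (Node, Graph, run, the node labels, and the cancelBodyOnFailure switch) is hypothetical and heavily simplified; it is not the workflow plugin's API, and the "cancel" branch is only one conceivable behaviour, not necessarily how this issue was fixed. The assumption, taken from the report, is that the cause of failure is read from the final node only.

```java
import java.util.ArrayList;
import java.util.List;

class BodyFailureDemo {
    // Hypothetical, simplified flow node: a label plus an optional error.
    static final class Node {
        final String label;
        final Throwable error; // null when the node completed normally

        Node(String label, Throwable error) {
            this.label = label;
            this.error = error;
        }
    }

    // Toy flow graph: nodes in the order they were appended.
    static final class Graph {
        final List<Node> nodes = new ArrayList<>();

        void add(String label, Throwable error) {
            nodes.add(new Node(label, error));
        }

        // Models a lookup that inspects only the final node.
        Throwable causeOfFailure() {
            return nodes.isEmpty() ? null : nodes.get(nodes.size() - 1).error;
        }
    }

    // Simulates node { sh 'sleep 60' } where the outer step fails mid-body.
    static Graph run(boolean cancelBodyOnFailure) {
        Graph g = new Graph();
        g.add("node: start", null);
        g.add("sh: start", null);
        boolean bodyRunning = true;

        // The outer step reports a failure while the body is still running.
        Throwable failure = new IllegalStateException("step failed while body was running");
        g.add("node: end (failed)", failure);
        if (cancelBodyOnFailure) {
            bodyRunning = false;
        }

        if (bodyRunning) {
            // Nothing stopped the body, so it finishes later and its
            // successful nodes become the tail of the graph.
            g.add("sh: end", null);
            g.add("node: end (body completed)", null);
            g.add("flow end", null);
        } else {
            // Assumed alternative: the failure reaches the end of the flow.
            g.add("flow end", failure);
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println("body left running: cause = " + run(false).causeOfFailure());
        System.out.println("body cancelled:    cause = " + run(true).causeOfFailure());
    }
}
```

With the body left running, the error is recorded on an intermediate node but the final node is clean, so the lookup returns null, which matches the missing stack trace in the original report.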
          Kohsuke Kawaguchi made changes -
          Attachment New: graphViz.png [ 27962 ]
