[JENKINS-70267] Internal pipeline crashes leave stuck executors

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical

      If a pipeline crashes because of an unexpected internal error, the job gets stuck and blocks its executor from running anything else.

      I have built the following minimal pipeline example. It is taken from another issue, JENKINS-70080; that issue is not the main problem here, just an easy way to trigger the behavior described below:

      class TestRunner {
        private def testCases = []
      
        @NonCPS
        void add(newTests) {
          testCases += newTests
        }
      }
      
      
      node('test'){
          stage('a'){
              script {
                  def t = new TestRunner()
              }
              echo "Hello"   
          }
      }
      

      This code produces the following crash:

      12:33:04  [Pipeline] Start of Pipeline
      12:33:04  [Pipeline] node
      12:33:05  Running on test in /home/jenkins/workspace/test pipeline
      12:33:05  [Pipeline] End of Pipeline
      12:33:05  java.lang.VerifyError: (class: TestRunner, method: add signature: (Ljava/lang/Object;)V) Unable to pop operand off an empty stack
      12:33:05  	at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
      12:33:05  	at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3166)
      12:33:05  	at java.base/java.lang.Class.getDeclaredMethod(Class.java:2473)
      12:33:05  	at java.base/jdk.internal.reflect.ReflectionFactory.findReadWriteObjectForSerialization(ReflectionFactory.java:556)
      12:33:05  	at java.base/jdk.internal.reflect.ReflectionFactory.readObjectForSerialization(ReflectionFactory.java:537)
      12:33:05  	at jdk.unsupported/sun.reflect.ReflectionFactory.readObjectForSerialization(ReflectionFactory.java:144)
      12:33:05  	at org.jboss.marshalling.reflect.JDKSpecific$SerMethods.<init>(JDKSpecific.java:61)
      12:33:05  	at org.jboss.marshalling.reflect.SerializableClass.<init>(SerializableClass.java:84)
      12:33:05  	at org.jboss.marshalling.reflect.SerializableClassRegistry$1.computeValue(SerializableClassRegistry.java:62)
      12:33:05  	at org.jboss.marshalling.reflect.SerializableClassRegistry$1.computeValue(SerializableClassRegistry.java:59)
      12:33:05  	at java.base/java.lang.ClassValue.getFromHashMap(ClassValue.java:228)
      12:33:05  	at java.base/java.lang.ClassValue.getFromBackup(ClassValue.java:210)
      12:33:05  	at java.base/java.lang.ClassValue.get(ClassValue.java:116)
      12:33:05  	at org.jboss.marshalling.reflect.SerializableClassRegistry.lookup(SerializableClassRegistry.java:83)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.writeNewSerializableClass(RiverMarshaller.java:1514)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.writeNewClass(RiverMarshaller.java:1417)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.writeClass(RiverMarshaller.java:1268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.writeClassClass(RiverMarshaller.java:1256)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:166)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.writeArrayObject(RiverMarshaller.java:312)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:222)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.writeArrayObject(RiverMarshaller.java:312)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:222)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1080)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1080)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.BlockMarshaller.doWriteObject(BlockMarshaller.java:65)
      12:33:05  	at org.jboss.marshalling.river.BlockMarshaller.writeObject(BlockMarshaller.java:56)
      12:33:05  	at org.jboss.marshalling.MarshallerObjectOutputStream.writeObjectOverride(MarshallerObjectOutputStream.java:50)
      12:33:05  	at org.jboss.marshalling.river.RiverObjectOutputStream.writeObjectOverride(RiverObjectOutputStream.java:179)
      12:33:05  	at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:341)
      12:33:05  	at java.base/java.util.HashMap.internalWriteEntries(HashMap.java:1858)
      12:33:05  	at java.base/java.util.HashMap.writeObject(HashMap.java:1412)
      12:33:05  	at org.jboss.marshalling.reflect.JDKSpecific$SerMethods.callWriteObject(JDKSpecific.java:89)
      12:33:05  	at org.jboss.marshalling.reflect.SerializableClass.callWriteObject(SerializableClass.java:199)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1089)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteFields(RiverMarshaller.java:1143)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteSerializableObject(RiverMarshaller.java:1101)
      12:33:05  	at org.jboss.marshalling.river.RiverMarshaller.doWriteObject(RiverMarshaller.java:268)
      12:33:05  	at org.jboss.marshalling.AbstractObjectOutput.writeObject(AbstractObjectOutput.java:58)
      12:33:05  	at org.jboss.marshalling.AbstractMarshaller.writeObject(AbstractMarshaller.java:116)
      12:33:05  	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.lambda$writeObject$1(RiverWriter.java:144)
      12:33:05  	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox.runInSandbox(GroovySandbox.java:331)
      12:33:05  	at org.jenkinsci.plugins.workflow.support.pickles.serialization.RiverWriter.writeObject(RiverWriter.java:143)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:577)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgram(CpsThreadGroup.java:554)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.saveProgramIfPossible(CpsThreadGroup.java:537)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:461)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:330)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:294)
      12:33:05  	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
      12:33:05  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      12:33:05  	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
      12:33:05  	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:30)
      12:33:05  	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:70)
      12:33:05  	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
      12:33:05  	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
      12:33:05  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      12:33:05  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      12:33:05  	at java.base/java.lang.Thread.run(Thread.java:829)
      12:33:05  Finished: FAILURE
      

      The stack trace according to the monitoring looks like this:

      Stack-trace
      java.lang.IllegalStateException: trying to open a build log on test pipeline #53 after it has completed
            at org.jenkinsci.plugins.workflow.job.WorkflowRun.getListener(WorkflowRun.java:232)
            at org.jenkinsci.plugins.workflow.job.WorkflowRun$NodePrintListener.onNewHead(WorkflowRun.java:1079)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1556)
            at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.notifyNewHead(CpsThreadGroup.java:492)
            at org.jenkinsci.plugins.workflow.cps.FlowHead.setNewHead(FlowHead.java:158)
            at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution.launch(CpsBodyExecution.java:125)
            at org.jenkinsci.plugins.workflow.cps.CpsBodyInvoker.launch(CpsBodyInvoker.java:188)
            at org.jenkinsci.plugins.workflow.cps.CpsBodyInvoker.launch(CpsBodyInvoker.java:183)
            at org.jenkinsci.plugins.workflow.cps.CpsBodyInvoker$1.onSuccess(CpsBodyInvoker.java:159)
            at org.jenkinsci.plugins.workflow.cps.CpsBodyInvoker$1.onSuccess(CpsBodyInvoker.java:154)
            at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$5$1.run(CpsFlowExecution.java:930)
            at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$1.run(CpsVmExecutorService.java:38)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
            at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:30)
            at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:70)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
      

      After the crash, one of the executors on the affected agent is permanently blocked. The build cannot be aborted through the usual UI means: the red X button is still shown, but clicking it has no effect other than opening a popup that says:

      Are you sure you want to abort null?
      

      We were able to find a workaround by explicitly removing the executor from the computer via CLI:

      // Example code, partially based on https://github.com/cloudbees/jenkins-scripts/blob/master/ProperlyStopOnlyRunningPipelines.groovy
      
      // Walk every computer (agent) and select the executors that are busy with a build.
      jenkins.model.Jenkins.instanceOrNull.getComputers().each { computer ->
        computer.executors.findAll { exec -> exec.isBusy() && exec.currentExecutable }.each { exec ->
          // Match the stuck job by name; adjust "test pipeline" to the affected job.
          if (exec.currentExecutable.getFullDisplayName().contains("test pipeline")) {
            println "Stopping ${exec.currentExecutable.getFullDisplayName()}"
            // Remove the blocked executor from the computer.
            computer.removeExecutor(exec)
          }
        }
      }
      

      This issue keeps reappearing on our systems in different build jobs. The example is based on our test system, where we reproduced the issue. However, we see the same error in production (see attached environment).

          Marian Degel added a comment -

          We keep getting occasional random crashes within pipelines, which then continue to block our executors with non-removable jobs.

          Some of our newer issues:

          [2023-01-28T08:57:35.706Z] [Pipeline] }
          [2023-01-28T08:57:35.709Z] [Pipeline] // timeout
          [2023-01-28T08:57:35.713Z] [Pipeline] }
          [2023-01-28T08:57:35.716Z] [Pipeline] // stage
          [2023-01-28T08:57:35.720Z] [Pipeline] }
          [2023-01-28T08:57:35.722Z] [Pipeline] End of Pipeline
          [2023-01-28T08:57:35.744Z] java.lang.NullPointerException
          [2023-01-28T08:57:35.744Z] 	at com.dabsquared.gitlabjenkins.workflow.GitLabCommitStatusStep.access$400(GitLabCommitStatusStep.java:34)
          [2023-01-28T08:57:35.744Z] 	at com.dabsquared.gitlabjenkins.workflow.GitLabCommitStatusStep$GitLabCommitStatusStepExecution$1.onSuccess(GitLabCommitStatusStep.java:103)
          [2023-01-28T08:57:35.744Z] 	at org.jenkinsci.plugins.workflow.cps.CpsBodyExecution$SuccessAdapter.receive(CpsBodyExecution.java:372)
          [2023-01-28T08:57:35.744Z] 	at com.cloudbees.groovy.cps.Outcome.resumeFrom(Outcome.java:73)
          [2023-01-28T08:57:35.744Z] 	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:155)
          [2023-01-28T08:57:35.744Z] 	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
          [2023-01-28T08:57:35.744Z] 	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
          [2023-01-28T08:57:35.744Z] 	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
          [2023-01-28T08:57:35.744Z] 	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:152)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:420)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:95)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:330)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:294)
          [2023-01-28T08:57:35.745Z] 	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
          [2023-01-28T08:57:35.745Z] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          [2023-01-28T08:57:35.745Z] 	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
          [2023-01-28T08:57:35.745Z] 	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
          [2023-01-28T08:57:35.745Z] 	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
          [2023-01-28T08:57:35.745Z] 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          [2023-01-28T08:57:35.745Z] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          [2023-01-28T08:57:35.745Z] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          [2023-01-28T08:57:35.745Z] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          [2023-01-28T08:57:35.745Z] 	at java.base/java.lang.Thread.run(Thread.java:829)
          [2023-01-28T08:57:35.755Z] Finished: FAILURE
          

          It appears as if Jenkins suddenly cannot handle any exceptions that originate from the pipeline script itself. As far as we can tell, this behavior is new.


          Allan BURDAJEWICZ added a comment -

          degelma: Is this still happening in recent versions of the workflow plugin stack? In particular since https://github.com/jenkinsci/workflow-job-plugin/releases/tag/1301.v054d9cea_9593 ?

          Marian Degel added a comment -

          allan_burdajewicz: We have not been able to observe this behavior any longer.

          It's hard to give a specific point in time, as the issues appeared infrequently and the only easy way to reproduce them was our sample from JENKINS-70080.
          Thus, the main issues stopped for us around the time when JENKINS-70080 was fixed, as this was the main culprit.

          I think we saw a few random occurrences after that, but they also quickly stopped, which would coincide with the plugin release you mentioned.
          We haven't seen the issue reappear since then.


          Devin Nusbaum added a comment - edited

          I wonder if my proposed fix for JENKINS-71692 would also address this case, or if it is distinct. To check, we would have to roll back the fix associated with JENKINS-70080 and then see if the reproducer above triggers CpsVmExecutorService.reportProblem (eventually) or really only happens during serialization. If the former is true, I think this can be closed as a duplicate of JENKINS-71692; otherwise we would need to look into making changes so that exceptions here end up making their way to CpsVmExecutorService.reportProblem, at least in this case.

          From a quick look, I suspect that errors currently passed to CpsThreadGroup.propagateErrorToWorkflow instead need to be either treated as fatal VM thread errors that kill the build, or only logged while the Pipeline is allowed to continue running normally.
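
As an illustration of the "fatal VM thread error" option described above, here is a minimal, self-contained Java sketch. It is not the workflow-cps code: `SketchBuild`, `FatalErrorRunner`, and this `reportProblem` method are hypothetical stand-ins that only borrow the name mentioned in the comment. The sketch assumes that the important property is catching every `Throwable` (including `Error` subclasses such as `VerifyError`) and routing it to one handler that fails the build and frees the executor slot.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a build that occupies one executor slot. */
class SketchBuild {
    boolean completed = false;
    boolean executorBusy = true;
    final List<String> log = new ArrayList<>();

    /** Marks the build as failed and frees its executor slot. */
    void fail(Throwable cause) {
        log.add("FAILURE: " + cause);
        completed = true;
        executorBusy = false;
    }
}

/** Hypothetical runner; it only illustrates "treat as fatal" handling. */
class FatalErrorRunner {
    /** Runs one task; any Throwable, including Error, ends the build. */
    static void runChunk(SketchBuild build, Runnable task) {
        try {
            task.run();
        } catch (Throwable t) { // deliberately broader than Exception
            reportProblem(build, t);
        }
    }

    /** Single place where unexpected problems terminate the build. */
    static void reportProblem(SketchBuild build, Throwable t) {
        if (!build.completed) {
            build.fail(t);
        }
    }

    public static void main(String[] args) {
        SketchBuild build = new SketchBuild();
        // Simulate an internal error similar in kind to the one in the description.
        runChunk(build, () -> { throw new VerifyError("simulated"); });
        System.out.println(build.log);
        System.out.println("executor busy: " + build.executorBusy);
    }
}
```

A handler that caught only `Exception` would let the simulated `VerifyError` escape `runChunk`, and in this toy model nothing would ever call `fail`, so the slot would stay busy.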


          Devin Nusbaum added a comment -

          Oh, and the NPE in GitLabCommitStatusStep is kind of similar, but looks like a distinct issue. https://github.com/jenkinsci/gitlab-plugin/pull/1410 should have fixed it (really the plugin could probably use a simpler API like BodyExecutionCallback.TailCall), but in general we should make CpsBodyExecution.onSuccess and CpsBodyExecution.onFailure robust against exceptions thrown by the callbacks.
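
To show what "robust against exceptions thrown by the callbacks" could look like in general terms, here is a small Java sketch. It is not the CpsBodyExecution implementation; `RobustCallbackDispatcher` and its members are hypothetical names, and the sketch assumes the goal is simply that one misbehaving callback must not prevent the remaining callbacks from running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical dispatcher; it only illustrates isolating callback failures. */
class RobustCallbackDispatcher {
    /** Problems raised by callbacks, kept for reporting instead of being rethrown. */
    final List<Throwable> callbackFailures = new ArrayList<>();

    /**
     * Delivers the result to every callback and returns how many completed normally.
     * A callback that throws is recorded and skipped; the others still run.
     */
    <T> int dispatch(T result, List<Consumer<T>> callbacks) {
        int delivered = 0;
        for (Consumer<T> callback : callbacks) {
            try {
                callback.accept(result);
                delivered++;
            } catch (Throwable t) {
                // Record the failure so it can be logged or reported later.
                callbackFailures.add(t);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        RobustCallbackDispatcher dispatcher = new RobustCallbackDispatcher();
        List<String> seen = new ArrayList<>();
        List<Consumer<String>> callbacks = new ArrayList<>();
        // First callback fails, similar in kind to the NPE in the log above.
        callbacks.add(s -> { throw new NullPointerException("simulated"); });
        // Second callback must still receive the result.
        callbacks.add(seen::add);

        int delivered = dispatcher.dispatch("body finished", callbacks);
        System.out.println("delivered=" + delivered
                + " failures=" + dispatcher.callbackFailures.size()
                + " seen=" + seen);
    }
}
```

Whether recorded failures should fail the build or only be logged is a separate design decision that the sketch leaves open.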


          Devin Nusbaum added a comment -

          Ok, I was able to confirm that both the issue in the description and the one with the GitLab plugin BodyExecutionCallback are fixed by my proposed fix in JENKINS-71692, so I am closing this as a duplicate of that issue. Thanks for your report, degelma!

          Note, though, that my reproduction of the issue in the description required a slightly different approach: I had to load the class indirectly via load; otherwise things failed before node even started. Unfortunately, I do not think we can fully reproduce that issue in a test, at least not trivially, because you cannot simply throw a Throwable from methods like writeReplace or writeObject to get the same behavior. It has to come from a place where jboss-marshalling does not catch the exception and wrap it in a non-Throwable so we can bypass this logic. I do still think propagateErrorToWorkflow seems problematic, but as long as it allows all of the threads to keep running, I guess it should work fine.


            Assignee: Unassigned
            Reporter: Marian Degel (degelma)
            Votes: 2
            Watchers: 6
