Details
- Type: Bug
- Status: Open
- Priority: Blocker
- Resolution: Unresolved
- Component/s: workflow-cps-plugin
- Labels: None
- Environment: Jenkins ver. 2.138.2, Pipeline: Groovy 2.61
Description
IMPORTANT: NOTE FROM A MAINTAINER:
STOP! YOUR STACK TRACE ALONE IS NOT GOING TO HELP SOLVE THIS!
(Sorry for the all caps, but we're not going to make progress on this issue while commenters keep adding insufficient information.)
Note from maintainer: We'd like to be able to fix this, but we really need more information to do so. Whenever you encounter the error in the description of this ticket, please zip the build folder ($JENKINS_HOME/jobs/$PATH_TO_JOB/builds/$BUILD_NUMBER/) of the build that failed and upload it here along with the Jenkins system logs, redacting any sensitive content as necessary. Please also include any relevant information on the frequency of the issue, steps to reproduce (did it happen after Jenkins was restarted normally, or did Jenkins crash?), any messages in the Jenkins system logs that seem relevant, etc. In addition, please check service or other system-level logs for Jenkins to see whether Jenkins is taking too long to shut down or anything like that. Thanks!
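For anyone who wants to script the "zip the build folder" step, here is a minimal Python sketch. It assumes the directory layout given in the note above ($JENKINS_HOME/jobs/$PATH_TO_JOB/builds/$BUILD_NUMBER/). The function name archive_build_dir and its parameters are hypothetical, not part of Jenkins. It does not redact anything, so review the contents before uploading.

```python
import shutil
from pathlib import Path


def archive_build_dir(jenkins_home: str, job_path: str,
                      build_number: int, dest_dir: str) -> Path:
    """Zip one build folder and return the path of the resulting archive.

    job_path corresponds to $PATH_TO_JOB in the maintainer's note.
    """
    build_dir = Path(jenkins_home) / "jobs" / job_path / "builds" / str(build_number)
    if not build_dir.is_dir():
        raise FileNotFoundError(f"Build directory not found: {build_dir}")

    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    # make_archive appends ".zip" to the base name and archives the
    # contents of root_dir with paths relative to it.
    base_name = dest / f"build-{build_number}"
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(build_dir))
    return Path(archive)
```

Write the archive to a directory outside the build folder, so that the zip file does not end up inside itself.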
The main thing we are currently looking for is whether these messages are present in the Jenkins logs right before Jenkins shut down for the build which has the error:
- About to try to checkpoint the program for buildCpsFlowExecutionOwner[YourJobName/BuildNumber:YourJobName #BuildNumber]]
- Trying to save program before shutdown org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$8@RandomHash
- Finished saving program before shutdown org.jenkinsci.plugins.workflow.cps.CpsFlowExecution$8@RandomHash
If these messages are not present, it means that Jenkins was unable to save the Pipeline, so the error is expected. If that is the case, fixing the issue probably requires changes to Jenkins packaging to configure longer service timeouts on shutdown, or totally changing how PERFORMANCE_OPTIMIZED works. If the messages are present, then something else is happening.
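The check for the three messages can be scripted roughly as follows. This is only a sketch: it matches the fixed text of each message as quoted above (the job name, build number, and hash vary), and the function names are hypothetical. It reports whether each message appears anywhere in the lines it is given. It does not verify that they occur right before shutdown or belong to the failing build, so the matching lines still need to be read in context.

```python
from typing import Dict, Iterable

# Fixed portions of the three messages quoted in the maintainer's note.
SHUTDOWN_MARKERS = {
    "checkpoint": "About to try to checkpoint the program for build",
    "save_started": "Trying to save program before shutdown",
    "save_finished": "Finished saving program before shutdown",
}


def find_shutdown_markers(log_lines: Iterable[str]) -> Dict[str, bool]:
    """Report which of the three shutdown messages appear in the log lines."""
    found = {name: False for name in SHUTDOWN_MARKERS}
    for line in log_lines:
        for name, text in SHUTDOWN_MARKERS.items():
            if text in line:
                found[name] = True
    return found


def pipeline_was_saved(log_lines: Iterable[str]) -> bool:
    """True only if all three messages are present."""
    return all(find_shutdown_markers(log_lines).values())
```

To use it, open your Jenkins system log as a text file and pass the file object (or a list of its lines) to find_shutdown_markers. A result with any False entry corresponds to the "messages are not present" case described above.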
Exception:
Creating placeholder flownodes because failed loading originals.
java.io.IOException: Tried to load head FlowNodes for execution Owner[Platform Service FBI Test/1605:Platform Service FBI Test #1605] but FlowNode was not found in storage for head id:FlowNodeId 1:17
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.initializeStorage(CpsFlowExecution.java:678)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:715)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:659)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:525)
    at hudson.model.RunMap.retrieve(RunMap.java:225)
    at hudson.model.RunMap.retrieve(RunMap.java:57)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:499)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.load(AbstractLazyLoadRunMap.java:481)
    at jenkins.model.lazy.AbstractLazyLoadRunMap.getByNumber(AbstractLazyLoadRunMap.java:379)
    at hudson.model.RunMap.getById(RunMap.java:205)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:896)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:907)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:65)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:57)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$ItemListenerImpl.onLoaded(FlowExecutionList.java:178)
    at jenkins.model.Jenkins.<init>(Jenkins.java:975)
    at hudson.model.Hudson.<init>(Hudson.java:85)
    at hudson.model.Hudson.<init>(Hudson.java:81)
    at hudson.WebAppMain$3.run(WebAppMain.java:233)
Finished: FAILURE
Issue Links
- is duplicated by JENKINS-54676: Error Creating Flownodes causes Jenkins to not add to queue or mark builds finished (Closed)
- is duplicated by JENKINS-55170: Failure to load flow node: FlowNode was not found in storage for head (Closed)
Mark Hollingsworth, thanks for the feedback. As far as I can tell, though, there is no way for the "Creating placeholder flownodes because failed loading originals" message to be printed to the build log unless the Pipeline is resuming, which should only happen when Jenkins starts (although bugs in historical builds that have a broken state could also cause them to attempt to resume when they shouldn't). The full stack trace would help clarify this: if "jenkins.model.Jenkins.<init>" is part of the stack trace for the exception, then Jenkins is starting up.
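As a rough illustration of that check, the following Python sketch extracts the method names from a pasted stack trace and looks for the Jenkins constructor frame. The regular expression and function names are assumptions made for this example. It is written to tolerate traces that have been flattened onto a single line, as often happens when they are pasted into a ticket.

```python
import re
from typing import List

# Matches "at some.package.Class.method(" and captures the qualified method name.
FRAME_RE = re.compile(r"\bat\s+([\w.$<>]+)\(")

# Frame named in the comment above as the sign that Jenkins is starting up.
STARTUP_FRAME = "jenkins.model.Jenkins.<init>"


def stack_frames(stack_trace: str) -> List[str]:
    """Return the qualified method name of every frame, in order."""
    return FRAME_RE.findall(stack_trace)


def thrown_during_startup(stack_trace: str) -> bool:
    """True if the Jenkins constructor appears among the frames."""
    return STARTUP_FRAME in stack_frames(stack_trace)
```

Applied to the stack trace in this ticket's description, thrown_during_startup would return True, since that trace contains the jenkins.model.Jenkins.<init> frame.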
As a general update, my current understanding of this problem based on the data I have received is that in most cases it happens for Pipelines using the PERFORMANCE_OPTIMIZED durability level when Jenkins crashes. The PERFORMANCE_OPTIMIZED durability level makes no guarantees that Pipelines will be resumable if Jenkins crashes, so this behavior is expected in that case. There is supposed to be a more user-friendly error message explaining that the Pipeline cannot be resumed for these reasons rather than just the raw error of what exactly kept the Pipeline from resuming, but that is broken because of JENKINS-53358.
I have a draft PR up to try to improve the messaging around this case, so that something like the following would be printed in these cases instead:
If anyone has seen this issue with Pipelines that are not using the PERFORMANCE_OPTIMIZED durability level, that would be very interesting. The same goes for anyone who can provide Jenkins system logs and service logs that show this issue occurring with a PERFORMANCE_OPTIMIZED Pipeline even with a normal Jenkins restart, with log messages as described in this comment showing that the Pipeline was persisted before shutdown. We should create new tickets to track those cases because they would be distinct issues. For some of the stack traces here, it looks like there is a problem where CpsStepContext.isReady is resulting in Pipelines being resumed, which is strange; I am not sure how to reproduce those issues, and they probably need to be investigated separately.