
Stage view is completely missing for long durations after restarts

      When running any pipeline job, I can see that the build progresses, but the stage blocks don't always show up on the main job screen (localhost:8080/job/pipeline1). Nor do I see stages when pressing 'Full stage view' (I get a white screen). The console, pipeline syntax and Jenkins logs look OK and show nothing of interest, and the jobs run to completion.

      This behavior occurs at the beginning of the day and usually resolves itself around noon or in the afternoon, when some pipeline job with stages runs (meaning the stage blocks suddenly appear, apparently without my having done anything). It should be noted that the Windows server hosting Jenkins is restarted every night at 10 PM (this is mandatory and beyond my control). I'm not sure if this is related, as all other server and plugin functionality is fine in the mornings.

      I reinstalled the stage view plugin several times now, to no avail.

      After extensive searches all over the net, I can't find anything on this particular issue and would like to ask for directions on how to proceed.

          [JENKINS-40057] Stage view is completely missing for long durations after restarts

          Sam Van Oort added a comment -

          From the description, the serverside components work but the stage view frontend is broken. A couple questions:

          • What web browser are you using?
          • Are there any errors in the jenkins log upon startup?
          • Are there any browser errors or pages that fail to load when you visit the page for your job?
          • Is the pipeline rest-api plugin installed and working?

          Nash Paz added a comment - - edited
          • Chrome, Internet explorer, will try firefox today for the heck of it.
          • Will try to check tonight, it's a messy jenkins so we have plenty of errors from way before I installed the pipeline plugins. Any particular errors I ought to be looking for?
          • No browser errors I can think of. FYI, stage view frontend is missing when viewing from other comps, as well.
          • Pipeline rest-api is installed, seems fine in the 'installed' tab under plugins. How do I go about checking that it's working?

          Thanks a million for your time

          Nash Paz added a comment -

          java_err.txt
          Ok, there was an error when inspecting the blank Full stage view in Chrome:
          http://build01:8080/job/stability_post_nightly1/wfapi/runs?fullStages=true&_=1480420606537 Failed to load resource: the server responded with a status of 500 (Server Error)

          please see the attached for a full error log.
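
The failing request above can also be reproduced outside the browser, which is one way to check whether the pipeline rest-api plugin is answering at all. A minimal Groovy sketch, assuming a top-level (non-folder) job and an instance that allows anonymous read access; the helper names wfapiRunsUrl and probe are hypothetical, and the base URL and job name are the ones from the error message:

```groovy
// Build the Stage View REST URL for a job. The path and query string are copied
// from the failing browser request; top-level (non-folder) jobs only.
String wfapiRunsUrl(String baseUrl, String jobName) {
    String base = baseUrl.replaceAll('/+$', '')
    // URLEncoder encodes spaces as '+', which is wrong inside a path segment.
    String job = URLEncoder.encode(jobName, 'UTF-8').replace('+', '%20')
    return "${base}/job/${job}/wfapi/runs?fullStages=true"
}

// Issue a plain GET and return the HTTP status code.
// Assumption: no authentication is needed; a secured instance would need credentials.
int probe(String url) {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection()
    conn.requestMethod = 'GET'
    conn.connectTimeout = 5000
    conn.readTimeout = 30000
    try {
        return conn.responseCode
    } finally {
        conn.disconnect()
    }
}

// Example (needs network access to the Jenkins host):
// println probe(wfapiRunsUrl('http://build01:8080', 'stability_post_nightly1'))
```

A 200 here with an empty stage view would point at the frontend; a 500, as in the browser console above, points at the server side.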

          Sam Van Oort added a comment -

          nashpaz That appears to be an error in the workflow-support plugin preventing loading the workflow data for the start node:

          Caused by: java.lang.IllegalStateException: Could not load matching start node: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: failed to load flow node from D:\Jenkins\jobs\stability_post_nightly1\builds\212\workflow\4.xml: <?xml version='1.0' encoding='UTF-8'?>

          What workflow-support version are you using? Does the issue persist if you update to the latest workflow-support plugin (and workflow-api plugin too, for good measure)?
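
On an instance without internet access, the installed versions can be read from the script console instead of the update center. A small sketch, assuming the plugin short names workflow-support, workflow-api, pipeline-rest-api and pipeline-stage-view; pluginVersions is a hypothetical helper, while getPlugin and getVersion are the Jenkins core calls it relies on:

```groovy
// Map each plugin short name to its installed version, or 'not installed'.
// pluginManager is expected to be Jenkins.instance.pluginManager.
Map<String, String> pluginVersions(pluginManager, List<String> shortNames) {
    return shortNames.collectEntries { String name ->
        def plugin = pluginManager.getPlugin(name)
        [(name): plugin == null ? 'not installed' : plugin.version.toString()]
    }
}

// In the Jenkins script console:
// pluginVersions(Jenkins.instance.pluginManager,
//     ['workflow-support', 'workflow-api', 'pipeline-rest-api', 'pipeline-stage-view'])
//     .each { name, version -> println "${name}: ${version}" }
```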

          Sam Van Oort added a comment -

          For good measure: if you are on the latest workflow-support plugin, does the issue persist if you downgrade?

          Nash Paz added a comment - - edited

          I'm on workflow-support 2.10 (latest is 2.11 but haven't had the chance to clear it with infosec and install it yet).
          Any particular version to downgrade to?
          In the meantime I noticed that new pipelines run fine. A stupid workaround for my problem is to copy the entire script from the job that displays an empty stage view to a new job, in which case I get the view (before running I see: Stage View - "No stages b/c no pipelines have run yet"). Multiple old jobs (jobs that have run a few times) don't display the stage view.
          This is a bit of a relief because I have a presentation coming up, where things ought to look nice. Still banging my head over this though.

          Sam Van Oort added a comment -

          nashpaz23 Okay, so it's a relief to hear that the issue is on 2.10 and not 2.11. I'm going to reverse my advice here after having a spark of insight: I think a plugin change isn't going to fix this one. This is linked to a deserialization issue we've seen in rare cases. The pattern fits perfectly.

          There are one or a few runs that are a problem. You can find them with this Groovy script in the script console. Warning: it will touch all the build data for every pipeline run, so do it during a quiet period and be patient while it runs.

          import jenkins.model.Jenkins;
          import org.jenkinsci.plugins.workflow.job.WorkflowJob;
          import org.jenkinsci.plugins.workflow.job.WorkflowRun;
          import org.jenkinsci.plugins.workflow.graph.FlowGraphWalker;
          import org.jenkinsci.plugins.workflow.graph.FlowNode;
          import org.jenkinsci.plugins.workflow.flow.FlowExecution;
          import org.jenkinsci.plugins.workflow.graph.FlowStartNode;
          
          // Suspicious nodes found during the scan, kept for later inspection.
          List<FlowNode> failures = new ArrayList<FlowNode>();
          for (WorkflowJob job : Jenkins.instance.getAllItems(WorkflowJob.class)) {
              for (WorkflowRun run : job.builds) {
                  FlowExecution fe = run.execution;
                  // Skip runs that have no stored execution or no head nodes.
                  if (fe == null || fe.currentHeads.size() == 0) { continue; }
                  FlowNode lastNode = fe.currentHeads.get(0);
                  // A head node without parents cannot be traced back to the start node.
                  if (lastNode.getParents().size() == 0) {
                      out.println("Parentless end flownode!!! [Job, Build, NodeId]: ["+job.fullName+", "+run.id+", "+lastNode.id+"]");
                      failures.add(lastNode);
                      continue;
                  }
                  // Walk the whole graph and flag nodes that lost all of their actions.
                  for (FlowNode f : new FlowGraphWalker(run.execution)) {
                      if (f.getActions().size() == 0 && !(f instanceof FlowStartNode)) {
                          out.println("Actionless FlowNode!!! [Job, Build, NodeID, class, isLast]: ["
                              +job.fullName+", "
                              +run.id+", "
                              +f.id+", "
                              +f.class.name.replace('org.jenkinsci.plugins.','o.j.p.')
                              +", "+(f.equals(lastNode))
                          +"]");
                          failures.add(f);
                      }
                  }
              }
          }
          

          If you remove the offending run(s) pipeline will just work. I believe this will be one more datapoint for fixing the issue in question, as well.
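
A "run" here is a single build of a job (one WorkflowRun), not the job itself. Removing one from the script console could look like the sketch below. It assumes the job name and build number reported by the scan above; deleteRun is a hypothetical helper that defaults to a dry run, and getItemByFullName, getBuildByNumber and delete are the Jenkins core calls it relies on:

```groovy
// Delete a single build of a job, or only report what would be deleted.
// jenkins is expected to be Jenkins.instance.
String deleteRun(jenkins, String jobFullName, int buildNumber, boolean dryRun = true) {
    def job = jenkins.getItemByFullName(jobFullName)
    def run = job?.getBuildByNumber(buildNumber)
    if (run == null) {
        return "not found: ${jobFullName} #${buildNumber}"
    }
    if (dryRun) {
        return "would delete: ${jobFullName} #${buildNumber}"
    }
    run.delete()   // removes the build record and its directory on disk
    return "deleted: ${jobFullName} #${buildNumber}"
}

// In the Jenkins script console, first as a dry run, then for real:
// println deleteRun(Jenkins.instance, 'stability_post_nightly1', 212)
// println deleteRun(Jenkins.instance, 'stability_post_nightly1', 212, false)
```

Deletion is permanent, so copying the build directory aside first keeps the data available for anyone investigating the underlying bug.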

          Nash Paz added a comment -

          By 'runs' do you mean jobs? Or builds of the jobs?
          I'm in an off-the-internet environment so it'll be a bit of a mess to bring this in. I could just remove all the jobs, and delete all the builds (I'm the only one currently using pipelines). Should that fix the issue?

          Nash Paz added a comment - - edited

          No Dice
          I've run the script, it indeed identified a bunch of runs that were stuck/disruptive. I deleted all the pipelines in the server, made sure nothing remained, then Restarted the Jenkins service. A new pipeline shows the stage view, but still after the nightly restart the stage view is gone (for pipelines that existed before the restart).
          Right now I'm dropping this issue as I truly can't find time to deal with it. The log is enough for the DevOps team, for management showoffs we can use the workaround.

          Joachim Herb added a comment - - edited

          I see exactly the same behavior.
          Jenkins: 2.44
          stage view plugin: 2.4
          Windows 7

          Perhaps interesting information: I changed

          -Dcom.cloudbees.workflow.rest.external.JobExt.maxRunsPerJob=50
          

          in jenkins.xml. After that the stage view was empty all the time.
          Then I changed it to

          -Dcom.cloudbees.workflow.rest.external.JobExt.maxRunsPerJob=5
          

          and restarted Jenkins.
          Until and including the completion of the 5th build of the pipeline job, the stage view did not show up. But after I started the sixth build, it worked again and also showed the first 5 builds.

          For whatever reasons, on Linux it seems to work (even with 50 builds to be shown in the stage view).
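
Since the property only takes effect after a restart, it can help to confirm what the running JVM actually sees. A short script-console sketch; describeMaxRuns is a hypothetical helper, and the property name is the one quoted above:

```groovy
// Report the value of the Stage View run-limit property in the running JVM.
String describeMaxRuns() {
    String key = 'com.cloudbees.workflow.rest.external.JobExt.maxRunsPerJob'
    String value = System.getProperty(key)
    return value == null ? "${key} is not set (the plugin default applies)" : "${key}=${value}"
}

println describeMaxRuns()
```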

          Joachim Herb added a comment -

          possible duplicate of https://issues.jenkins-ci.org/browse/JENKINS-40096

          Marcos Brigante added a comment -

          Same issue here, on Jenkins 2.45. Just migrated all my old chained jobs to pipeline scripts to discover they stop working after a restart. =[
          Any tips here will be nice, thanks.

          Joachim Herb added a comment -

          Possible duplicate of https://issues.jenkins-ci.org/browse/JENKINS-39143

          Clint Chapman added a comment -

          I think this is a duplicate of https://issues.jenkins-ci.org/browse/JENKINS-39143

          It appears some state is intermittently not being persisted to files.  The error doesn't occur until the jenkins master is restarted and it tries to reload state from disk.  It also appears this might be a windows only problem.  I don't believe this is a client/browser side issue.
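
If state really is being lost on its way to disk, the damaged files should be visible without starting Jenkins. The sketch below scans one job directory for flow-node XML files that no longer parse. It assumes the builds/<number>/workflow/<id>.xml layout seen in the error path earlier in this thread, and that the damage shows up as empty or truncated XML, which the log excerpt suggests but does not prove; findUnparsableNodeFiles is a hypothetical helper:

```groovy
import javax.xml.parsers.DocumentBuilderFactory

// Return every builds/<n>/workflow/*.xml file under jobDir that fails to parse as XML.
List<File> findUnparsableNodeFiles(File jobDir) {
    List<File> bad = []
    File buildsDir = new File(jobDir, 'builds')
    if (!buildsDir.isDirectory()) {
        return bad
    }
    def factory = DocumentBuilderFactory.newInstance()
    buildsDir.eachDir { File buildDir ->
        File workflowDir = new File(buildDir, 'workflow')
        if (!workflowDir.isDirectory()) {
            return   // leaves this closure only; the scan continues with the next build
        }
        workflowDir.eachFileMatch(~/.*\.xml/) { File xml ->
            try {
                factory.newDocumentBuilder().parse(xml)
            } catch (Exception e) {
                bad << xml   // empty, truncated or otherwise malformed
            }
        }
    }
    return bad.sort { it.path }
}

// Example, using the job directory from the error message:
// findUnparsableNodeFiles(new File('D:\\Jenkins\\jobs\\stability_post_nightly1')).each { println it }
```

A file that parses as XML can still fail to deserialize into a flow node, so an empty result does not rule the problem out.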

          Daniel Geißler added a comment -

          It appears not only on Windows, but on our Linux Jenkins instance too.

          Deleting the mentioned build from the history makes the stage view show up again, but that's not exactly a solution to the cause of the problem.

          Mor L added a comment -

          possible duplicate according to the log
